Welcome to the
Daily Dose of Data Science

For data professionals and enthusiasts who want to build a career on core expertise, not fleeting trends.

Every week, we share practical, no-fluff deep dives on the topics that truly matter for succeeding and staying relevant in ML and DS roles.

So when you apply for ML/DS roles, you don’t get dismissed as just another generic applicant; when you’re interviewed, your responses don't reveal gaps in core ML expertise; and when you take on real-world projects, you don't feel clueless about how to approach problems.


Read 150+ testimonials here →

Plans & Pricing


Monthly

$12 → $6 / month

  • 1-2 new articles every week.
  • Access all previous articles.
  • Personal chat support.
Subscribe now (Get 50% Off)

Yearly

$120 → $60 / year

  • 1-2 new articles every week.
  • Access all previous articles.
  • Personal chat support.
  • 10 extra articles compared to the monthly plan.
  • 2 months free compared to paying monthly.
  • 10-day refund.
Subscribe now (Get 50% Off)

Lifetime

$360 → $180

  • 1-2 new articles every week.
  • Access all previous articles.
  • Personal chat support.
  • Pay one-time.
  • No renewals.
  • Lifetime access.
  • 10-day refund.
Subscribe now (Get 50% Off)

EXPLORE OUR LIBRARY OF DEEP DIVES

Joining instantly unlocks all 50+ practical deep dives we have published so far. The previews below will give you a sense of the topics we typically cover and how they will help you progress in your career.

A Crash Course on Model Calibration

Modern neural networks trained today can be highly misleading.

They appear to be heavily overconfident in their predictions.

For instance, if a model predicts an event with a 70% probability, then ideally, out of 100 such predictions, approximately 70 should result in the event occurring.

However, many experiments have revealed that modern neural networks appear to be losing this ability.

For instance, consider the following image, which compares two models:

The above image indicates that even though both models are equally accurate:

  • Model A produces an average confidence that aligns with its accuracy.
  • However, Model B reports an average confidence of 99% while, in reality, it is only 88% accurate.

Calibration solves this.

A model is calibrated if the predicted probabilities align with the actual outcomes.

Handling this is important because the model will be used in decision-making.

In fact, an overly confident but not equally accurate model can be fatal.

For example, say a government hospital wants to conduct an expensive medical test on patients.

To ensure that government funding is used optimally, doctors need reliable probability estimates to decide who should be tested.

If the model isn't calibrated, it will produce overly confident predictions that can misguide these decisions.

This two-part crash course covers model calibration in a beginner-friendly manner, with implementations.

In this two-part crash course, we:

  • dive into the details of model calibration,
  • understand why it is a problem,
  • discuss why modern models are more prone to miscalibration,
  • learn techniques to detect miscalibration and their limitations (see the sketch below),
  • learn techniques to address miscalibration, with implementations,
  • and more.
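
To make the detection step concrete, here is a minimal sketch (assuming scikit-learn and a synthetic dataset; the model, bin count, and calibration method are illustrative choices, not the exact setup from the series) that computes a simple expected calibration error before and after calibrating a classifier:

```python
# Minimal sketch: measuring and fixing miscalibration with scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV

X, y = make_classification(n_samples=5000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An uncalibrated model vs. a calibrated wrapper (isotonic regression, 3-fold CV).
raw = RandomForestClassifier(random_state=0).fit(X_train, y_train)
calibrated = CalibratedClassifierCV(RandomForestClassifier(random_state=0),
                                    method="isotonic", cv=3).fit(X_train, y_train)

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Bin-weighted |fraction of positives - mean predicted probability| (ECE)."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (y_prob > lo) & (y_prob <= hi)
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - y_prob[mask].mean())
    return ece

for name, model in [("raw", raw), ("calibrated", calibrated)]:
    prob = model.predict_proba(X_test)[:, 1]
    print(name, "ECE:", round(expected_calibration_error(y_test, prob), 4))
```

A lower ECE for the calibrated model indicates its predicted probabilities track observed outcomes more closely; the deep dive goes into these diagnostics and fixes in much more detail.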

There has been rising concern in the industry about ensuring that machine learning models communicate their confidence reliably. Being able to detect and fix miscalibration is therefore a valuable skill to possess.

If you aspire to make valuable contributions in your data science role, this series will help you cultivate a more diversified skill set.

A Crash Course on Graph Neural Networks (With Implementation)

  • Google Maps uses graph ML for ETA prediction.
  • Pinterest uses graph ML (PinSage) for recommendations.
  • Netflix uses graph ML (SemanticGNN) for recommendations.
  • Spotify uses graph ML (HGNNs) for audiobook recommendations.
  • Uber Eats uses graph ML (a GraphSAGE variant) to suggest dishes, restaurants, etc.

The list could go on since almost every major tech company I know employs graph ML in some capacity.

Becoming proficient in graph machine learning now appears far more effective than traditional deep learning alone for differentiating your profile and landing these positions.

The reason why there has been a growing interest in graph ML is simple.

Traditional deep learning typically relies on data formats that are tabular, image-based, or sequential (like language) in nature.

However, with time, we have also realized the inherent challenges of such traditional approaches. One such challenge is their inability to naturally model complex relationships and dependencies between entities that are not easily captured by fixed grids or sequences.

More specifically, a significant proportion of our real-world data often exists (or can be represented) as graphs:

  • Entities (nodes) are connected by relationships (edges).
  • Connections carry significant meaning which, if modeled well, can lead to much more robust models.

The field of Graph Neural Networks (GNNs) intends to fill this gap by extending deep learning techniques to graph data.

As a result, GNNs have been emerging as a powerful way to learn from graph-structured data.

This three-part guide covers graph neural networks in a beginner-friendly manner:

We cover:

  • Background of GNNs and their benefits.
  • Types of tasks for GNNs.
  • Data challenges in GNNs.
  • Frameworks to build GNNs.
  • Advanced architectures to build robust GNNs.
  • Feature engineering methods.
  • A practical demo.
  • Insights and some best practices.
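
As a taste of the implementation parts listed above, here is a minimal sketch of a single GCN-style layer written in plain PyTorch (the toy 4-node graph and layer sizes are made-up illustrations; the guide itself also covers dedicated GNN frameworks and more advanced architectures):

```python
# Minimal sketch of one graph convolution (GCN-style) layer in plain PyTorch.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """H_out = ReLU(D^-1/2 (A + I) D^-1/2 @ H_in @ W): each node mixes its
    neighbors' features (normalized by degree) before a shared linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        a_hat = adj + torch.eye(adj.size(0))          # add self-loops
        deg = a_hat.sum(dim=1)                        # node degrees
        d_inv_sqrt = torch.diag(deg.pow(-0.5))        # D^-1/2
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt      # symmetric normalization
        return torch.relu(a_norm @ self.linear(x))    # aggregate, transform, activate

# A toy 4-node graph: adjacency matrix + 8-dimensional node features.
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 1.],
                    [0., 1., 0., 0.],
                    [0., 1., 0., 0.]])
x = torch.randn(4, 8)
layer = SimpleGCNLayer(8, 16)
print(layer(x, adj).shape)  # torch.Size([4, 16])
```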

Optimize ML Models to Run Them on Tiny Hardware using Quantization

Typically, the parameters of a neural network (layer weights) are represented using 32-bit floating-point numbers.

The rationale is that since a model's parameters are not constrained to any specific range of values, assigning them a data type that covers a wide range is wise: it avoids numerical instability and maintains high precision during training and inference.

Quite clearly, a major caveat of this approach is that using the biggest data type also means the model will consume more memory.

Imagine if we could represent the same parameters using lower-bit representations, such as 16-bit, 8-bit, 4-bit, or even 1-bit, while retaining most of the information.

Wouldn’t that be cool?

This would significantly decrease the memory required to store the model’s parameters without substantially compromising the model’s accuracy.

Quantization techniques precisely help us do that.
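
To make the idea concrete, here is a minimal sketch of the simplest flavor: affine 8-bit quantization of a single weight tensor (in NumPy, with an illustrative random weight matrix; this is not the exact scheme from the deep dive), which cuts its memory footprint by 4x at the cost of a small rounding error:

```python
# Minimal sketch of 8-bit affine quantization of one weight tensor.
import numpy as np

def quantize_int8(w):
    """Map float32 values to uint8 with a scale and zero-point."""
    scale = (w.max() - w.min()) / 255.0
    zero_point = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(512, 512).astype(np.float32)      # a fake layer's weights
q, scale, zp = quantize_int8(w)

print("float32 size:", w.nbytes // 1024, "KB")         # ~1024 KB
print("uint8   size:", q.nbytes // 1024, "KB")         # ~256 KB (4x smaller)
print("max abs error:", np.abs(w - dequantize_int8(q, scale, zp)).max())
```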

As a machine learning engineer, being aware of techniques that save your employer money is genuinely appreciated.

Skills like these put you on the path to becoming an indispensable asset to your team.

In this beginner-friendly deep dive, we cover the following:

  • Motivation behind quantization.
  • How it differs from similar techniques like mixed-precision training.
  • Common quantization techniques for semi-large models.
  • Issues with these techniques for large models.
  • Some methods for quantizing large models.
  • And more.

Model Compression: A Step Towards Efficient Machine Learning

Model accuracy alone (or an equivalent performance metric) rarely determines which model will be deployed.

Much of the engineering effort goes into making the model production-friendly.

Typically, the model that gets shipped is NEVER determined solely by performance, a misconception that many have.

Deployment considerations

Instead, we also consider several operational and feasibility metrics, such as:

  • Inference Latency: Time taken by the model to return a prediction.
  • Model size: The memory occupied by the model.
  • Ease of scalability, etc.

For instance, consider the image below. It compares the accuracy and size of a large neural network we developed in the article with its pruned (or reduced/compressed) versions:

Looking at these results, wouldn't you strongly prefer deploying the model that is 72% smaller yet still (almost) as accurate as the large one?

Of course, this depends on the task, but in most cases it makes little sense to deploy the large model when one of its heavily pruned versions performs equally well.

In the article below, we discuss and implement four model compression techniques that ML teams regularly use to save thousands of dollars on running ML models in production.
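
For a flavor of what the implementations look like, here is a minimal sketch of magnitude pruning, one family of compression techniques, using PyTorch's built-in pruning utilities (the toy model and the 70% sparsity level are illustrative assumptions, not the exact setup from the article):

```python
# Minimal sketch of magnitude pruning with torch.nn.utils.prune.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 70% of weights with the smallest magnitude in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.7)
        prune.remove(module, "weight")   # make the pruning permanent

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.1%}")  # roughly 70% of the weights are now zero
```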

Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning

There’s so much data on your mobile phone right now — images, text messages, etc.

And this is just about one user — you.

But applications can have millions of users. The amount of data we can train ML models on is unfathomable.

The problem?

This data is private.

So you cannot consolidate this data into a single place to train a model.

The solution?

Federated learning is a smart way to address this challenge.

The core idea is to ship models to devices, train the model on the device, and retrieve the updates:
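
To illustrate the "retrieve the updates" step, here is a minimal sketch of FedAvg-style aggregation (the toy model, the three clients, and their dataset sizes are assumptions for illustration), where the server averages client weights in proportion to how much data each client trained on:

```python
# Minimal sketch of FedAvg-style aggregation of client model weights.
import copy
import torch
import torch.nn as nn

def federated_average(client_states, client_sizes):
    """Average client state_dicts, weighting each client by its dataset size."""
    total = sum(client_sizes)
    avg_state = copy.deepcopy(client_states[0])
    for key in avg_state:
        avg_state[key] = sum(
            state[key] * (size / total)
            for state, size in zip(client_states, client_sizes)
        )
    return avg_state

global_model = nn.Linear(10, 2)

# Pretend three clients trained local copies on 100, 40, and 10 private examples.
client_models = [copy.deepcopy(global_model) for _ in range(3)]
# ... each client_models[i] would be trained locally on its own device ...
new_state = federated_average([m.state_dict() for m in client_models], [100, 40, 10])
global_model.load_state_dict(new_state)
```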

Lately, users have started caring much more about their privacy.

Thus, more and more ML teams are turning to federated learning to build ML models while still preserving user privacy.

Of course, there are many challenges to federated learning:

  • How do we decide whether federated learning is actually suitable for us?
  • Since the model is trained on the client side, how do we keep its size small?
  • How do we aggregate different models received from the client side?
  • [IMPORTANT] Privacy-sensitive datasets are almost always biased by personal preferences and interests. For instance, in an image-related task:
    • Some clients may only have pet images.
    • Some clients may only have car images.
    • Some clients may love to travel, so most images they have are travel-related.
    • How do we handle such skewness in client data distribution?
  • What are the considerations for federated learning?
  • Lastly, how do we implement federated learning models?

We cover everything in this deep dive on federated learning (entirely beginner-friendly):

A Beginner-friendly Guide to Multi-GPU Training

If you look at job descriptions for Applied ML or ML engineer roles on LinkedIn, most of them demand skills like the ability to train models on large datasets:

Of course, this is not something new or emerging.

But the reason they explicitly mention “large datasets” is quite simple to understand.

Businesses have more data than ever before.

Traditional single-node model training just doesn’t work because one cannot wait months to train a model.

Distributed (or multi-GPU) training is one of the most essential ways to address this.

In this deep dive, we cover some core technicalities behind multi-GPU training, how it works under the hood, and implementation details.

We also look at the key considerations for multi-GPU (or distributed) training, which, if not addressed appropriately, may lead to suboptimal performance or slow training.
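
For a sense of what the implementation involves, here is a minimal data-parallel sketch using PyTorch's DistributedDataParallel (assuming a single node launched with torchrun; the toy model, data, and hyperparameters are placeholders, and the deep dive discusses the considerations this glosses over):

```python
# Minimal sketch of single-node multi-GPU training with DistributedDataParallel.
# Launch with: torchrun --nproc_per_node=NUM_GPUS train.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    dataset = TensorDataset(torch.randn(10_000, 32), torch.randint(0, 2, (10_000,)))
    sampler = DistributedSampler(dataset)            # each GPU sees a distinct shard
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    model = DDP(nn.Linear(32, 2).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(3):
        sampler.set_epoch(epoch)                     # reshuffle shards each epoch
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x.cuda(rank)), y.cuda(rank))
            loss.backward()                          # gradients are all-reduced here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```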

5 Must-Know Ways to Test ML Models in Production (Implementation Included)

Even after rigorously testing an ML model locally (on validation and test sets), instantly replacing the previous model with the new one can be a terrible idea.

A more reliable strategy is to test the model in production (yes, on real-world incoming data).

While this might sound risky, ML teams do it all the time, and it isn’t that complicated.

There are many ways to do this.

In this deep dive, we discuss five must-know strategies: how they work, when to use them, their advantages and considerations, and their implementations.

The article is entirely beginner-friendly, so even if you have not deployed any model before, you should be good to go.
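
As a taste of what such strategies look like in code, here is a minimal sketch of two common ones, canary releases and shadow deployments (the model objects and their predict interface are hypothetical placeholders, and the 5% split is an illustrative choice; the deep dive covers five strategies with fuller implementations):

```python
# Minimal sketch: routing live traffic between a current and a candidate model.
import random

def predict_with_canary(request, current_model, new_model, canary_fraction=0.05):
    """Serve ~5% of requests from the candidate model, the rest from the current one."""
    model = new_model if random.random() < canary_fraction else current_model
    return model.predict(request)

def predict_with_shadow(request, current_model, new_model, log):
    """Always serve the current model; score the candidate silently for later comparison."""
    served = current_model.predict(request)
    log.append({"request": request, "shadow_prediction": new_model.predict(request)})
    return served
```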

Bayesian Optimization for Hyperparameter Tuning

There are many issues with grid search and random search.

  • They are computationally expensive due to exhaustive search.
  • The search is restricted to the specified hyperparameter range. But what if the ideal hyperparameter exists outside that range?
  • They can ONLY perform discrete searches, even if the hyperparameter is continuous.

Bayesian optimization solves this.

It uses Bayesian statistics to estimate the distribution of the best hyperparameters.

Both grid search and random search evaluate every hyperparameter configuration independently. Thus, they iteratively explore the candidate configurations to find the best one.

However, Bayesian Optimization takes informed steps based on the results of the previous hyperparameter configurations.

This lets it confidently discard non-optimal configurations. Consequently, the model converges to an optimal set of hyperparameters much faster.
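
To make those "informed steps" concrete, here is a minimal sketch using Gaussian-process-based Bayesian optimization from scikit-optimize (assuming scikit-optimize and scikit-learn are installed; the model, search space, and trial budget are illustrative assumptions, not the exact experiment from the article):

```python
# Minimal sketch of Bayesian optimization for hyperparameter tuning with skopt.
from skopt import gp_minimize
from skopt.space import Integer, Real
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, random_state=0)
space = [Integer(50, 500, name="n_estimators"), Real(0.01, 1.0, name="max_features")]

def objective(params):
    n_estimators, max_features = params
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_features=max_features, random_state=0)
    # gp_minimize minimizes, so return the negative F1 score.
    return -cross_val_score(model, X, y, cv=3, scoring="f1").mean()

# Each call fits a Gaussian process to past (params, score) pairs and picks the
# next configuration based on it, instead of sampling blindly.
result = gp_minimize(objective, space, n_calls=25, random_state=0)
print("best F1:", -result.fun, "best params:", result.x)
```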

The efficacy of Bayesian Optimization is evident from the image below.

Bayesian optimization leads the model to the same F1 score but:

  • it takes 7x fewer iterations
  • it executes 5x faster
  • it reaches the optimal configuration earlier

The idea behind Bayesian optimization appeared to be extremely compelling to me when I first learned it a few years back.

Learning about this optimized hyperparameter tuning and utilizing it has been extremely helpful to me in building large ML models quickly.

Thus, learning about Bayesian optimization will be immensely valuable if you envision doing the same.

Assuming you have no prior experience with Bayesian optimization, the article covers:

  • Issues with traditional hyperparameter tuning approaches.
  • What is the motivation for Bayesian optimization?
  • How does Bayesian optimization work?
  • The intuition behind Bayesian optimization.
  • Results from the research paper that proposed Bayesian optimization for hyperparameter tuning.
  • A hands-on Bayesian optimization experiment.
  • Comparing Bayesian optimization with grid search and random search.
  • Analyzing the results of Bayesian optimization.
  • Best practices for using Bayesian optimization.


Data scientists and ML engineers love the Daily Dose of Data Science

Join the Daily Dose of Data Science Today!

A daily column with insights, observations, tutorials, and best practices on data science.

Get Started!
