

TODAY'S ISSUE
TODAY’S DAILY DOSE OF DATA SCIENCE
Transfer Learning, Fine-tuning, Multi-task Learning, and Federated Learning
Most ML models are trained independently without any interaction with other models.
However, in the realm of real-world ML, there are many powerful learning techniques that rely on model interactions to improve performance.
The following animation neatly summarizes four such well-adopted and must-know training methodologies:

Let’s discuss them today.
#1) Transfer learning

This is extremely useful when:
- The task of interest has little data.
- But a related task has abundant data.
This is how it works:
- Train a neural network model (base model) on the related task.
- Replace the last few layers of the base model with new layers.
- Train the network on the task of interest, but freeze the retained layers so their weights are not updated during backpropagation.
By first training on the related task, the base model captures core patterns that carry over to the task of interest.
The new final layers are then trained to capture task-specific behavior.
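Here is a minimal PyTorch sketch of this recipe, using a pretrained ResNet-18 as the base model; the 10-class head and the learning rate below are illustrative assumptions, not fixed choices:

```python
import torch
import torch.nn as nn
from torchvision import models

# Base model trained on a related task (ImageNet classification).
base_model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all existing layers so backpropagation won't update them.
for param in base_model.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for the task of interest
# (10 classes here is just an illustrative assumption).
base_model.fc = nn.Linear(base_model.fc.in_features, 10)

# Only the new head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(base_model.fc.parameters(), lr=1e-3)
```

From here, training proceeds as usual on the task of interest; only the new head learns.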
Another idea somewhat along these lines is knowledge distillation, which also involves a “transfer” of knowledge. We discussed it here if you are interested.
Transfer learning is commonly used in many computer vision tasks.
#2) Fine-tuning

Fine-tuning involves updating the weights of some or all layers of the pre-trained model to adapt it to the new task.
The idea may appear similar to transfer learning, but in fine-tuning, we typically do not replace the last few layers of the pre-trained network.
Instead, the pre-trained model itself is adjusted to the new data.
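Continuing the PyTorch sketch from above, a fine-tuning setup keeps the network intact and makes all (or some) layers trainable, typically with a small learning rate so the pretrained weights shift only slightly; the learning rate below is an illustrative assumption:

```python
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# No layers are replaced; every layer remains trainable.
for param in model.parameters():
    param.requires_grad = True  # the default, shown here for emphasis

# A small learning rate keeps the pretrained weights from drifting too far.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```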
#3) Multi-task learning

As the name suggests, a model is trained to perform multiple tasks simultaneously.
The model shares knowledge across tasks, aiming to improve generalization and performance on each task.
It helps in scenarios where tasks are related or can benefit from shared representations.
In fact, the motive for multi-task learning is not just better generalization.
We can also save compute during training by having shared layers and task-specific segments.

- Imagine training two models independently on related tasks.
- Now compare that to a single network with shared layers followed by task-specific branches.
The second option will typically result in:
- Better generalization across all tasks.
- Less memory utilization to store model weights.
- Less resource utilization during training.
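As a minimal sketch of this setup (the layer sizes and the two heads below are illustrative assumptions), a shared trunk with task-specific branches looks like this in PyTorch:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared layers: trained on, and reused by, every task.
        self.shared = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
        )
        # Task-specific branches on top of the shared representation.
        self.head_a = nn.Linear(64, 10)  # e.g., a classification task
        self.head_b = nn.Linear(64, 1)   # e.g., a regression task

    def forward(self, x):
        features = self.shared(x)
        return self.head_a(features), self.head_b(features)
```

During training, the per-task losses are typically combined (summed or weighted) and backpropagated through the shared trunk together.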
This and this are two of the best survey papers I have ever read on multi-task learning.
#4) Federated learning

This is another pretty cool technique for training ML models.
Simply put, federated learning is a decentralized approach to machine learning. Here, the training data remains on the devices (e.g., smartphones) of users.
Instead of sending data to a central server, models are sent to devices, trained locally, and only model updates are gathered and sent back to the server.

It is particularly useful for enhancing privacy and security. What’s more, it also reduces the need for centralized data collection.
Our smartphone’s keyboard is a great example of this.
Federated learning allows our smartphone’s keyboard to learn and adapt to our typing habits. This happens without transmitting sensitive keystrokes or personal data to a central server.
The model, which predicts our next word or suggests auto-corrections, is sent to our device, and the device itself fine-tunes the model based on our input.
Over time, the model becomes personalized to our typing style while preserving our data privacy and security.
Do note that since these models are trained on small devices, they must be extremely lightweight, yet powerful enough to be useful.
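A minimal sketch of the central aggregation step, federated averaging, assuming every device returns the weights (state_dict) of an identically shaped model:

```python
import torch

def federated_average(client_state_dicts):
    """Average model weights collected from client devices."""
    averaged = {}
    for key in client_state_dicts[0]:
        averaged[key] = torch.stack(
            [sd[key].float() for sd in client_state_dicts]
        ).mean(dim=0)
    return averaged

# The server loads the averaged weights into the global model and
# redistributes it to the devices for the next round:
# global_model.load_state_dict(federated_average(client_updates))
```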
We covered Federated learning in detail here: Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning.
👉 Over to you: What are some other ML training methodologies that I have missed here?
Model optimization
Model compression to optimize models for production
Model accuracy alone (or an equivalent performance metric) rarely determines which model will be deployed.
Much of the engineering effort goes into making the model production-friendly.
Contrary to a common misconception, the model that gets shipped is almost never chosen on performance alone.

Instead, we also consider several operational and feasibility metrics, such as:
- Inference Latency: Time taken by the model to return a prediction.
- Model size: The memory occupied by the model.
- Ease of scalability, etc.
For instance, consider the image below. It compares the accuracy and size of a large neural network I developed to its pruned (or reduced/compressed) version:
Looking at these results, wouldn’t you prefer to deploy the model that is 72% smaller yet (almost) as accurate as the large one?
Of course, this depends on the task, but in most cases, it makes little sense to deploy the large model when a heavily pruned version performs equally well.
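For a taste of how pruning works, PyTorch ships a pruning utility; the snippet below is a minimal sketch (the layer shape and 70% sparsity are illustrative assumptions, and the 72% result above came from a more involved pipeline):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 256)

# Zero out the 70% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.7)

# Make the pruning permanent by removing the reparametrization.
prune.remove(layer, "weight")
```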
We discussed and implemented 6 model compression techniques in the article here, which ML teams regularly use to save thousands of dollars when running ML models in production.
Learn how to compress models before deployment with implementation →
Build reliable LLM apps
Trace and Monitor Any AI/LLM App
If you are building with LLMs, you absolutely need traceability.
Opik is an open-source, production-ready end-to-end LLM evaluation platform.
It allows developers to test their LLM applications in development, before a release (CI/CD), and in production.
Here’s an example with CrewAI below:

All you need to do is this:

- Put your LLM logic inside a function.
- Add the @track decorator.
Done!
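For instance, a minimal sketch (the function body below is a stand-in for your actual LLM call):

```python
from opik import track

@track
def answer_question(question: str) -> str:
    # Placeholder for your real LLM logic (OpenAI, CrewAI, etc.).
    return f"Echo: {question}"

answer_question("What is federated learning?")
```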
After this, Opik will track everything within your AI application, from LLM calls (with cost) to evaluation metrics and intermediate logs.
If you want to dive further, we also published a practical guide on Opik to help you integrate evaluation and observability into your LLM apps (with implementation).
It has open access to all readers.
Start here: A Practical Guide to Integrate Evaluation and Observability into LLM Apps.
THAT'S A WRAP
No-Fluff Industry ML Resources to Succeed in DS/ML Roles

At the end of the day, all businesses care about impact. That’s it!
- Can you reduce costs?
- Can you drive revenue?
- Can you scale ML models?
- Can you predict trends before they happen?
We have discussed several other topics (with implementations) in the past that align with these goals.
Here are some of them:
- Learn sophisticated graph architectures and how to train them on graph data in this crash course.
- So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
- Run large models on small devices using Quantization techniques.
- Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
- Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
- Learn how to scale and implement ML model training in this practical guide.
- Learn 5 techniques with implementation to reliably test ML models in production.
- Learn how to build and implement privacy-first ML systems using Federated Learning.
- Learn 6 techniques with implementation to compress ML models.
All these resources will help you cultivate key skills that businesses and companies care about the most.
SPONSOR US
Advertise to 600k+ data professionals
Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., around the world.