Identify Drift Using Proxy-labeling

An intuitive way to detect drift.

Almost all real-world ML models gradually degrade in performance due to drift in the feature distribution.

It is a serious problem because we trained the model on one distribution, but it is being used to generate predictions on another distribution in production.

The following visual summarizes a technique I often use to detect drift:

There are four steps:

  • Step 1) Consider two versions of the dataset: the old version (the one on which the model was trained) and the current version (the one on which the model is generating predictions).
  • Step 2) Append a label=1 column to the old dataset and a label=0 column to the current dataset.
  • Step 3) Train a classification model on the combined dataset to predict the appended label column.
  • Step 4) Measure the feature importance of the trained classifier.

The choice of the classification model could be arbitrary, but you should be able to determine feature importance.

Thus, I personally prefer a random forest classifier because it has an inherent mechanism to determine feature importance.
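Here's a minimal sketch of the full procedure using scikit-learn. The file paths and feature columns are placeholders, and the specific model settings are illustrative assumptions rather than recommendations:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Step 1) Two versions of the dataset (hypothetical file paths):
# old_df     -> data the model was trained on
# current_df -> data the model scores in production
old_df = pd.read_csv("old_data.csv")
current_df = pd.read_csv("current_data.csv")

# Step 2) Append the proxy label: 1 for the old data, 0 for the current data.
old_df["label"] = 1
current_df["label"] = 0

# Step 3) Train a classifier to predict which dataset a row came from.
combined = pd.concat([old_df, current_df], ignore_index=True)
X, y = combined.drop(columns=["label"]), combined["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

# Step 4) Measure feature importance: features that strongly help the model
# separate old from current data are the ones that have likely drifted.
importances = pd.Series(clf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```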

That said, it is not necessary to use a random forest.

Techniques like shuffle (permutation) feature importance (which we discussed here), illustrated below, can be used with typical classification models as well:
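For instance, continuing from the sketch above, scikit-learn's permutation_importance shuffles one feature at a time on held-out data and measures the drop in score, so it works with any classifier; logistic regression here is just an arbitrary stand-in:

```python
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Any classifier works; it does not need built-in feature importances.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Shuffle each feature on the held-out split and record the average drop in score.
result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=42)
perm_importances = pd.Series(result.importances_mean, index=X_test.columns)
print(perm_importances.sort_values(ascending=False).head(10))
```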

Moving on…

If some features turn out to have high importance in this classifier, it means those features have drifted.

Why?

This is because if some features can reliably distinguish between the two versions of the dataset, then it is pretty likely that their distributions conditioned on label=1 and label=0 (the conditional distributions) differ.

  • If there are distributional differences, the model will capture them.
  • If there are no distributional differences, the model will struggle to distinguish between the classes.

This idea makes intuitive sense as well.

Of course, this is not the only technique to determine drift.

Autoencoders can also help. We discussed them here in a recent newsletter issue.

👉 Over to you: What are some other ways you use to determine drift?

CRASH COURSE (37 MINS)

Generate true probabilities with model calibration

Modern neural networks trained today are highly misleading.

They appear to be heavily overconfident in their predictions.

For instance, if a model predicts an event with a 70% probability, then ideally, out of 100 such predictions, approximately 70 should result in the event occurring.

However, many experiments have revealed that modern neural networks appear to be losing this ability, as depicted below:

  • The average confidence of LeNet (an old model) closely matches its accuracy.
  • The average confidence of the ResNet (a relatively modern model) is substantially higher than its accuracy.

Calibration solves this.

A model is calibrated if the predicted probabilities align with the actual outcomes.
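As a minimal, self-contained sketch (the synthetic data and random forest are used purely for illustration), scikit-learn's calibration_curve buckets the predicted probabilities and compares them with the observed fraction of positives, which is exactly the check described above; CalibratedClassifierCV is one common way to fix miscalibration:

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic placeholder data and model, just to make the sketch runnable.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Compare the mean predicted probability with the observed fraction of positives per bin.
# For a well-calibrated model, the two numbers should roughly match in every bin.
probs = clf.predict_proba(X_test)[:, 1]
frac_positives, mean_predicted = calibration_curve(y_test, probs, n_bins=10)
for p, f in zip(mean_predicted, frac_positives):
    print(f"predicted ~{p:.2f} -> observed {f:.2f}")

# One common fix: post-hoc calibration (isotonic or sigmoid/Platt scaling).
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(random_state=0), method="isotonic", cv=3
).fit(X_train, y_train)
```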

Handling this is important because the model will be used in decision-making, and an overly confident model can be fatal.

To exemplify, say a government hospital wants to conduct an expensive medical test on patients.

To ensure that the govt. funding is used optimally, a reliable probability estimate can help the doctors make this decision.

If the model isn't calibrated, it will produce overly confident predictions.

There has been a rising concern in the industry about ensuring that our machine learning models communicate their confidence effectively.

Thus, being able to detect miscalibration and fix it is a super skill one can possess.

Learn how to build well-calibrated models in this crash course →

CRASH COURSE (56 MINS)

Graph Neural Networks

  • Google Maps uses graph ML for ETA prediction.
  • Pinterest uses graph ML (PinSage) for recommendations.
  • Netflix uses graph ML (SemanticGNN) for recommendations.
  • Spotify uses graph ML (HGNNs) for audiobook recommendations.
  • Uber Eats uses graph ML (a GraphSAGE variant) to suggest dishes, restaurants, etc.

The list could go on since almost every major tech company I know employs graph ML in some capacity.

Becoming proficient in graph ML now seems to be far more critical than traditional deep learning to differentiate your profile and aim for these positions.

A significant proportion of our real-world data often exists (or can be represented) as graphs:

  • Entities (nodes) are connected by relationships (edges).
  • Connections carry significant meaning, which, if we know how to model it, can lead to much more robust models.

The field of graph neural networks (GNNs) intends to fill this gap by extending deep learning techniques to graph data.
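To make this concrete, here's a tiny, illustrative sketch (not taken from the crash course) of a single GCN-style message-passing layer in NumPy: each node updates its representation by averaging over its neighbours' features and then applying a learned transformation:

```python
import numpy as np

def gcn_layer(A, H, W):
    # A: (n, n) adjacency matrix, H: (n, d_in) node features, W: (d_in, d_out) weights.
    A_hat = A + np.eye(A.shape[0])                     # add self-loops
    deg_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    H_agg = deg_inv_sqrt @ A_hat @ deg_inv_sqrt @ H    # normalized neighbourhood aggregation
    return np.maximum(0, H_agg @ W)                    # linear transform + ReLU

# Tiny example: 4 nodes on a path graph (1-2-3-4) with 3-dimensional features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.randn(4, 3)   # node features
W = np.random.randn(3, 2)   # layer weights
print(gcn_layer(A, H, W).shape)  # -> (4, 2)
```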

Learn sophisticated graph architectures and how to train them on graph data in this crash course →

THAT'S A WRAP

No-Fluff Industry ML Resources to Succeed in DS/ML Roles

At the end of the day, all businesses care about impact. That's it!

  • Can you reduce costs?
  • Drive revenue?
  • Can you scale ML models?
  • Predict trends before they happen?

We have discussed several other topics (with implementations) in the past that align with these goals.

Here are some of them:

  • Learn sophisticated graph architectures and how to train them on graph data in this crash course.
  • So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
  • Run large models on small devices using Quantization techniques.
  • Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
  • Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
  • Learn how to scale and implement ML model training in this practical guide.
  • Learn 5 techniques with implementation to reliably test ML models in production.
  • Learn how to build and implement privacy-first ML systems using Federated Learning.
  • Learn 6 techniques with implementation to compress ML models.

All these resources will help you cultivate key skills that businesses and companies care about the most.


