Identify Drift Using Proxy-labeling
An intuitive way to detect drift.

TODAY'S ISSUE
Almost all real-world ML models gradually degrade in performance due to a drift in feature distribution:
It is a serious problem because we trained the model on one distribution, but it is being used to generate predictions on another distribution in production.
The following visual summarizes a technique I often use to detect drift:
There are four steps:
1) Gather the old (training) dataset and the current (production) dataset.
2) Add a label=1 column to the old dataset and a label=0 column to the current dataset.
3) Concatenate the two datasets and train a classification model to distinguish between them.
4) Inspect the feature importance of the trained model.
The choice of the classification model can be arbitrary, but you should be able to determine feature importance from it.
Thus, I personally prefer a random forest classifier because it has an inherent mechanism to determine feature importance:
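As a minimal sketch of the full workflow (the old_df and current_df DataFrames below are synthetic placeholders for your training-time and production data):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical placeholders: features seen at training time vs. in production.
old_df = pd.DataFrame(np.random.normal(0, 1, size=(1000, 3)), columns=["f1", "f2", "f3"])
current_df = pd.DataFrame(np.random.normal(0, 1, size=(1000, 3)), columns=["f1", "f2", "f3"])
current_df["f2"] += 1.5  # simulate drift in one feature

# Steps 1-2: add proxy labels (1 = old data, 0 = current data) and combine.
old_df["label"] = 1
current_df["label"] = 0
combined = pd.concat([old_df, current_df], ignore_index=True)

X, y = combined.drop(columns="label"), combined["label"]

# Step 3: train a classifier to distinguish the two datasets.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Step 4: inspect feature importance; features with high importance have likely drifted.
for name, imp in sorted(zip(X.columns, clf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```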
That said, it is not necessary to use a random forest.
Techniques like shuffle (permutation) feature importance (which we discussed here), illustrated below, can be used with typical classification models as well:
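As a sketch, scikit-learn's permutation_importance implements this shuffle-based approach for any fitted estimator. It reuses the X and y built in the previous snippet and swaps in a logistic regression, which has no built-in importance measure:

```python
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# X, y: the combined features and proxy labels built in the previous snippet.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Shuffle one feature at a time and measure how much the score drops:
# a large drop means the model relied on that feature to separate old vs. current data.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

for name, imp in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```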
Moving on…
If some features show high importance, it means those features have drifted.
Why?
This is because if some features can reliably distinguish between the two versions of the dataset, then it is pretty likely that their distributions under label=1 and label=0 (the conditional distributions) differ.
This idea makes intuitive sense as well.
Of course, this is not the only technique to determine drift.
Autoencoders can also help. We discussed them here in a recent newsletter issue.
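As a rough sketch of that idea (using scikit-learn's MLPRegressor as a minimal stand-in for a proper autoencoder): train the model to reconstruct the old data, then monitor its reconstruction error on production data. A sustained rise in error suggests the incoming distribution has shifted.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
old_X = rng.normal(0, 1, size=(2000, 5))        # training-time features (placeholder)
current_X = rng.normal(0.8, 1, size=(500, 5))   # production features with a simulated shift

# A tiny "autoencoder": an MLP trained to reconstruct its own input through a bottleneck.
autoencoder = MLPRegressor(hidden_layer_sizes=(2,), max_iter=2000, random_state=0)
autoencoder.fit(old_X, old_X)

def reconstruction_error(model, X):
    return np.mean((model.predict(X) - X) ** 2)

baseline = reconstruction_error(autoencoder, old_X)
production = reconstruction_error(autoencoder, current_X)

# A production error far above the baseline hints at drift.
print(f"baseline error: {baseline:.3f}, production error: {production:.3f}")
```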
Over to you: What are some other ways you use to determine drift?
Modern neural networks being trained today are highly misleading.
They appear to be heavily overconfident in their predictions.
For instance, if a model predicts an event with a 70% probability, then ideally, out of 100 such predictions, approximately 70 should result in the event occurring.
However, many experiments have revealed that modern neural networks appear to be losing this ability, as depicted below:
Calibration solves this.
A model is calibrated if the predicted probabilities align with the actual outcomes.
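As a quick sketch, scikit-learn's calibration_curve compares predicted probabilities against observed outcome frequencies, and CalibratedClassifierCV is one common way to fix miscalibration. The dataset below is a synthetic placeholder:

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# For each probability bin, compare the mean predicted probability
# with the fraction of positives actually observed in that bin.
prob_true, prob_pred = calibration_curve(y_test, clf.predict_proba(X_test)[:, 1], n_bins=10)
print(list(zip(prob_pred.round(2), prob_true.round(2))))  # ideally, the pairs match

# One common fix: wrap the model with a post-hoc calibrator (here, isotonic regression).
calibrated = CalibratedClassifierCV(RandomForestClassifier(random_state=0), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)
```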
Handling this is important because the model will be used in decision-making, and an overly confident model can be fatal.
To exemplify, say a government hospital wants to conduct an expensive medical test on patients.
To ensure that the govt. funding is used optimally, a reliable probability estimate can help the doctors make this decision.
If the model isn't calibrated, it will produce overly confident predictions.
There has been a rising concern in the industry about ensuring that our machine learning models communicate their confidence effectively.
Thus, being able to detect miscalibration and fix it is a super skill one can possess.
Learn how to build well-calibrated models in this crash course →
The list could go on since almost every major tech company I know employs graph ML in some capacity.
Becoming proficient in graph ML now seems far more critical than traditional deep learning for differentiating your profile and aiming for these positions.
A significant proportion of our real-world data often exists (or can be represented) as graphs:
Yet, traditional deep learning architectures are not designed to operate on such graph-structured data. The field of graph neural networks (GNNs) intends to fill this gap by extending deep learning techniques to graph data.
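To make the idea concrete, here is a minimal sketch of the core operation behind many GNNs (a single GCN-style message-passing step) on a toy graph, written in plain NumPy; this is only an illustrative toy, not a full architecture:

```python
import numpy as np

# Toy undirected graph with 4 nodes and 2-dimensional node features.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency matrix
X = np.random.rand(4, 2)                     # node feature matrix
W = np.random.rand(2, 2)                     # weight matrix (learned in practice, random here)

# GCN-style propagation: H = ReLU(D^-1/2 (A + I) D^-1/2 X W)
A_hat = A + np.eye(4)                        # add self-loops
D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
H = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0)

# Each row of H is a node embedding that mixes the node's own features with those
# of its neighbors -- the key operation that GNNs stack and train end-to-end.
print(H)
```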
Learn sophisticated graph architectures and how to train them on graph data in this crash course →