Introduction
“Because” is possibly one of the most powerful words in business decision-making.
- “Our customer satisfaction improved because we introduced personalized recommendations.”
- “The energy consumption dropped because of the new efficiency standards implemented.”
Backing an observation or insight with causality is what lets you confidently use the word “because” in business discussions.
Identifying these causal relationships is valuable precisely because doing so requires an inspection that goes beyond the typical correlation analysis, which almost anyone can run these days.
Thus, in this two-part article series, we shall dive into the details of causality and understand some of the most widely used techniques in business decision-making, so that you can add value to your job and projects with a broader skill set.
If you aspire to make valuable contributions in your data science role, this series will be super helpful.
A motivating example
It is well known that while correlation can show a relationship between two variables, it doesn’t imply that one causes the other.
For instance:
- A year-round correlation analysis will likely suggest that ice cream sales and AC sales are correlated. But this doesn’t mean eating ice cream causes people to buy air conditioners, or vice versa. Both are influenced by a third factor – temperature.
If we could accurately establish such causal links, a company could optimize its operations far more effectively.
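To make the confounding effect concrete, here is a minimal simulation (all numbers are hypothetical): both sales series are generated from temperature alone, with no causal link between them, yet they come out strongly correlated.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical daily temperatures over one year (the common cause / confounder).
temperature = rng.normal(loc=25, scale=8, size=365)

# Both sales series depend only on temperature plus independent noise;
# neither series depends on the other.
ice_cream_sales = 50 + 3.0 * temperature + rng.normal(0, 10, size=365)
ac_sales = 20 + 1.5 * temperature + rng.normal(0, 10, size=365)

# Despite having no direct causal link, the two series are strongly correlated.
corr = np.corrcoef(ice_cream_sales, ac_sales)[0, 1]
print(f"Correlation between ice cream and AC sales: {corr:.2f}")
```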
In this series, we shall cover four statistical tools that provide a scientific basis for using the word “because.”
Only by rigorously establishing causality can you justifiably use the word “because.”
My experience with Causality
In 2021, I was a data scientist at Mastercard and mentored an intern. Causal inference was not something my team and I had deep experience in, so we were still exploring.
Here's how we decided to approach it, but first, let me give you some context.
Mastercard handles millions of transactions per day. Every transaction goes through a fraud-detection model. After the authentication phase, whether the transaction is approved is decided by the model's binary output:
- Fraud $\rightarrow$ reject the transaction.
- Non-fraud $\rightarrow$ approve the transaction.
Now, once a label has been assigned to a transaction, it becomes a fact in this universe. Of course, the prediction may be wrong, but the decision has been made, and we cannot change it.
In the case of non-fraud transactions, the way we (Mastercard) ascertained whether the model made the correct prediction was based on whether the cardholder reached out to their bank.
So, let's say a transaction happened through your card right now, but you did not do it. However, the model classified it as non-fraud.
You would contact your bank, claiming you did not make that transaction. Of course, the bank will block your card immediately, but they may not trust you, because you might be committing friendly fraud.
Assuming it is not a case of friendly fraud, Mastercard waits for about 30-45 days to know (i.e., receive the label) whether a transaction classified as non-fraud was actually fraudulent. Banks usually take this long to get back to Mastercard with the true label.
In other words, the feedback exists far into the future.
As we shall see ahead, a big part of causal inference also revolves around counterfactual learning. As the name suggests:
- Counter $\rightarrow$ Refers to something that is opposite or different.
- Factual $\rightarrow$ Relates to actual events or facts that have occurred.
- Learning $\rightarrow$ Involves acquiring knowledge or understanding through study and experience.
Counterfactual learning involves analyzing what would have happened under different circumstances. It helps us understand what the impact of our actions would have been had we taken a different decision in the past.
For example, in the context of fraud detection:
- Counterfactual $\rightarrow$ What if the transaction had been classified as fraud instead of non-fraud?
- Learning $\rightarrow$ Gaining insights from these counterfactual scenarios to improve future decision-making and model accuracy.
We wanted to study this:
If a transaction was originally classified as fraud, what would have happened if it had been classified as non-fraud instead? Would the cardholder have contacted the bank, claiming it was fraud?
As you may have already understood, the trickiest thing about this is that we never get to observe the alternative reality. Once the transaction has been classified as fraud, that becomes a fact, and we cannot go back in time, reverse the decision, and observe the universe again.
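One way to picture this is as a missing-data problem. In the sketch below (all values and column names are hypothetical, purely for illustration), each transaction has two potential outcomes, but we only ever observe the one corresponding to the decision that was actually taken:

```python
import numpy as np
import pandas as pd

# Hypothetical transactions: for each one, only the outcome under the decision
# actually taken is observed; the outcome under the opposite decision is unknown.
transactions = pd.DataFrame({
    "transaction_id": [101, 102, 103],
    "classified_as_fraud": [True, False, False],
    # Would the cardholder have disputed the decision had it been rejected?
    "outcome_if_rejected": [1.0, np.nan, np.nan],
    # Did the cardholder report it as fraud after it was approved?
    "outcome_if_approved": [np.nan, 0.0, 1.0],
})

print(transactions)
# For every row, exactly one of the two potential outcomes is observed; the other
# is missing (NaN). Causal inference is largely about reasoning carefully
# about the entries we can never observe.
```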
This is why addressing questions of causality necessitates using rigorous statistical tools, which we will explore in the article ahead.
Potential Outcome Model
The historical utility of causality estimation comes from the medical industry, where it was primarily used for treatment evaluation.
More specifically, it was used to evaluate whether a specific treatment would cause sick patients' health to improve.
- Expose some patients to a treatment.
- Keep the other patients unexposed.
- Measure the difference in outcomes between the two groups while ensuring the overall conditions stayed the same (see the sketch after this list).
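As a minimal sketch of this comparison (using entirely simulated data and a hypothetical recovery score), the difference in mean outcomes between the exposed and unexposed groups estimates the average effect of the treatment, provided group assignment is random:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical randomized experiment: 200 patients, half exposed to the treatment.
n = 200
treated = rng.permutation(np.repeat([1, 0], n // 2))

# Simulated recovery scores: the treatment shifts the outcome up by 2 points on average.
outcome = 10 + 2.0 * treated + rng.normal(0, 3, size=n)

# With randomized assignment, the difference in group means estimates
# the average treatment effect (it should come out close to 2).
ate_estimate = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"Estimated average treatment effect: {ate_estimate:.2f}")
```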
So, before jumping to the core causality techniques, we need to understand a commonly used framework to analyze causality.
It's called the Potential Outcome Model.
Let’s define some notation before we proceed:
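As a hedged preview (the exact symbols used in the rest of the series may differ), the standard Neyman–Rubin potential-outcome notation typically looks like this:

- $T_i \in \{0, 1\}$ $\rightarrow$ whether unit $i$ (e.g., a patient or a transaction) received the treatment.
- $Y_i(1)$ $\rightarrow$ the outcome we would observe if unit $i$ were treated.
- $Y_i(0)$ $\rightarrow$ the outcome we would observe if unit $i$ were not treated.
- Individual treatment effect: $\tau_i = Y_i(1) - Y_i(0)$, which is never fully observable because only one of the two potential outcomes is realized for each unit.
- Average treatment effect: $\text{ATE} = \mathbb{E}[Y(1) - Y(0)]$, the quantity most causal methods try to estimate.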