Correlation does not imply causation. It turns out, however, that with some simple ingenious tricks one can, potentially, unveil causal relationships within standard observational data, without having to resort to expensive randomised control trials.
This post is targeted towards anyone making data driven decisions. The main takeaway message is that causality may be possible by understanding that the story behind the data is as important as the data itself.
By introducing Simpson’s and Berkson’s Paradoxes, situations where the outcome of a population is in conflict with that of its cohorts, I shine a light on the importance of using causal reasoning to identify these paradoxes in data and avoid misinterpretation. Specifically I introduce causal graphs as a method to visualise the story behind the data and point out that by adding this to your arsenal you are likely to conduct better analyses and experiments.
My ultimate objective is to whet your appetite to explore more on causality, as I believe that by asking data “Why?” you will be able to go beyond correlation calculations and extract more insights, as well as avoid common misjudgement pitfalls.
Note that throughout this gentle intro I do not use equations but demonstrate using accessible intuitive visuals. That said, I provide resources for you to take the next step in adding Causal Inference to your statistical toolbox so that you may get more value out of your data.
The Era of Data Driven Decision Making
In [Deity] We Trust, All Others Bring Data! — William E. Deming
In this digital age it is common to put a lot of faith in data. But this raises an overlooked question: Should we trust data on its own?
Judea Pearl, who is considered the godfather of Causality, articulated it best:
“The collection of information is as important as the information itself” — Judea Pearl
In other words the story behind the data is as important as the data itself.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.37.02 PM.png)
This manifests in a growing awareness of the importance of identifying bias in datasets. By the end of this post I hope that you will appreciate that causality holds the fundamental tools to best express, quantify and attempt to correct for these biases.
In causality introductions it is customary to demonstrate why “correlation does not imply causation” by highlighting limitations of association analysis due to spurious correlations (e.g., shark attacks 🦈 and ice-cream sales 🍦). In an attempt to reduce the length of this post I defer this aspect to an older one of mine. Here I focus on two mind boggling paradoxes 🤯 and their resolution via causal graphs to make a similar point.
Paradoxes in Analysis
To understand the importance of the story behind the data we will examine two counter-intuitive (but nonetheless true) paradoxes which are classical situations of data misinterpretation.
In the first we imagine a clinical trial in which patients are given a treatment that results in a health score. Our objective is to assess the average impact of increased treatment on the health outcome. For pedagogical purposes in these examples we assume that samples are representative (i.e., the sample size is not an issue) and that variances in measurements are minimal.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.37.16 PM-1024x872.png)
In the figure above we learn that on average increasing the treatment appears to be beneficial since it results in a better outcome.
Now we will colour code by age and gender groupings and examine how the treatment increases impact each cohort.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.37.30 PM-1024x871.png)
Track any cohort (e.g., “Girls” representing young females) and you immediately realise that an increase in treatment appears adverse.
What is the conclusion of the study? On the one hand increasing the treatment appears to be better for the population at large, but when examining gender-age cohorts it seems disadvantageous. This is Simpson’s Paradox, which may be stated:
“Traits can exist in subgroups however reverse for the entire”
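To make the statement concrete, here is a minimal Python sketch on simulated data. The three cohorts (`A`, `B`, `C`), their typical doses, their baselines and the within-cohort effect of -2 per unit of treatment are all made-up assumptions, not the data behind the figures. The helper `slope` is a plain least-squares fit written out by hand to avoid dependencies.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n_per_cohort=500, seed=0):
    """Hypothetical trial data as (cohort, treatment, outcome) rows."""
    rng = random.Random(seed)
    rows = []
    # Each cohort gets a different typical dose AND a different baseline health score.
    for cohort, typical_dose, baseline in [("A", 2.0, 40.0), ("B", 5.0, 60.0), ("C", 8.0, 80.0)]:
        for _ in range(n_per_cohort):
            dose = typical_dose + rng.uniform(-1.0, 1.0)
            # Inside a cohort, each extra unit of treatment lowers the score by 2.
            outcome = baseline - 2.0 * (dose - typical_dose) + rng.gauss(0.0, 1.0)
            rows.append((cohort, dose, outcome))
    return rows


rows = simulate()
pooled = slope([d for _, d, _ in rows], [y for _, _, y in rows])
per_cohort = {
    c: slope([d for cc, d, _ in rows if cc == c], [y for cc, _, y in rows if cc == c])
    for c in ("A", "B", "C")
}

print(f"whole population: {pooled:+.2f}")  # positive: more treatment looks better
for c, s in per_cohort.items():
    print(f"cohort {c}: {s:+.2f}")  # negative in every cohort
```

The pooled slope comes out positive while every cohort slope is negative: the cohorts that happened to receive more treatment also started from a higher baseline, and that between-cohort difference swamps the within-cohort trend.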
Below we will resolve this paradox using causality tools, but beforehand let’s explore another interesting one, which also examines made up data.
Imagine that we quantify for the general population their attractiveness and how talented they are, as in this figure:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.39.45 PM.png)
We discover no obvious correlation.
Now we will focus on an unusual subset, famous people:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.37.52 PM.png)
Here we clearly see an anti-correlation that does not exist in the general population.
Should we conclude that Talent and Attractiveness are independent variables, as per the first plot of the general population, or that they are correlated, as per that of celebrities?
This is Berkson’s Paradox, where one population has a trend that another lacks.
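A similar simulation reproduces Berkson’s paradox. This is a sketch under assumed conditions: Talent and Attractiveness are drawn independently from a standard normal distribution, and the hypothetical rule for becoming a celebrity is that their sum exceeds 2. The `correlation` helper is a hand-written Pearson correlation.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)


rng = random.Random(42)
N = 20_000
# Talent and Attractiveness are generated independently of each other.
talent = [rng.gauss(0.0, 1.0) for _ in range(N)]
attractiveness = [rng.gauss(0.0, 1.0) for _ in range(N)]
# Hypothetical selection rule: a celebrity if talent + attractiveness is high enough.
celebrity = [t + a > 2.0 for t, a in zip(talent, attractiveness)]

r_all = correlation(talent, attractiveness)
r_celeb = correlation(
    [t for t, c in zip(talent, celebrity) if c],
    [a for a, c in zip(attractiveness, celebrity) if c],
)
print(f"general population: {r_all:+.2f}")  # close to zero
print(f"celebrities only:   {r_celeb:+.2f}")  # clearly negative
```

Across everyone the correlation is close to zero, yet among the selected celebrities it is clearly negative, purely because of how the subset was chosen.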
While an algorithm would identify these correlations, resolving these paradoxes requires a full understanding of the context, which normally is not fed to a computer. In other words, without knowing the story behind the data, results may be misinterpreted and wrong conclusions may be inferred.
Mastering the identification and resolution of these paradoxes is an important first step to elevating one’s analyses from correlations to causal inference.
While these simple examples may be explained away logically, for the purpose of learning causal tools I will introduce Causal Graphs in the next section.
Causal Graphs: Visualising The Story Behind The Data
“[From the Simpson’s and Berkson’s Paradoxes we learn that] certain decisions cannot be made on the basis of data alone, but instead depend on the story behind the data. … Graph Theory enables these stories to be conveyed” — Judea Pearl
Causal graph models are probabilistic graphical models used to visualise the story behind the data. They are perhaps among the most powerful tools for analysts that are not taught in most statistics curricula. They are both elegant and highly informative. Hopefully by the end of this post you will appreciate it when Judea Pearl says that this is the missing vocabulary to communicate causality.
To understand causal graph models (or causal graphs for short) we start with the following illustration of an example undirected graph with four nodes/vertices and three edges.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.38.09 PM-1024x623.png)
Each node is a variable and the edges communicate “who is related to whom?” (i.e., correlations, joint probabilities).
A directed graph is one in which we add arrows, as in this figure.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.38.17 PM-1024x943.png)
A directed edge communicates “who listens to whom?”, which is the essence of causation.
In this specific example you can notice a cyclical relationship between the C and D nodes.
A useful subset of directed graphs are the directed acyclic graphs (DAGs), which have no cycles, as in the next figure.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.38.25 PM-1024x949.png)
Here we see that when starting from any node (e.g., A) there is no path that gets back to it.
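One way to check the “no path gets back to it” property in code is Kahn’s topological sort: repeatedly remove nodes with no incoming arrows, and if every node eventually gets removed there is no cycle. The sketch below stores a directed graph as a dict of children; the two edge lists are hypothetical and only loosely mirror the figures.

```python
from collections import deque


def is_acyclic(graph):
    """Return True if the directed graph (dict: node -> list of children) has no cycles."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    indegree = {n: 0 for n in nodes}
    for children in graph.values():
        for c in children:
            indegree[c] += 1
    # Start from nodes that nobody points into.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for c in graph.get(n, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Nodes on a cycle never reach indegree 0, so they are never removed.
    return removed == len(nodes)


# Hypothetical edge lists (the exact edges in the figures are an assumption).
directed_with_cycle = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"]}
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

print(is_acyclic(directed_with_cycle))  # False
print(is_acyclic(dag))  # True
```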
DAGs are the go-to choice in causality for simplicity, as the fact that parameters do not have feedback greatly simplifies the flow of information. (For mechanisms that have feedback, e.g., temporal systems, one may consider rolling out nodes as a function of time, but that is beyond the scope of this intro.)
Causal graphs are powerful at conveying the cause/effect relationships between the parameters and hence how the data was generated (the story behind the data).
From a practical standpoint, graphs enable us to understand which parameters are confounders that need to be controlled for and, as important, which not to control for, because doing so causes spurious correlations. This will be demonstrated below.
The practice of attempting to build a causal graph enables:
- Designing better experiments.
- Drawing causal conclusions (going beyond correlations by representing interventions and counterfactuals and by encoding conditional independence relationships; all beyond the scope of this post).
To further motivate the usage of causal graph models we will use them to resolve the Simpson’s and Berkson’s paradoxes introduced above.
💊 Causal Graph Resolution of Simpson’s Paradox
For simplicity we will examine Simpson’s paradox focusing on two cohorts, female and male adults.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.38.37 PM-1024x864.png)
Examining this data we can make three statements about three variables of interest:
- Gender is an independent variable (it does not “listen to” the other two)
- Treatment depends on Gender (as we can see, in this setting the level given depends on Gender: women have been given, for some reason, a higher dosage)
- Outcome depends on both Gender and Treatment
According to these we can draw the causal graph as follows:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.38.59 PM-1024x876.png)
Notice how each arrow contributes to communicating the statements above. As important, the lack of an arrow pointing into Gender conveys that it is an independent variable.
We also notice that, by having arrows pointing from Gender to both Treatment and Outcome, Gender is considered a common cause of the two.
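In code, a causal graph can be written down as a mapping from each node to its parents, i.e. “who listens to whom”. This Python sketch encodes the three statements above; the helper names `independent_variables` and `common_causes` are illustrative, not from any library, and `common_causes` only looks for direct arrows, which is enough for this three-node graph.

```python
# The three statements above, written as "who listens to whom" (each node's parents).
parents = {
    "Gender": [],  # listens to nobody: an independent variable
    "Treatment": ["Gender"],  # dosage depends on Gender
    "Outcome": ["Gender", "Treatment"],  # health score depends on both
}


def independent_variables(parents):
    """Nodes with no incoming arrows."""
    return sorted(node for node, ps in parents.items() if not ps)


def common_causes(parents, treatment, outcome):
    """Nodes with a direct arrow into both `treatment` and `outcome`."""
    return sorted(set(parents[treatment]) & set(parents[outcome]))


print(independent_variables(parents))  # ['Gender']
print(common_causes(parents, "Treatment", "Outcome"))  # ['Gender']
```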
The essence of Simpson’s paradox is that although the Outcome is affected by changes in Treatment, as expected, there is also a backdoor path through which information flows via Gender.
As you may have guessed by this stage, the solution to this paradox is that the common cause Gender is a confounding variable that needs to be controlled for.
Controlling for Gender, in terms of the causal graph, means eliminating the relationship between Gender and Treatment.
This may be achieved in two ways:
- Pre data collection: Setting up a Randomised Control Trial (RCT) in which participants will be given a dosage regardless of their Gender.
- Post data collection: E.g., in this made up scenario the data has already been collected and hence we need to deal with what is called Observational Data.
In both pre- and post- data collection the removal of the Treatment dependency on Gender (i.e., controlling for Gender) may be achieved by modifying the graph such that the arrow between them is removed, as in the following:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.39.21 PM-1024x908.png)
Applying this “graphical surgery” means that the last two statements need to be modified (for convenience I will write all three):
- Gender is an independent variable
- Treatment is an independent variable
- Outcome depends on Gender and Treatment (but with no backdoor path).
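The graph surgery itself is a small operation on the same parent-list representation: delete every arrow pointing into Treatment. In this minimal sketch `do` is a hypothetical helper named after Pearl’s do-operator, not a library function. Since a backdoor path from Treatment must begin with an arrow into Treatment, once Treatment has no parents no such path can remain, which matches the three modified statements.

```python
parents = {
    "Gender": [],
    "Treatment": ["Gender"],
    "Outcome": ["Gender", "Treatment"],
}


def do(parents, node):
    """Graph surgery: copy the graph and delete every arrow pointing INTO `node`."""
    mutilated = {n: list(ps) for n, ps in parents.items()}
    mutilated[node] = []
    return mutilated


def independent_variables(parents):
    """Nodes with no incoming arrows."""
    return sorted(n for n, ps in parents.items() if not ps)


after = do(parents, "Treatment")
print(independent_variables(parents))  # ['Gender']
print(independent_variables(after))  # ['Gender', 'Treatment']
print(after["Outcome"])  # ['Gender', 'Treatment']: Outcome still listens to both
```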
This enables obtaining the causal relationship of interest: we can assess the direct impact of modifying Treatment on the Outcome.
The process of controlling for a confounder, i.e., manipulating the data generation process, is formally referred to as applying an intervention. That is to say, we are no longer passive observers of the data, but are taking an active role in modifying it to assess the causal impact.
How is this manifested in practice?
In the case of RCTs the researcher needs to control for important confounding variables. Here we limit the discussion to Gender (but in real world settings you can imagine other variables such as Age, Social Status and anything else that might be relevant to one’s health).
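To see why randomising the dose works, the sketch below simulates the two-cohort setting twice under assumed numbers: a true effect of -1 per unit of treatment, and women starting from a higher baseline health score. In the simulated observational study women receive higher doses; in the simulated RCT the dose is drawn without looking at Gender. `run_study` and `TRUE_EFFECT` are illustrative names.

```python
import random

TRUE_EFFECT = -1.0  # hypothetical: each extra unit of treatment lowers the health score by 1


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sum((x - mean_x) ** 2 for x in xs)


def run_study(randomised, n=5000, seed=1):
    """Return the naive treatment-to-outcome slope from one simulated study."""
    rng = random.Random(seed)
    doses, scores = [], []
    for _ in range(n):
        gender = rng.choice(["F", "M"])
        if randomised:
            dose = rng.uniform(0.0, 10.0)  # RCT: dose is assigned ignoring Gender
        else:
            # Observational: women (for some reason) receive higher doses.
            dose = rng.uniform(5.0, 10.0) if gender == "F" else rng.uniform(0.0, 5.0)
        baseline = 70.0 if gender == "F" else 50.0  # Gender also affects the outcome directly
        doses.append(dose)
        scores.append(baseline + TRUE_EFFECT * dose + rng.gauss(0.0, 1.0))
    return slope(doses, scores)


print(f"observational: {run_study(randomised=False):+.2f}")  # positive, wrong sign
print(f"RCT:           {run_study(randomised=True):+.2f}")  # close to TRUE_EFFECT
```

Under these assumptions the naive observational slope has the wrong sign, while the randomised one lands close to the true effect, because randomisation cuts the arrow from Gender to Treatment.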
RCTs are considered the gold standard for causal analysis in many experimental settings thanks to their practice of controlling for confounding variables. That said, they have many setbacks:
- It may be expensive to recruit individuals and may be complicated logistically
- The intervention under investigation may not be physically possible or ethical to conduct (e.g., one cannot ask randomly selected people to smoke or not for ten years)
- The artificial setting of a laboratory is not a true natural habitat of the population.
Observational data, on the other hand, is much more readily available in industry and academia, and hence much cheaper, and may be more representative of the actual habits of individuals. But, as illustrated in the Simpson’s diagram, it may have confounding variables that need to be controlled for.
This is where ingenious solutions developed in the causal community over the past few decades are making headway. Detailing them is beyond the scope of this post, but I briefly mention how to learn more at the end.
To resolve this Simpson’s paradox with the given observational data one:
- Calculates for each cohort the impact of the change in treatment on the outcome
- Calculates a weighted average of these cohort impacts, weighting each cohort by its share of the population.
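The two steps can be sketched in Python on simulated observational data. Assumptions: a hypothetical data-generating process in which women receive higher doses and start from a higher baseline, a true effect of -1 per unit of treatment, a linear effect within each cohort, and cohort weights equal to each cohort’s share of the sample. This is a simple instance of adjusting for a confounder, not the full method the future post will cover.

```python
import random
from collections import defaultdict

TRUE_EFFECT = -1.0  # hypothetical effect of one extra unit of treatment


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sum((x - mean_x) ** 2 for x in xs)


# Hypothetical observational data: women received higher doses and have a higher baseline.
rng = random.Random(2)
data = []  # rows of (gender, dose, score)
for _ in range(5000):
    gender = rng.choice(["F", "M"])
    dose = rng.uniform(5.0, 10.0) if gender == "F" else rng.uniform(0.0, 5.0)
    baseline = 70.0 if gender == "F" else 50.0
    data.append((gender, dose, baseline + TRUE_EFFECT * dose + rng.gauss(0.0, 1.0)))

by_cohort = defaultdict(list)
for gender, dose, score in data:
    by_cohort[gender].append((dose, score))

# Step 1: impact of treatment on outcome within each cohort.
effects = {g: slope([d for d, _ in rows], [s for _, s in rows]) for g, rows in by_cohort.items()}
# Step 2: weight each cohort's impact by its share of the population.
weights = {g: len(rows) / len(data) for g, rows in by_cohort.items()}
adjusted = sum(weights[g] * effects[g] for g in effects)

naive = slope([d for _, d, _ in data], [s for _, _, s in data])
print(f"naive pooled slope: {naive:+.2f}")  # misleading positive value
print(f"adjusted estimate:  {adjusted:+.2f}")  # close to TRUE_EFFECT
```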
Here we will focus on intuition, but in a future post we will describe the maths behind this solution.
I am sure that many analysts, just like myself, have noticed Simpson’s at some stage in their data and hopefully have corrected for it. Now you know the name of this effect and hopefully start to appreciate how causal tools are useful.
That said … being confused at this stage is OK 😕
I will be the first to admit that I struggled to understand this concept and it took me three weekends of deep diving into examples to internalise it. This was the gateway drug to causality for me. Part of my process of understanding statistics is playing with data. For this purpose I created an interactive web application hosted in Streamlit which I call Simpson’s Calculator 🧮. I will write a separate post for this in the future.
Even if you are confused, the main takeaways of Simpson’s paradox are that:
- It is a situation where trends can exist in subgroups but reverse for the whole.
- It may be resolved by identifying confounding variables between the treatment and the outcome variables and controlling for them.
This raises the question: should we just control for all variables except for the treatment and outcome? Let’s keep this in mind when resolving Berkson’s paradox.
🦚 Causal Graph Resolution of Berkson’s Paradox
As in the previous section we are going to make explicit statements about how we believe the data was generated and then draw these in a causal graph.
Let’s examine the case of the general population; for convenience I am copying the image from above:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.39.45 PM.png)
Here we understand that:
- Talent is an independent variable
- Attractiveness is an independent variable
A causal graph for this is quite simple: two nodes without an edge.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.39.57 PM-1024x417.png)
Let’s examine the plot of the celebrity subset.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.37.52 PM.png)
The cheeky insight from this mock data is that the more attractive one is, the less talented they need to be to become a celebrity. Hence we can deduce that:
- Talent is an independent variable
- Attractiveness is an independent variable
- The Celebrity variable depends on both the Talent and Attractiveness variables. (Imagine this variable is boolean, as in: true for celebrities or false for not.)
Hence we can draw the causal graph as:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.40.26 PM-1024x994.png)
By having arrows pointing into it, Celebrity is a collider node between Talent and Attractiveness.
Berkson’s paradox is the fact that when controlling for Celebrity (i.e., looking only at celebrities) we see an interesting trend (anti-correlation between Attractiveness and Talent) not seen in the general population.
This can be visualised in the causal graph: by controlling for the Celebrity parameter we are creating a spurious correlation between the otherwise independent variables Talent and Attractiveness. We can draw this as follows:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.40.40 PM-1024x945.png)
The solution of this Berkson’s paradox should be apparent here: Talent and Attractiveness are independent variables in general, but controlling for the collider Celebrity node causes a spurious correlation in the data.
Let’s compare the resolution of both paradoxes:
- Simpson’s Paradox is resolved by controlling for the common cause (Gender)
- Berkson’s Paradox is resolved by not controlling for the collider (Celebrity)
The next figure combines both insights in the form of their causal graphs:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.40.50 PM-1024x483.png)
The main takeaway from the resolution of these paradoxes is that controlling for parameters requires a justification. Common causes should be controlled for but colliders should not.
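The rule can be phrased as a tiny Python check on a parent-list representation of both graphs. This is a sketch covering only the two cases discussed here, a single middle node on a single path; `role_on_path` and `should_control` are illustrative names, and real analyses examine every path between treatment and outcome (d-separation, listed among the next steps below).

```python
simpson = {"Gender": [], "Treatment": ["Gender"], "Outcome": ["Gender", "Treatment"]}
berkson = {"Talent": [], "Attractiveness": [], "Celebrity": ["Talent", "Attractiveness"]}


def role_on_path(parents, left, middle, right):
    """Classify `middle` on the path left - middle - right."""
    if left in parents[middle] and right in parents[middle]:
        return "collider"  # left -> middle <- right
    if middle in parents[left] and middle in parents[right]:
        return "common cause"  # left <- middle -> right
    return "other"  # e.g. a mediator: not covered in this post


def should_control(parents, left, middle, right):
    """True for a common cause, False for a collider, None otherwise."""
    role = role_on_path(parents, left, middle, right)
    if role == "common cause":
        return True
    if role == "collider":
        return False
    return None  # this sketch makes no recommendation for other roles


print(should_control(simpson, "Treatment", "Gender", "Outcome"))  # True
print(should_control(berkson, "Talent", "Celebrity", "Attractiveness"))  # False
```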
Although this is common knowledge for those who study causality (e.g., Economics majors), it is unfortunate that most analysts and machine learning practitioners are not aware of it (including myself in 2020, after over 15 years of research and predictive modelling experience).
“Oddly, statisticians both over- and underrate the importance of confounders” — Judea Pearl
Summary
The main takeaway from this post is that the story behind the data is as important as the data itself.
Appreciating this will help you avoid result misinterpretation, as in spurious correlations and, as demonstrated here, in Simpson’s and Berkson’s paradoxes.
Causal Graphs are an essential tool to visualise the story behind the data. By using them to resolve the paradoxes we learnt that controlling for variables requires justification (common causes ✅, colliders ⛔️).
For those interested in taking the next step in their causal journey I highly suggest mastering Simpson’s paradox. One great way is by playing with data. Feel free to do so with my interactive “Simpson-calculator” 🧮.
Loved this post? 💌 Join me on LinkedIn or ☕ Buy me a coffee!
Credits
Unless otherwise noted, all images were created by the author.
Many thanks to Jim Parr, Will Reynolds, Hedva Kazin and Betty Kazin for their useful comments.
Wondering what the next step should be in your causal journey? Check out my new article on mastering Simpson’s Paradox. You will never look at data the same way. 🔎
Useful Resources
Here I provide resources that I find useful as well as a shopping list of topics for beginners to learn.
📚 Books
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.41.04 PM-685x1024.png)
- The Book of Why (popular science reading, NY Times level)
- Causal Inference in Statistics: A Primer (excellent short technical book; website)
- Causal Inference and Discovery in Python by Aleksander Molak (clearly explained with Python applications 🐍; Packt, github)
- What If? (a cohesive presentation of concepts of, and methods for, causal inference; website, github)
- Causal Inference: The Mixtape (Social Science focused, using Python, R and Stata; website, resources, mooc)
- Counterfactuals and Causal Inference: Methods and Principles (Social Science focused)
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.41.18 PM-1024x317.png)
This list is far from comprehensive, but I am happy to add to it if anyone has suggestions (please mention why the book stands out from the pack).
🔏 Courses
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.41.30 PM-1024x741.png)
There are probably several courses online. I love the 🆓 one by Brady Neal: bradyneal.com/causal-inference-course.
- Clearly explained
- Covers many aspects
- Thorough
- Gives memorable examples
- F.R.E.E
One paid course 💰 that is targeted at practitioners is Altdeep.
💾 Software
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.41.40 PM-1024x769.png)
This list is far from comprehensive because the space is rapidly growing:
The Causal Wizard app also has an article about Causal Diagram tools.
🐾 Suggested Next Steps In The Causal Journey
Here I highlight a list of topics which I would have found useful when I started learning in the field. If I am missing anything I would be more than happy to get feedback and add it. I bold face the ones which were briefly discussed here.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/Screenshot-2025-02-14-at-1.41.53 PM-845x1024.png)
- Pearl’s Causal Hierarchy of seeing, doing and imagining (figure above)
- Observational data vs. Randomised Control Trials
- d-separation, common causes, colliders, mediators, instrumental variables
- Causal Graphs
- Structural Causal Models
- Assumptions: Ignorability, SUTVA, Consistency, Positivity
- “Do” Algebra: assessing impact on cohorts by intervention
- Counterfactuals: assessing impact on individuals by comparing real outcomes to potential ones
- The fundamental problem of causality
- Estimand, Estimator, Estimate, Identifiability: relating causal definitions to observable statistics (e.g., conditional probabilities)
- Causal Discovery: finding causal graphs with data (e.g., Markov Equivalence)
- Causal Machine Learning (e.g., Double Machine Learning)
For completeness it is useful to know that there are different streams of causality. Although there is a lot of overlap you may find that methods differ in naming convention due to development in different fields of research: Computer Science, Social Sciences, Health, Economics.
Here I used definitions mostly from the Pearlian perspective (as developed in the field of computer science).
The Story Behind This Post
This narrative is a result of two study groups that I conducted in a previous role to get myself and colleagues to learn causality, which I felt was missing in my skill set. If there is any interest I am happy to write a post about the study group experience.
This intro was created as the one I felt that I needed when I started my journey in causality.
In the first iteration of this post I wrote about and presented the limitations of spurious correlations and Simpson’s paradox. The main reason for this revision to focus on two paradoxes is that, whereas most causality intros focus on the limitations of correlations, I feel that understanding the concept of justifying which variables to control for is important for all analysts and machine learning practitioners to be aware of.
On September 5th 2024 I presented this content in a contributed talk at the Royal Statistical Society Annual Conference in Brighton, England (abstract link).
Unfortunately there is no recording, but there are recordings of previous talks of mine:
The slides are available at bit.ly/start-ask-why. Presenting this material for the first time at PyData Global 2021