Pearl Do-Calculus
Why it matters
You do not always need an experiment to find a cause. Sometimes the effect of an action is already sitting inside data you collected without ever intervening — and sometimes it provably is not, no matter how the numbers are sliced. Do-calculus is the rulebook that tells you which case you’re in, and exactly which variables to adjust for.
For example: a new drug appears to lower mortality in hospital records, but sicker patients were also the ones who got it. The fix everyone reaches for is to “control for” the confounders — yet which ones? Adjust for the right set and the bias clears; adjust for the wrong variable and you don’t shrink the bias, you manufacture it. Do-calculus turns “control for the confounders” from a hand-wave into a decision you can check: it reads the causal graph and returns the precise set to adjust for — or a proof that no set in your data will do, and an experiment is the only honest answer.
- What it reveals. Whether the effect of an action is identifiable from the data you already have — recoverable by adjusting for the right variables — or provably not, and exactly which variables make up that adjustment set.
- How it changes the read. “Control for confounders” stops being a reflex and becomes a verdict: this specific set, justified by the graph — or none, and here is the assumption or experiment you’d need instead.
- When to foreground it. Any effect-of-an-action claim resting on observational data — a treatment, a policy, a product change — where someone has reached for a regression and “adjusted for” a list of covariates.
- What you’d miss without it. That adjusting for the wrong variable — a collider, or a mediator you should have left alone — adds bias rather than removing it; the intuitive list of “things to control for” is often the list that breaks the estimate.
- Where it misleads. The verdict is only as good as the graph it runs on. A wrong graph yields a clean-looking adjustment set that is confidently, precisely wrong — the algebra is airtight, the assumptions underneath it may not be.
How to invoke it in Ora
You have an effect-of-an-action question on observational data — does X actually cause Y, and can the effect even be recovered without an experiment — and you want the adjustment decided by rule, not by which covariates seemed reasonable.
Do-calculus is the computational step inside a Causal DAG analysis: it runs on the graph that analysis builds, so you invoke the analysis and let it carry you to the identification step. Describe the variables, the claim, and what you measured versus what you couldn’t, and ask:
“Build the causal graph and tell me whether our onboarding change actually drives retention or company size is confounding it — can we identify the effect from the data we have, and what exactly do we adjust for?”
The Causal DAG analysis first locks the rung and draws the graph (its foundational step), then hands that structure to do-calculus, which delivers the identifiability verdict. The verdict says whether the effect is recoverable and by which route: the back-door criterion (adjust for the variables that block every confounding path), the front-door criterion (when the confounder is unmeasured but the mechanism in between is measured), or a full do-calculus derivation. It also names the adjustment set. If nothing identifies the effect, it says so, and names what would.
One thing to know: the words do-calculus, back-door, front-door, confounder, causal graph, DAG, or Pearl are what route you here. The more honestly you mark which variables you actually observed and which you couldn’t, the sharper the verdict — identifiability turns entirely on what is and isn’t measurable.
Bring the right material: the variables and their plausible cause-and-effect relationships, and crucially the line between what you can measure and what you can’t. The front-door route exists precisely for the case where the confounder is unmeasured — but only if you can measure the mechanism it runs through.
One thing Ora won’t do: hand back an adjustment set as if it were settled fact. Every identifying expression is stamped conditional on this graph — change an arrow and the verdict can flip — and it will not adjust for a collider or a mis-chosen mediator to force a clean answer.
How it works
A new drug seems to lower mortality. You have the records, not a trial. The obvious move is to “control for” the things that differ between who got the drug and who didn’t — age, severity, the rest. But here is the trap that catches careful people: controlling for a variable is not a neutral act of caution. Pick the right set and you strip out the confounding and see the true effect. Pick the wrong variable and you don’t reduce the bias — you create it out of nothing. So the real question was never “should I adjust?” It was “adjust for what, exactly — and how would I know?”
There turn out to be two clean answers, and a third deeper one underneath them.
The first is the back-door criterion. A confounder is a common cause sitting behind both the drug and the outcome: sicker patients are both more likely to get the drug and more likely to die. It sneaks influence in through a “back door,” a path that runs into the drug from behind rather than out from it toward the outcome. The rule is exact: adjust for a set of variables that blocks every back-door path and contains nothing downstream of the drug, and nothing more. The “nothing more” is the part people miss. There is a treacherous kind of variable called a collider, something that two arrows point into. A collider blocks its non-causal path on its own, until you “control for” it, at which point it springs open a phantom relationship between things that were never related. The instinct to throw every available variable into the regression is exactly how you condition on a collider and invent an effect. The back-door criterion is the discipline of blocking what must be blocked and leaving the rest strictly alone.
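The adjustment itself is a short formula: P(Y | do(X = x)) = Σ_z P(Y | X = x, Z = z) P(Z = z), where Z is the back-door set. The sketch below works it through exactly on a toy version of the drug example. The model and all its numbers are illustrative assumptions (Z = severity, X = drug, Y = death), not data from any study. In these made-up numbers the raw comparison makes the drug look harmful, while the adjusted comparison recovers the benefit built into the model.

```python
from itertools import product

# Hypothetical structural model; every number here is an illustrative assumption.
# Z = severity (1 = severe), X = drug given, Y = death.
P_Z = {0: 0.6, 1: 0.4}
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}  # sicker patients get the drug more often
P_Y1_GIVEN_XZ = {(0, 0): 0.10, (1, 0): 0.05,
                 (0, 1): 0.50, (1, 1): 0.40}  # the drug lowers mortality in both strata


def joint(x, y, z):
    """Observational joint P(X=x, Y=y, Z=z) implied by the model."""
    px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
    py = P_Y1_GIVEN_XZ[(x, z)] if y == 1 else 1 - P_Y1_GIVEN_XZ[(x, z)]
    return P_Z[z] * px * py


def naive(x):
    """P(Y=1 | X=x): what the raw records show."""
    num = sum(joint(x, 1, z) for z in (0, 1))
    den = sum(joint(x, y, z) for y, z in product((0, 1), repeat=2))
    return num / den


def backdoor(x):
    """Back-door adjustment: sum over z of P(Y=1 | X=x, Z=z) * P(Z=z)."""
    total = 0.0
    for z in (0, 1):
        p_xz = sum(joint(x, y, z) for y in (0, 1))
        p_z = sum(joint(xx, y, z) for xx, y in product((0, 1), repeat=2))
        total += joint(x, 1, z) / p_xz * p_z
    return total


naive_diff = naive(1) - naive(0)           # positive: the drug looks harmful
adjusted_diff = backdoor(1) - backdoor(0)  # negative: the drug helps
print(f"naive: {naive_diff:+.3f}  adjusted: {adjusted_diff:+.3f}")
```

The adjustment works here only because severity is the sole back-door path in the assumed graph; with a different graph the same arithmetic could be adjusting for the wrong thing.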
The second answer is subtler, and it rescues the case that looks hopeless: what if the confounder can’t be measured at all? Pearl’s worked case is smoking and cancer. Suppose some genetic factor makes people both more likely to smoke and more likely to get cancer — and you can’t measure it. The back door is jammed; there’s no set of observed variables to adjust for. But smoking doesn’t reach cancer by magic — it works through something, the tar deposited in the lungs. And tar you can measure. The front-door criterion says: if you can measure the mechanism sitting between cause and effect, you can chain two effects that are each clean on their own — smoking’s effect on tar, and tar’s effect on cancer — and multiply your way to the true effect of smoking on cancer, straight past the confounder you could never see. You recover a causal answer from data that, by the old rules, had no business yielding one.
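In symbols, the front-door estimate is P(Y | do(X = x)) = Σ_m P(m | x) Σ_x′ P(Y | m, x′) P(x′). The sketch below checks it on a toy smoking, tar, cancer model. The model and its numbers are illustrative assumptions; the gene U is used only to generate the observed table and to compute the ground truth, and the front-door function never looks at it.

```python
from itertools import product

# Hypothetical model; all numbers are illustrative assumptions.
# U = unmeasured gene, X = smoking, M = tar, Y = cancer.
P_U = {0: 0.5, 1: 0.5}
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}
P_M1_GIVEN_X = {0: 0.1, 1: 0.9}
P_Y1_GIVEN_MU = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.5, (1, 1): 0.7}


def bern(p1, v):
    """Probability that a binary variable with P(1) = p1 takes value v."""
    return p1 if v == 1 else 1 - p1


def observed(x, m, y):
    """P(X=x, M=m, Y=y) with U summed out: the only table an analyst sees."""
    return sum(P_U[u] * bern(P_X1_GIVEN_U[u], x) * bern(P_M1_GIVEN_X[x], m)
               * bern(P_Y1_GIVEN_MU[(m, u)], y) for u in (0, 1))


def p(**fixed):
    """Marginal probability of the fixed values, from the observed table."""
    return sum(observed(x, m, y) for x, m, y in product((0, 1), repeat=3)
               if all({"x": x, "m": m, "y": y}[k] == v for k, v in fixed.items()))


def front_door(x):
    """Front-door formula, using observed quantities only."""
    total = 0.0
    for m in (0, 1):
        p_m_given_x = p(x=x, m=m) / p(x=x)
        inner = sum(p(x=xp, m=m, y=1) / p(x=xp, m=m) * p(x=xp) for xp in (0, 1))
        total += p_m_given_x * inner
    return total


def truth(x):
    """P(Y=1 | do(X=x)), computable here only because we wrote the model."""
    return sum(bern(P_M1_GIVEN_X[x], m) * P_U[u] * P_Y1_GIVEN_MU[(m, u)]
               for m, u in product((0, 1), repeat=2))


naive_diff = p(x=1, y=1) / p(x=1) - p(x=0, y=1) / p(x=0)
fd_diff = front_door(1) - front_door(0)
true_diff = truth(1) - truth(0)
print(f"naive: {naive_diff:.3f}  front-door: {fd_diff:.3f}  truth: {true_diff:.3f}")
```

The match between the front-door estimate and the truth depends on the assumed structure: smoking affects cancer only through tar, and the gene does not touch tar directly.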
Underneath both criteria sits the engine that proves them: the do-calculus itself. Pearl wrote down three formal rewrite rules, three licensed moves for transforming a probability expression, and the quantity they operate on is the effect of doing, written P(Y | do(X)): the probability of the outcome if you reach in and set the cause, as an experiment would, rather than merely watching it vary. Each rule turns an expression about doing into one a little closer to plain observation, but only when a specific blocking (d-separation) condition holds in a suitably modified version of the graph. Apply them in sequence and one of two things happens. Either you grind the interventional quantity all the way down to something you can estimate from ordinary observational data, in which case the effect is identified and the back-door and front-door criteria fall out as named special cases; or you get stuck, and the calculus proves you’re stuck: no sequence of legal moves will get there, the effect is unidentifiable from this graph, and only new data or an experiment will unblock it. That is the quiet revolution. Whether a cause can be known without an experiment stops being a matter of judgment and becomes something you can settle by manipulating a diagram with three rules. The diagram tells you not just the answer but, when the answer is no, that the answer is provably no.
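For reference, the three rules can be written out as follows, in the standard notation of Pearl's Causality rather than anything specific to Ora. Here G with an overline on X is the graph with all arrows into X removed, G with an underline on Z is the graph with all arrows out of Z removed, and Z(W) is the set of Z-nodes that are not ancestors of any W-node once the arrows into X are removed.

```latex
\begin{align*}
\text{Rule 1 (insert/delete an observation):}\quad
  & P(y \mid \mathrm{do}(x), z, w) = P(y \mid \mathrm{do}(x), w)
  && \text{if } (Y \perp\!\!\!\perp Z \mid X, W) \text{ in } G_{\overline{X}} \\
\text{Rule 2 (exchange action and observation):}\quad
  & P(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = P(y \mid \mathrm{do}(x), z, w)
  && \text{if } (Y \perp\!\!\!\perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}} \\
\text{Rule 3 (insert/delete an action):}\quad
  & P(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = P(y \mid \mathrm{do}(x), w)
  && \text{if } (Y \perp\!\!\!\perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
\end{align*}
```

Back-door adjustment is essentially Rule 2 applied after conditioning on the adjustment set; the front-door formula takes a short chain of Rules 2 and 3.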
Framework & implementation
This section uses Ora’s own terms for the parts of an analysis, so that if you open the actual mode and lens files they line up. Each is glossed in plain language on first use.
Pipeline execution
Pearl do-calculus is the computational lens of the Causal DAG analysis — lens_type: protocol in its lens file, and one of the mode’s two required lenses. Where its partner pearl-causal-graphs is foundational: true and builds the structure (locks the rung, draws the DAG, classifies the variables), do-calculus computes on it. It sits in the mode’s ANALYTICAL PERSPECTIVES block under “always loaded.” The mode runs at Gear 4, Ora’s most thorough setting — a Depth analyst and a Breadth analyst work the causal question in parallel, critique each other, and revise.
Where the lens engages. It activates on its Detection Signals — the analysis has a DAG and an interventional query P(Y | do(X)) but only observational data; a randomized intervention on X is unavailable yet identification from observation may be possible if the right adjustment set exists; a confounder is suspected and the question is whether the observed covariates suffice; a complex graph offers several candidate adjustment sets and the formal criterion is needed to choose; or a previously claimed effect was computed by adjustment without the back-door criterion ever being checked, and an audit is due.
What it produces in the analysis. This lens owns one output section: the Identifiability verdict. Taking the DAG specification and the Confounder / mediator / collider classification that pearl-causal-graphs supplies, it runs its Application Steps as a fixed escalation. It attempts back-door adjustment first: search for a set of observed variables that contains no descendant of X and blocks every back-door path from X to Y; if found, return that adjustment set. Failing that (typically because the confounder is unobserved), it attempts front-door adjustment: find an observed mediator M that intercepts all directed paths from X to Y, with no unblocked back-door path from X to M and every back-door path from M to Y blocked by X. Failing both, it attempts a full do-calculus derivation: apply the three rules iteratively to reduce P(Y | do(X)) to an observational expression. If no derivation succeeds it returns an unidentifiable verdict, naming the structural obstruction (e.g., unobserved confounding with no available adjustment set) and the auxiliary data (an instrument, a quasi-experiment, an added measurement) that would unblock it. So the verdict the reader sees is always one of two forms: identifiable, by criterion C, with adjustment set Z; or not identifiable, with obstruction O and a statement of what it would take. That feeds the mode’s Intervention or counterfactual answer (the effect can now be estimated) and is carried into the Assumption inventory, since the verdict is only as strong as the graph that produced it.
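The first escalation step, the back-door search, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Ora's implementation: the function names, the edge-list representation, and the onboarding graph are all hypothetical. It tests d-separation with the moralized ancestral graph method and searches candidate sets smallest-first.

```python
from itertools import combinations


def ancestors(edges, nodes):
    """All nodes with a directed path into `nodes`, plus `nodes` themselves."""
    found, stack = set(nodes), list(nodes)
    while stack:
        child = stack.pop()
        for a, b in edges:
            if b == child and a not in found:
                found.add(a)
                stack.append(a)
    return found


def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        parent = stack.pop()
        for a, b in edges:
            if a == parent and b not in found:
                found.add(b)
                stack.append(b)
    return found


def d_separated(edges, x, y, given):
    """True if x and y are d-separated by `given` (moralized ancestral graph test)."""
    keep = ancestors(edges, {x, y} | set(given))
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    undirected = {frozenset(e) for e in sub}
    for child in keep:  # moralize: link every pair of parents sharing a child
        parents = [a for a, b in sub if b == child]
        undirected |= {frozenset(pair) for pair in combinations(parents, 2)}
    seen, stack = {x}, [x]  # drop the conditioning set, then test reachability
    while stack:
        node = stack.pop()
        for edge in undirected:
            if node in edge:
                (other,) = edge - {node}
                if other not in given and other not in seen:
                    seen.add(other)
                    stack.append(other)
    return y not in seen


def satisfies_backdoor(edges, x, y, z):
    """Back-door criterion: no descendant of x in z, and z blocks all back-door paths."""
    z = set(z)
    if z & descendants(edges, x):
        return False
    no_out_of_x = [(a, b) for a, b in edges if a != x]  # keep only back-door paths
    return d_separated(no_out_of_x, x, y, z)


def find_backdoor_set(edges, x, y, observed):
    """Smallest observed set meeting the criterion, or None if there is none."""
    candidates = sorted(set(observed) - {x, y} - descendants(edges, x))
    for size in range(len(candidates) + 1):
        for z in combinations(candidates, size):
            if satisfies_backdoor(edges, x, y, z):
                return set(z)
    return None


# Hypothetical graph for the onboarding question.
edges = [("size", "onboarding"), ("size", "retention"),
         ("onboarding", "engagement"), ("engagement", "retention")]
with_size = find_backdoor_set(edges, "onboarding", "retention", {"size", "engagement"})
without_size = find_backdoor_set(edges, "onboarding", "retention", {"engagement"})
print(with_size, without_size)
```

A result of None means only that back-door adjustment fails with the observed variables. In this toy graph the escalation would go on to try the front-door route through engagement rather than declare the effect unidentifiable.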
Cross-adversarial evaluation. At Gear 4 each analyst’s reading is critiqued by the other, which catches this lens’s signature failures — keyed to its Critical Questions and Common Failure Modes: putting a descendant of X into the adjustment set (adjustment-for-descendant); conditioning on a collider and opening a non-causal path (collider-bias-by-conditioning); applying front-door adjustment when the mediator captures only part of the effect (partial-mediation-mistaken-for-frontdoor); computing a clean “identifying” expression on a graph that quietly omits a real confounder (unidentifiable-disguised-as-identified); and applying an adjustment formula where some treatment-covariate combinations never occur in the data (positivity-violation). The evaluator presses the hardest check of all: is the analyst defending the estimate by citing the derivation, when the thing that actually needs defending is the graph?
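Of the failures named above, collider bias is the least intuitive, so a tiny exact calculation may help. The setup is an illustrative assumption: X and Y are independent fair coins, and the collider C is 1 whenever either of them is 1. Without conditioning, X tells you nothing about Y; restricting to C = 1 manufactures a dependence.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
# Hypothetical world: X and Y are independent fair coins; C = 1 when either is 1.
# Each row is (x, y, c, probability).
worlds = [(x, y, int(x or y), half * half) for x, y in product((0, 1), repeat=2)]


def p_y1(x, c=None):
    """P(Y=1 | X=x), optionally also conditioning on the collider C=c."""
    rows = [(y, w) for xx, y, cc, w in worlds if xx == x and (c is None or cc == c)]
    return sum(w for y, w in rows if y == 1) / sum(w for _, w in rows)


unconditioned = p_y1(1) - p_y1(0)          # no association at all
conditioned = p_y1(1, c=1) - p_y1(0, c=1)  # a spurious negative association
print(unconditioned, conditioned)
```

The spurious association appears even though neither coin influences the other, which is why an evaluator treats any adjustment-set variable with two incoming arrows as suspect.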
Honesty discipline. The mode carries an Assumption inventory ordered most-fragile-first, and for an identifiability verdict the most fragile assumption is almost always the same one: that the DAG is correct, and in particular that there is no unmeasured confounder the graph left out — the no-unmeasured-confounding assumption does the heaviest lifting, and a back-door set that looks complete on a graph missing one arrow is bias wearing a proof. The lens keeps the two layers strictly separate: the structural claim (the DAG) and the algebraic consequence (the identifying expression) are defended on different grounds, so a watertight derivation is never allowed to borrow credibility for a shaky graph.
What the analysis will not do. It will not adjust for a collider or a mediator-you-shouldn’t to force a clean answer; it will not present an identifying expression as a fact about the world when it is a fact about the graph; and it will not dress an unidentifiable effect as identifiable — when the data cannot recover the effect, it says so and names the experiment or instrument that could.
Origin and evidence
The framework is Judea Pearl’s. The do-operator, the back-door criterion, and the front-door criterion in the form used here were introduced in his “Causal diagrams for empirical research” (Biometrika, 1995); the canonical formal treatment, with the three rules, the proofs, and the worked smoking–tar–cancer derivation, is Causality: Models, Reasoning, and Inference (2000; 2nd ed. 2009, chapters 3–4). The system was later shown to be complete: every identifiable interventional query can be reduced to an observational expression by the three rules. That result was established with the ID algorithm by Shpitser and Pearl (2006), building on the earlier identification results of Tian and Pearl (2002), and it is what makes an “unidentifiable” verdict a genuine proof rather than a failure to find a trick. The work is part of the same body that won Pearl the 2011 Turing Award and is credited with the causal revolution in statistics, epidemiology, economics, and machine learning; a complementary counterfactual-framework tradition (Hernán and Robins’s Causal Inference: What If) reaches the same adjustment results from a potential-outcomes direction. Do-calculus is the computational partner of the graph-building lens documented in [[Paper — Pearl Causal Graphs and the Ladder of Causation]]: that lens supplies the rung-vocabulary and DAG conventions this protocol runs on; this lens supplies the rules that decide what the graph permits.
Applications and common uses
Do-calculus is the working tool wherever an effect-of-an-action claim is pressed against non-experimental data and someone has to decide what to adjust for — and whether adjusting is even enough.
- Epidemiology and medicine. Its native ground. Choosing the adjustment set that estimates a treatment effect from observational records without (or before) a randomized trial — and, just as important, knowing when no observed set will do and the front-door mediator or an instrument is the only route.
- Economics and policy. “If we raise the minimum wage, employment will…” is an effect-of-an-action claim usually argued from observational data; do-calculus makes the identifying assumptions explicit, names the adjustment set, and is willing to return unidentifiable — which is itself a finding, redirecting the argument to the natural experiment that would settle it.
- Business analytics and experimentation. Deciding whether a feature’s effect on retention or a campaign’s effect on sales is identifiable from logs — and which confounders to condition on if it is — versus whether only an A/B test will settle it, sparing the cost of a confidently biased observational estimate.
- Auditing causal claims. Re-deriving a published or internal estimate to check that the back-door criterion actually held: that no descendant of the treatment crept into the controls, that no collider was conditioned on, that positivity was respected. Many “we controlled for that” claims do not survive the check.
- Machine learning and AI. Identification underpins systems that answer “what if” rather than only “what’s correlated,” and that hold up when the conditions under which data was gathered shift.
In every case the payoff is the same: a precise, justified answer to “adjust for what?” — or an honest, proven “you can’t from this data, and here is what you’d need.”
Failure modes and when not to use it
The lens’s characteristic ways of going wrong are catalogued in its Common Failure Modes — and they share a theme: do-calculus is so good at producing a clean expression that the clean expression gets trusted past its warrant.
- Adjustment-for-descendant. Putting a descendant of the treatment into the adjustment set, which can inject bias rather than remove it. The tell is a control variable with an arrow from X (directly or down a directed path). Remove descendants and re-test the back-door criterion.
- Collider-bias-by-conditioning. Conditioning on a collider, or its descendant, which opens a non-causal path and biases the estimate — the self-inflicted wound of “controlling for everything.” The tell is an adjustment-set variable with incoming arrows from both an X-side and a Y-side ancestor. Drop it; find a different set.
- Partial-mediation-mistaken-for-frontdoor. Applying front-door adjustment when the mediator captures only some of the effect. The tell is a directed path from X to Y that bypasses the mediator. Front-door does not apply; seek a different back-door set or auxiliary data.
- Unidentifiable-disguised-as-identified. Computing an “identifying” expression that is biased because the graph omitted a real confounder. The tell is a domain expert naming a plausible confounder absent from the DAG. Revise the graph (usually by adding a latent variable, drawn as a bidirected edge) and re-attempt identification.
- Positivity-violation. Applying the adjustment formula where some treatment-covariate combinations never appear in the data. The tell is P(X = x | Z = z) at or near zero for some treatment level x and some covariate value z that does occur in the data. Restrict to the region of common support and report the restriction.
- DAG-confidence-overflow. Treating the identified expression as a fact about the world rather than about the graph. The tell is an analyst defending the estimate by citing the derivation instead of the graph. Separate the structural assumption from its algebraic consequence and defend each on its own grounds.
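The positivity tell in the list above is easy to check mechanically before any adjustment formula is applied. The sketch below is a minimal illustration: the function name, the (treatment, stratum) row format, the sample records, and the 5% threshold are all assumptions chosen for the example.

```python
from collections import Counter


def positivity_gaps(rows, min_share=0.05):
    """Return (z, x, share) for strata z where treatment level x is rare or absent.

    `rows` is a list of (x, z) pairs; `min_share` is an arbitrary cutoff.
    """
    by_z = Counter(z for _, z in rows)
    by_xz = Counter(rows)
    levels = sorted({x for x, _ in rows})
    gaps = []
    for z in sorted(by_z):
        for x in levels:
            share = by_xz[(x, z)] / by_z[z]  # empirical P(X=x | Z=z)
            if share < min_share:
                gaps.append((z, x, share))
    return gaps


# Hypothetical records as (treated, severity) pairs: no untreated severe patients.
rows = [(1, "mild")] * 30 + [(0, "mild")] * 70 + [(1, "severe")] * 40
gaps = positivity_gaps(rows)
flagged = {z for z, _, _ in gaps}
supported = {z for _, z in rows} - flagged  # region of common support
print(gaps, supported)
```

Here the severe stratum has no untreated patients, so any estimate would have to be restricted to mild cases and reported as such.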
When not to reach for it. When you have a real experiment — randomization severs the confounding that identification exists to handle, so the back-door machinery is needed only for edge cases like attrition or mediation. When the question is genuinely just predictive — will X help forecast Y, with no intervention contemplated — there is no do(X) to identify and the calculus answers a question no one asked. And, above all, when the graph itself can’t be defended: do-calculus operates strictly given the DAG, so on a graph built from guesses it will return a precise adjustment set that is precise nonsense. When the structure is unknown, the honest first move is more domain knowledge or causal discovery — not a derivation on a diagram no one can stand behind.
Related
- Causal DAG — the analysis this lens computes inside; it classifies a causal claim, builds the graph, and (through this lens) determines whether the effect is identifiable from the data.
- Pearl Causal Graphs and the Ladder of Causation — the foundational partner: it locks the rung and builds the DAG and the variable roles; do-calculus then computes on that structure to decide what’s identifiable.
- Bayesian Reasoning — the companion for the step after identification: once an effect is identified as a quantity estimable from data, how to update belief about its size as evidence accrues.
- Confirmation Bias — why the graph, not the algebra, is the thing to police: a DAG drawn to make a desired adjustment set come out clean launders motivated reasoning behind an airtight derivation.