Fiveable

📊Causal Inference Unit 9 Review


9.4 Interventions and do-calculus

Written by the Fiveable Content Team • Last updated August 2025

Defining interventions

An intervention means you physically set a variable to a particular value, rather than just watching what value it naturally takes. This is the core move that separates causal inference from standard statistics: instead of asking "What is Y when we observe X = x?" you ask "What would Y be if we forced X to equal x?"

That distinction matters because observational associations can be driven by confounders, selection effects, or reverse causation. Interventions cut through all of that by breaking the variable free from its usual causes.

Interventional distributions

An interventional distribution, written P(Y | do(X = x)), describes the probability distribution of Y in a world where X has been set to x by external action. Compare this to the observational conditional P(Y | X = x), which reflects the distribution of Y among units that happen to have X = x.

These two quantities can differ dramatically. For example, P(recovery | drug = 1) in hospital records might be low because sicker patients receive the drug more often. But P(recovery | do(drug = 1)) captures what would happen if you randomly assigned the drug, removing that selection effect.
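
A short simulation makes the contrast concrete. This is an illustrative sketch, not data from the text: the assignment policy and recovery rates below are invented, but they reproduce the qualitative story (sicker patients get the drug more often and recover less often).

```python
import random

# Illustrative sketch (all probabilities invented): severity confounds
# treatment and recovery, so P(recovery | drug=1) understates
# P(recovery | do(drug=1)).
random.seed(0)

def draw_patient(assign_policy):
    sick = random.random() < 0.5                    # confounder: severity
    drug = assign_policy(sick)                      # treatment assignment
    base = 0.2 if sick else 0.7                     # baseline recovery rate
    recover = random.random() < base + 0.15 * drug  # drug adds +0.15 for everyone
    return drug, recover

def recovery_rate_among_treated(assign_policy, n=100_000):
    treated, recovered = 0, 0
    for _ in range(n):
        drug, recover = draw_patient(assign_policy)
        if drug:
            treated += 1
            recovered += recover
    return recovered / treated

# Observational regime: doctors give the drug mostly to the sick.
p_obs = recovery_rate_among_treated(lambda sick: random.random() < (0.8 if sick else 0.2))
# Interventional regime do(drug=1): everyone is forced to take the drug.
p_do = recovery_rate_among_treated(lambda sick: 1)

print(f"P(recovery | drug=1)     ~ {p_obs:.2f}")  # dragged down by sicker patients
print(f"P(recovery | do(drug=1)) ~ {p_do:.2f}")
```

With these invented numbers the observational rate comes out well below the interventional one, purely because of who receives the drug.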

Causal vs observational

  • A causal relationship means changing X directly changes Y. An observational association means X and Y tend to co-occur, but manipulating X might not affect Y at all.
  • Confounders are the classic reason these diverge: ice cream sales and drowning rates are correlated (both driven by hot weather), but banning ice cream won't prevent drownings.
  • The whole point of the do-operator is to formalize this distinction so you can reason precisely about when observational data can tell you about causal effects, and when it can't.

Ideal randomized experiments

Randomized experiments are the gold standard for estimating causal effects because randomization breaks the link between the treatment and any confounders.

  • Randomly assigning subjects to treatment vs. control ensures that, in expectation, the groups are identical on all pre-treatment characteristics.
  • This means P(Y | do(X = x)) = P(Y | X = x) in the experimental data, so the observational conditional is the causal quantity.
  • Classic examples: clinical trials for drug efficacy, A/B tests for website design changes.

The challenge is that randomized experiments aren't always feasible. That's where the rest of this unit comes in.
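
The equality of the conditional and the interventional distribution under randomization can be checked by simulation. In this invented sketch, treatment is assigned by coin flip, independent of severity, so the plain conditional recovers the +0.15 effect built into the data-generating process.

```python
import random

# Illustrative sketch (numbers invented): coin-flip assignment makes the
# treated and control groups comparable, so conditioning acts like intervening.
random.seed(1)

def trial(n=200_000):
    treated = control = treated_rec = control_rec = 0
    for _ in range(n):
        sick = random.random() < 0.5                    # confounder: severity
        drug = random.random() < 0.5                    # randomized, ignores severity
        base = 0.2 if sick else 0.7
        recover = random.random() < base + 0.15 * drug  # true effect: +0.15
        if drug:
            treated += 1
            treated_rec += recover
        else:
            control += 1
            control_rec += recover
    return treated_rec / treated, control_rec / control

p_treated, p_control = trial()
print(f"P(recovery | drug=1) in the trial ~ {p_treated:.2f}")
print(f"estimated causal effect ~ {p_treated - p_control:.2f}")  # close to +0.15
```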

Representing interventions

To work with interventions formally, you need notation and graphical tools that make causal assumptions explicit and manipulable.

Intervention notation

The do-operator, written do(X = x), represents an external intervention that forces X to take value x. The key expressions to know:

  • P(Y | do(X = x)) is the interventional distribution of Y when X is set to x.
  • P(Y | X = x) is the ordinary observational conditional.
  • The central question of causal identification: Can we express P(Y | do(X = x)) purely in terms of observational distributions?

Graphical representation

Causal graphs (specifically DAGs) encode your causal assumptions visually:

  • Nodes represent variables.
  • Directed edges (arrows) represent direct causal effects.
  • The graph tells you which variables cause which, and which paths connect treatment to outcome.

When you intervene on X, you represent this by deleting all incoming edges to X in the graph. This reflects the fact that the intervention overrides whatever would normally determine X.

Manipulated graphs

A manipulated graph (also called a mutilated graph) is the DAG you get after deleting incoming edges to the intervened variable. Formally, if you perform do(X = x), the manipulated graph G_{\overline{X}} is the original graph with all arrows pointing into X removed.

Why this matters: the manipulated graph encodes the independence structure of the post-intervention world. You use it to read off which variables are independent of which under the intervention, and that's what drives identification criteria like the back-door and front-door criteria.
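
Edge deletion is mechanical enough to sketch in a few lines. The graph below is a generic example (U, X, M, Y are placeholder names), stored as a plain edge list:

```python
# Sketch of graph mutilation: intervening on a node deletes every edge
# pointing into it and leaves the rest of the DAG untouched.
edges = [("U", "X"), ("U", "Y"), ("X", "M"), ("M", "Y")]

def manipulate(edge_list, node):
    """Edges of the manipulated graph: drop all arrows into `node`."""
    return [(a, b) for (a, b) in edge_list if b != node]

print(manipulate(edges, "X"))  # [('U', 'Y'), ('X', 'M'), ('M', 'Y')] -- U -> X is gone
```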

Identifiability under interventions

Identifiability asks: given the causal graph and observational data, can you compute the interventional distribution P(Y | do(X = x))? If yes, the causal effect is identifiable. If not, you need additional data or stronger assumptions.

Back-door criterion

The back-door criterion gives you a sufficient condition for identifying a causal effect by adjusting for confounders.

A set of variables Z satisfies the back-door criterion relative to (X, Y) if:

  1. No variable in Z is a descendant of X.
  2. Z blocks every path between X and Y that has an arrow into X (these are the "back-door paths").

When ZZ satisfies the criterion, the causal effect is given by the back-door adjustment formula:

P(Y \mid do(X = x)) = \sum_z P(Y \mid X = x, Z = z) \, P(Z = z)

Example: to estimate the causal effect of smoking on cancer, you might adjust for age and socioeconomic status, provided those are confounders and not descendants of smoking.
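
The adjustment formula is plain arithmetic once the conditional probability tables are written down. In this toy example (all numbers invented), Z is a single binary confounder satisfying the back-door criterion for (X, Y):

```python
# Back-door adjustment on a toy discrete model (numbers invented).
# Graph: Z -> X, Z -> Y, X -> Y, so Z satisfies the back-door criterion.
pZ = {1: 0.3, 0: 0.7}                      # P(Z = z)
pX_given_Z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
pY_given_XZ = {(0, 0): 0.3, (0, 1): 0.4,   # P(Y = 1 | X = x, Z = z)
               (1, 0): 0.6, (1, 1): 0.7}

# Naive observational conditional P(Y = 1 | X = 1), computed from the joint:
num = sum(pZ[z] * pX_given_Z[z] * pY_given_XZ[(1, z)] for z in (0, 1))
den = sum(pZ[z] * pX_given_Z[z] for z in (0, 1))
p_obs = num / den

# Back-door adjustment: P(Y=1 | do(X=1)) = sum_z P(Y=1 | X=1, Z=z) P(Z=z)
p_do = sum(pY_given_XZ[(1, z)] * pZ[z] for z in (0, 1))

print(f"P(Y=1 | X=1)     = {p_obs:.3f}")
print(f"P(Y=1 | do(X=1)) = {p_do:.3f}")
```

The two numbers differ because conditioning lets the distribution of Z shift with X, while the adjustment holds the population distribution of Z fixed.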


Front-door criterion

The front-door criterion handles cases where there are unmeasured confounders between X and Y, but there exists a mediator M such that:

  1. There are no unblocked back-door paths from X to M.
  2. M captures the entire causal effect of X on Y (all directed paths from X to Y go through M).
  3. There are no unblocked back-door paths from M to Y after conditioning on X.

When these conditions hold, the causal effect is:

P(Y \mid do(X = x)) = \sum_m P(M = m \mid X = x) \sum_{x'} P(Y \mid X = x', M = m) \, P(X = x')

This is powerful because it lets you identify causal effects even when you can't measure the confounder directly.
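
You can verify the formula numerically on a small model. In this sketch (all probabilities invented), the front-door expression is built only from quantities estimable without seeing U, then compared against the ground truth computed with full knowledge of the SCM:

```python
# Numeric check of the front-door formula on a toy SCM with an unmeasured
# confounder U (all numbers invented). Graph: U -> X, U -> Y, X -> M, M -> Y.
pU = {1: 0.5, 0: 0.5}
pX_given_U = {0: 0.2, 1: 0.8}              # P(X = 1 | U = u)
pM_given_X = {0: 0.1, 1: 0.9}              # P(M = 1 | X = x)
pY_given_MU = {(0, 0): 0.2, (0, 1): 0.4,   # P(Y = 1 | M = m, U = u)
               (1, 0): 0.6, (1, 1): 0.8}

def p(value, prob_of_one):                 # P(var = value) from P(var = 1)
    return prob_of_one if value == 1 else 1 - prob_of_one

# Observational quantities: in practice these would be estimated from
# (X, M, Y) data alone; here we derive them analytically through U.
pX = {x: sum(pU[u] * p(x, pX_given_U[u]) for u in (0, 1)) for x in (0, 1)}

def pY_given_XM(x, m):                     # P(Y = 1 | X = x, M = m)
    pU_given_X = {u: pU[u] * p(x, pX_given_U[u]) / pX[x] for u in (0, 1)}
    return sum(pU_given_X[u] * pY_given_MU[(m, u)] for u in (0, 1))

# Front-door formula for P(Y = 1 | do(X = 1)):
fd = sum(p(m, pM_given_X[1]) *
         sum(pY_given_XM(xp, m) * pX[xp] for xp in (0, 1))
         for m in (0, 1))

# Ground truth computed directly from the full SCM (using U):
truth = sum(pU[u] * sum(p(m, pM_given_X[1]) * pY_given_MU[(m, u)] for m in (0, 1))
            for u in (0, 1))

print(f"front-door estimate = {fd:.4f}, ground truth = {truth:.4f}")
```

The two values agree exactly, even though the front-door computation never touches U.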

Instrumental variables

An instrumental variable (IV) Z affects the treatment X but has no direct effect on the outcome Y, and is independent of unmeasured confounders between X and Y.

  • The classic example: using proximity to a college as an instrument for years of education when estimating education's effect on income. Living near a college nudges people toward more education, but doesn't directly affect income through other channels (that's the assumption, at least).
  • IVs allow causal effect estimation under unmeasured confounding, but they typically identify a local average treatment effect (LATE) rather than the average treatment effect for the whole population.
  • The key assumptions (relevance, exclusion restriction, independence) must hold for IV estimates to be valid, and they're not testable from data alone.
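
As a sketch of the simplest IV estimator, the code below uses the Wald ratio (a difference of group means) with a binary instrument. All coefficients are invented: the true effect of X on Y is set to 2.0, and an unmeasured confounder U biases the naive regression upward.

```python
import random

# Illustrative IV simulation (coefficients invented): Z nudges X, U confounds
# X and Y, and the structural effect of X on Y is beta = 2.0.
random.seed(42)
n = 200_000
beta = 2.0
z = [random.randint(0, 1) for _ in range(n)]            # binary instrument
u = [random.gauss(0, 1) for _ in range(n)]              # unmeasured confounder
x = [1.0 * z[i] + u[i] + random.gauss(0, 1) for i in range(n)]
y = [beta * x[i] + 3.0 * u[i] + random.gauss(0, 1) for i in range(n)]

def mean(v):
    return sum(v) / len(v)

# Wald estimator: Cov(Z, Y) / Cov(Z, X), which for binary Z reduces to a
# ratio of differences in group means.
y1 = mean([y[i] for i in range(n) if z[i] == 1])
y0 = mean([y[i] for i in range(n) if z[i] == 0])
x1 = mean([x[i] for i in range(n) if z[i] == 1])
x0 = mean([x[i] for i in range(n) if z[i] == 0])
iv_est = (y1 - y0) / (x1 - x0)

# Naive OLS slope, biased upward because U pushes X and Y in the same direction:
mx, my = mean(x), mean(y)
ols = (sum((x[i] - mx) * (y[i] - my) for i in range(n)) /
       sum((x[i] - mx) ** 2 for i in range(n)))

print(f"naive OLS slope ~ {ols:.2f}  (biased)")
print(f"IV (Wald) slope ~ {iv_est:.2f} (close to the true 2.0)")
```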

Intro to do-calculus

Do-calculus, developed by Judea Pearl, is a complete set of rules for transforming expressions involving the do-operator. Its purpose is to convert interventional quantities into expressions that involve only observational distributions, which you can estimate from data.

Formal definition

Do-calculus consists of three inference rules that let you add, remove, or exchange observations and interventions within probability expressions. Each rule has a graphical condition that must be satisfied in a specific manipulated version of the DAG. Together, these rules are complete: if a causal effect is identifiable from the graph, some sequence of these three rules will derive the identifying formula.

Differences from standard probability

Standard probability gives you tools like Bayes' rule and the chain rule for manipulating conditional distributions. But it has no way to distinguish P(Y | X = x) from P(Y | do(X = x)), because standard probability doesn't encode causal direction.

Do-calculus adds this layer. It operates on expressions that mix observational conditioning and do-operators, and its rules are justified by the causal structure (the DAG), not just by probability axioms. This is what makes it possible to answer causal questions from observational data.

Three rules of do-calculus

Each rule applies when a specific conditional independence holds in a specific manipulated graph. Let X, Y, Z, and W be disjoint sets of variables in a DAG G:

Rule 1 (Insertion/deletion of observations):

P(Y \mid do(X), Z, W) = P(Y \mid do(X), W)

if Y \perp Z \mid X, W in the graph G_{\overline{X}} (the graph with incoming edges to X removed).

This says you can ignore an observed variable Z if it's independent of Y given X and W in the manipulated graph.

Rule 2 (Action/observation exchange):

P(Y \mid do(X), do(Z), W) = P(Y \mid do(X), Z, W)

if Y \perp Z \mid X, W in the graph G_{\overline{X}, \underline{Z}} (incoming edges to X removed, outgoing edges from Z removed).

This lets you replace an intervention on Z with an observation of Z under the right conditions.

Rule 3 (Insertion/deletion of actions):

P(Y \mid do(X), do(Z), W) = P(Y \mid do(X), W)

if Y \perp Z \mid X, W in the graph G_{\overline{X}, \overline{Z(W)}}, where Z(W) is the set of nodes in Z that are not ancestors of any node in W in G_{\overline{X}}.

This lets you drop an intervention on Z entirely when the graphical condition is met.

Applying do-calculus

The practical use of do-calculus is to take a target causal quantity like P(Y | do(X)) and, through repeated application of the three rules, reduce it to an expression involving only observational probabilities.

Identifying causal effects

The typical workflow:

  1. Write down the target interventional distribution, e.g., P(Y | do(X = x)).
  2. Examine the DAG to determine which rules apply.
  3. Apply rules to progressively eliminate do-operators.
  4. If you can reduce the expression to one with no do-operators, the effect is identified, and you have a formula you can estimate from data.

Both the back-door and front-door adjustment formulas can be derived as special cases of do-calculus. The power of do-calculus is that it also handles cases where neither criterion directly applies.

Proving non-identifiability

Sometimes no sequence of do-calculus rules can eliminate all do-operators. In that case, the causal effect is not identifiable from observational data alone given the assumed graph.

  • This is a genuine impossibility result, not just a failure to find the right trick. Completeness of do-calculus guarantees that if the rules can't get you there, no other method based on the same assumptions can either.
  • Non-identifiability tells you that you need either additional data (e.g., from experiments or new measured variables) or stronger assumptions to estimate the effect.

Worked examples

A concrete example helps clarify the process. Consider the classic front-door model:

  • U is an unmeasured confounder affecting both X and Y.
  • X → M → Y, with U → X and U → Y.

To identify P(Y | do(X)):

  1. Apply Rule 2 to replace do(X) with conditioning on X for the X → M relationship (the only back-door path from X to M runs through U and is blocked by the collider at Y).
  2. Apply Rule 2 and Rule 3 to handle the M → Y relationship, adjusting for X to block the back-door path through U.
  3. Combine the pieces to arrive at the front-door formula.

Working through examples like this is the best way to build fluency with do-calculus. Try deriving the back-door adjustment formula from the three rules as practice.

Interventions with SCMs

Structural Causal Models (SCMs) give you a complete formal package: a DAG for the qualitative causal structure, plus a set of structural equations that specify the quantitative relationships.

Modeling interventions

In an SCM, each variable X_i has a structural equation:

X_i = f_i(\text{pa}_i, U_i)

where \text{pa}_i are the parents of X_i in the DAG and U_i is an exogenous noise term.

To model an intervention do(X_j = x), you replace the equation for X_j with the constant X_j = x and leave all other equations unchanged. This is the formal counterpart of deleting incoming edges in the graph.

Interventions can also be stochastic: instead of setting X_j to a fixed value, you draw it from some specified distribution, replacing the original equation with a new random mechanism.
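
This "replace the equation" view translates directly into code. In the invented toy SCM below, do(X = x) simply bypasses X's structural equation while everything else, including the noise, is left alone; averaging the two interventional regimes recovers the ATE.

```python
import random

# Toy SCM sketch (coefficients invented): U confounds X and Y, and the
# structural coefficient of X on Y is 1.5. do(X = x) bypasses X's equation.
random.seed(7)

def sample(do_x=None):
    u = random.gauss(0, 1)                       # exogenous noise / confounder
    # Structural equation for X, replaced by a constant under intervention:
    x = (1 if u + random.gauss(0, 1) > 0 else 0) if do_x is None else do_x
    y = 1.5 * x + 2.0 * u + random.gauss(0, 1)   # structural equation for Y
    return x, y

def mean_y(do_x, n=100_000):
    return sum(sample(do_x)[1] for _ in range(n)) / n

# ATE = E[Y | do(X=1)] - E[Y | do(X=0)]; should recover the coefficient 1.5.
ate = mean_y(1) - mean_y(0)
print(f"estimated ATE ~ {ate:.2f}")
```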

Truncated factorization

In an SCM without interventions, the joint distribution factors as:

P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \text{pa}_i)

Under an intervention do(X_j = x), the factor for X_j is removed (since X_j is now fixed), giving the truncated factorization:

P(x_1, \ldots, x_n \mid do(X_j = x)) = \prod_{i \neq j} P(x_i \mid \text{pa}_i) \bigg|_{X_j = x}

This formula is the algebraic expression of graph mutilation. It directly connects the graphical operation (delete incoming edges) to a probability computation.
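
The truncated factorization can be evaluated directly from conditional probability tables. A toy example with invented numbers on the graph Z → X → Y with an extra edge Z → Y: the observational joint is P(z) P(x|z) P(y|x, z), and under do(X = 1) the P(x|z) factor is deleted.

```python
# Truncated factorization on a toy graph (numbers invented):
# observational joint P(z, x, y) = P(z) P(x|z) P(y|x, z);
# under do(X=1), drop the P(x|z) factor and clamp x = 1.
pZ = {1: 0.3, 0: 0.7}                      # P(Z = z)
pY_given_XZ = {(0, 0): 0.3, (0, 1): 0.4,   # P(Y = 1 | X = x, Z = z)
               (1, 0): 0.6, (1, 1): 0.7}

def p(value, prob_of_one):                 # P(var = value) from P(var = 1)
    return prob_of_one if value == 1 else 1 - prob_of_one

# Post-intervention joint over (Z, Y) with X clamped to 1:
joint_do = {(z, y): pZ[z] * p(y, pY_given_XZ[(1, z)])
            for z in (0, 1) for y in (0, 1)}

total = sum(joint_do.values())                 # still a proper distribution
p_y1_do = joint_do[(0, 1)] + joint_do[(1, 1)]  # marginalize out Z
print(f"joint sums to {total:.1f}; P(Y=1 | do(X=1)) = {p_y1_do:.3f}")
```

Notice that P(x|z) never appears in the computation: that factor is exactly what the intervention removes.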

Causal effect estimation

With the truncated factorization in hand, you can compute any interventional quantity by summing or integrating over the non-intervened variables.

Common estimands:

  • Average Treatment Effect (ATE): E[Y \mid do(X = 1)] - E[Y \mid do(X = 0)]
  • Effect of Treatment on the Treated (ETT): E[Y_{X=1} - Y_{X=0} \mid X = 1]

SCMs also support counterfactual reasoning (e.g., "What would have happened to this specific individual under a different treatment?"), which goes beyond what do-calculus alone can handle. Counterfactuals require the full SCM, including the noise terms U_i, not just the graph.

Practical considerations

The formal machinery of do-calculus and SCMs is powerful, but applying it in practice requires careful attention to real-world constraints.

Feasibility of interventions

Many interventions that are well-defined mathematically are impossible or impractical to carry out:

  • You can write do(gender = male), but you can't actually randomize gender.
  • Cost, logistics, and time constraints may rule out large-scale experiments.
  • When direct intervention isn't feasible, researchers turn to natural experiments (where some external event approximates random assignment) or quasi-experimental designs (difference-in-differences, regression discontinuity).

Ethical implications

Even when an intervention is feasible, it may not be ethical:

  • Withholding a potentially beneficial treatment from a control group raises concerns.
  • Interventions on vulnerable populations require extra scrutiny.
  • Institutional review boards (IRBs) enforce standards around informed consent, minimizing harm, and equitable selection of subjects.

These constraints are a major reason why observational causal inference methods exist in the first place.

Limitations of do-calculus

Do-calculus is complete for the class of problems it addresses, but it rests on assumptions that may not hold:

  • Causal Markov condition: Each variable is independent of its non-descendants given its parents. Violations can occur with feedback loops or aggregated variables.
  • Faithfulness: All observed independencies are consequences of the graph structure, not coincidental cancellations. This can fail in finely tuned systems.
  • Correct graph specification: Do-calculus takes the DAG as given. If your graph is wrong (missing edges, wrong directions), your conclusions will be wrong too.

When you're uncertain about these assumptions, complement do-calculus with sensitivity analysis (how much would an unmeasured confounder need to shift results?) and partial identification (deriving bounds on the causal effect rather than a point estimate).