Fiveable

📊Causal Inference Unit 9 Review

9.3 Structural causal models (SCMs)

Written by the Fiveable Content Team • Last updated August 2025

Structural causal models (SCMs) combine graphs and equations to represent how variables influence each other in a system. They let you move beyond correlation to estimate causal effects, evaluate interventions, and reason about counterfactuals. An SCM encodes your assumptions about causal structure, then gives you the mathematical machinery to work with those assumptions rigorously.

Definition of SCMs

An SCM is a mathematical model describing the causal relationships between variables in a system. Formally, it consists of three pieces:

  • A set of variables (both observed and unobserved)
  • A directed acyclic graph (DAG) representing the causal structure, where edges indicate direct causal influences
  • A set of structural equations specifying the functional relationship between each variable and its direct causes

Together, these components also imply a joint probability distribution over the variables. The DAG tells you the qualitative story (what causes what), and the structural equations tell you the quantitative story (how much, and in what way).

Endogenous vs. exogenous variables

SCMs distinguish between two types of variables:

  • Endogenous variables are determined by other variables within the model. They have at least one incoming edge in the DAG.
  • Exogenous variables are determined by factors outside the model. No other variable in the SCM causes them, so they appear as root nodes (no incoming edges) in the DAG.

Exogenous variables are typically assumed to be independently distributed. They represent the "inputs" to the system that the model takes as given rather than explaining.

Structural equations

Structural equations specify how each endogenous variable is generated from its direct causes plus an error term. The error term captures unobserved factors or randomness.

For example, if Y is caused by X:

Y = f(X, ϵ_Y)

Here f is some function, X is the direct cause, and ϵ_Y is the error term. A critical point: structural equations are not the same as regression equations. A regression equation describes a statistical association, while a structural equation claims a causal mechanism. Changing X in a structural equation changes Y; changing X in a regression equation doesn't necessarily mean anything causal.
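To make the distinction concrete, here is a minimal simulation of this kind of structural equation. The linear mechanism Y := 2X + ϵ_Y, the coefficient, and the noise distributions are all illustrative assumptions, not something from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical SCM (all numbers are illustrative assumptions):
#   X := eps_X           (X has no parents, so it equals its exogenous input)
#   Y := 2*X + eps_Y     (structural equation Y = f(X, eps_Y))
eps_x = rng.normal(0, 1, n)
eps_y = rng.normal(0, 1, n)

x = eps_x
y = 2.0 * x + eps_y

# Because the equation is structural, *setting* X changes Y:
x_forced = np.full(n, 1.0)          # force X = 1 for every unit
y_forced = 2.0 * x_forced + eps_y   # same error terms, new X
print(y_forced.mean())              # close to 2.0
```

Re-running the equation with a forced X is exactly what the regression view cannot license: a fitted slope describes the observed data, while the structural equation tells you what happens when X is changed.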

Directed acyclic graphs (DAGs)

DAGs are the graphical backbone of an SCM:

  • Nodes represent variables
  • Directed edges (arrows) represent direct causal influences
  • The absence of an edge between two nodes means there is no direct causal relationship between them

The "acyclic" part means there are no directed cycles. You can't follow the arrows from a variable and loop back to it. This rules out feedback loops (though extensions exist for cyclic systems, covered later).

Causal Markov condition

The causal Markov condition is a core assumption linking the DAG to the probability distribution. It states:

A variable is independent of all its non-descendants, given its parents (direct causes) in the DAG.

This assumption allows you to factorize the joint distribution according to the DAG structure. Each variable's conditional distribution depends only on its parents.

For example, in the chain X → Y → Z, the causal Markov condition tells you that X and Z are independent given Y. Once you know Y, learning X gives you no additional information about Z.
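This independence can be checked numerically. The sketch below uses a hypothetical linear-Gaussian chain (the coefficients are invented): X and Z are correlated marginally, but their partial correlation given Y is approximately zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Illustrative linear-Gaussian chain X -> Y -> Z (coefficients assumed)
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)
z = 0.8 * y + rng.normal(size=n)

# Marginally, X and Z are clearly correlated...
r_xz = np.corrcoef(x, z)[0, 1]

def partial_corr(a, b, c):
    # residualize a and b on c, then correlate the residuals
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

# ...but given Y, the dependence vanishes: X ⊥ Z | Y
r_xz_given_y = partial_corr(x, z, y)
print(round(r_xz, 2), round(r_xz_given_y, 3))
```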

Causal sufficiency

Causal sufficiency assumes that all common causes of the observed variables are included in the model. In other words, there are no unmeasured confounders affecting multiple observed variables.

This is a strong assumption. In practice, it often doesn't hold perfectly, and violations can lead to biased causal effect estimates. When you suspect unmeasured confounding, you'll need techniques like sensitivity analysis or instrumental variables (discussed below).

Causal faithfulness

Causal faithfulness assumes that the only independencies in the data are those implied by the DAG. There are no "accidental" independencies caused by parameters perfectly canceling each other out.

For example, if two causal paths from X to Y have effects that exactly cancel (one positive, one negative, same magnitude), X and Y would appear independent even though causal paths exist. Faithfulness rules this out. Such exact cancellations are considered rare in practice, but they can occur.

Representing interventions with SCMs

Interventions are actions that force variables to take specific values, overriding their natural causes. SCMs give you a precise way to represent and reason about interventions, which is what separates causal reasoning from purely statistical reasoning.

Interventional distributions

An interventional distribution is the probability distribution of variables after an intervention. It's written using the do-operator:

P(Y | do(X = x))

This reads as "the probability of Y when we set X to value x." This is fundamentally different from the conditional distribution P(Y | X = x), which is what you'd observe by filtering data. The do-operator represents actively manipulating X, not passively observing it.

Graph mutilation

To derive an interventional distribution, you modify the DAG through graph mutilation:

  1. Identify the variable X being intervened on
  2. Remove all incoming edges to X (since the intervention overrides X's natural causes)
  3. Set X to the intervention value
  4. The resulting "mutilated graph" represents the causal structure under the intervention

The rest of the DAG stays the same. Variables downstream of X are still affected by X, but X itself is no longer influenced by its former parents.

Truncated factorization

Truncated factorization is the mathematical counterpart of graph mutilation. Under the original DAG, the joint distribution factorizes as:

P(V_1, V_2, …, V_n) = ∏_{i=1}^{n} P(V_i | PA_i)

where PA_i are the parents of V_i. When you intervene on X = x, you drop the factor for X from the product (since X is now fixed, not generated by its parents) and substitute X = x everywhere else. The result is the interventional distribution over the remaining variables.
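Truncated factorization can be verified by hand on a tiny discrete SCM. In the sketch below, the graph is Z → X, Z → Y, X → Y with binary variables, and all probability tables are invented for illustration; note how P(Y=1 | do(X=1)) drops the P(x | z) factor, while the ordinary conditional P(Y=1 | X=1) keeps it:

```python
# Tiny discrete SCM (all tables are illustrative assumptions):
# Z -> X, Z -> Y, X -> Y, every variable binary.
p_z = {0: 0.6, 1: 0.4}
p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # p_x_given_z[z][x]
p_y_given_xz = {  # p_y_given_xz[(x, z)][y]
    (0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.6, 1: 0.4},
    (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.2, 1: 0.8},
}

# Truncated factorization for do(X = 1): drop the P(x|z) factor and fix
# x = 1, leaving  P(z, y | do(X=1)) = P(z) * P(y | x=1, z).
p_y1_do = sum(p_z[z] * p_y_given_xz[(1, z)][1] for z in (0, 1))

# Ordinary conditional P(Y=1 | X=1) keeps the P(x|z) factor (via Bayes):
p_x1 = sum(p_z[z] * p_x_given_z[z][1] for z in (0, 1))
p_y1_cond = sum(p_z[z] * p_x_given_z[z][1] * p_y_given_xz[(1, z)][1]
                for z in (0, 1)) / p_x1

print(round(p_y1_do, 3), round(p_y1_cond, 3))  # 0.62 vs 0.71: they differ
```

The gap between the two numbers is exactly the confounding bias introduced by Z.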

Identification of causal effects

Identification asks: can you compute a causal effect from observational data alone, given your assumed DAG? If yes, the effect is "identified." If not, you need additional data or assumptions. SCMs provide graphical criteria to answer this question.

Back-door criterion

The back-door criterion is the most commonly used identification tool. A set of variables Z satisfies the back-door criterion relative to (X, Y) if:

  1. Z blocks all back-door paths from X to Y (paths that enter X through an incoming arrow)
  2. No variable in Z is a descendant of X

If such a Z exists, you can identify the causal effect by adjusting for Z:

P(Y | do(X = x)) = Σ_z P(Y | X = x, Z = z) · P(Z = z)

For example, in the DAG X ← Z → Y, the confounder Z creates a back-door path. Conditioning on Z blocks it, satisfying the criterion.
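A simulation sketch of back-door adjustment under an assumed linear SCM (all coefficients are invented): the naive regression slope is biased by the back-door path through Z, while adjusting for Z recovers the true effect of 2:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Illustrative linear SCM with confounder Z: X <- Z -> Y and X -> Y.
z = rng.normal(size=n)
x = 1.0 * z + rng.normal(size=n)
y = 2.0 * x + 3.0 * z + rng.normal(size=n)   # true effect of X on Y is 2

# Naive regression of Y on X is biased by the back-door path X <- Z -> Y:
naive = np.polyfit(x, y, 1)[0]

# Adjusting for Z (regressing Y on both X and Z) recovers the causal effect:
design = np.column_stack([x, z, np.ones(n)])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = beta[0]

print(round(naive, 2), round(adjusted, 2))   # naive ≈ 3.5, adjusted ≈ 2.0
```

Including Z in the regression is the linear special case of the adjustment formula above.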

Front-door criterion

The front-door criterion applies when no set of variables satisfies the back-door criterion (e.g., because the confounder is unmeasured). A set Z satisfies the front-door criterion relative to (X, Y) if:

  1. Z intercepts all directed paths from X to Y
  2. There are no unblocked back-door paths from X to Z
  3. All back-door paths from Z to Y are blocked by X

A classic example: X → Z → Y with an unmeasured confounder U affecting both X and Y. You can't adjust for U directly, but Z (the mediator) satisfies the front-door criterion, letting you identify the causal effect through a two-step adjustment.
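Spelled out, the two-step adjustment combines into a single formula (stated here for discrete variables):

P(Y | do(X = x)) = Σ_z P(Z = z | X = x) · Σ_x′ P(Y | X = x′, Z = z) · P(X = x′)

The inner sum is a back-door adjustment for the Z → Y effect (with X blocking the back-door path through U), and the outer sum chains it with the X → Z effect, which has no confounding by assumption 2.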

Instrumental variables

Instrumental variables (IVs) provide another identification strategy when unmeasured confounders are present. A variable Z is a valid instrument for the effect of X on Y if:

  1. Z is associated with X (relevance)
  2. Z does not directly affect Y except through X (exclusion restriction)
  3. Z is independent of all confounders of the X-Y relationship (independence)

A well-known example: estimating the effect of education on income. A person's quarter of birth affects years of education (through compulsory schooling laws) but has no direct effect on income, making it a candidate instrument.
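A simulation sketch of the IV idea with a randomized binary instrument (the structural coefficients are invented): the naive regression slope is biased by the unmeasured confounder U, while the Wald ratio cov(Z, Y)/cov(Z, X), a standard IV estimator for the linear case, recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Illustrative linear SCM: instrument Z -> X, unmeasured U confounds X and Y.
u = rng.normal(size=n)                        # unmeasured confounder
z = rng.binomial(1, 0.5, size=n).astype(float)
x = 1.0 * z + 1.0 * u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)    # true effect of X on Y is 2

# Naive regression of Y on X is biased by U:
naive = np.polyfit(x, y, 1)[0]

# Wald/IV estimator: Z shifts X without touching U or Y directly,
# so the ratio of covariances isolates the causal effect.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(round(naive, 2), round(iv, 2))   # naive is biased upward; iv ≈ 2.0
```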

Mediation analysis

Mediation analysis decomposes the total causal effect of X on Y into:

  • Direct effect: the effect of X on Y not passing through a mediator M
  • Indirect effect: the effect of X on Y that operates through M

For example, a drug (X) might lower blood pressure (Y) both directly and indirectly by reducing heart rate (M). SCMs let you define and estimate these effects precisely, though mediation analysis requires no-unmeasured-confounding assumptions for the X → M and M → Y relationships as well as for X → Y.

Counterfactuals in SCMs

Counterfactuals ask "what would have happened if things had been different?" SCMs handle counterfactuals by using the structural equations to simulate alternative scenarios while holding the exogenous variables (the background context) fixed.

Potential outcomes framework

The potential outcomes framework (also called the Rubin Causal Model) defines causal effects in terms of hypothetical outcomes. For a binary treatment:

  • Y(1): the outcome if the unit receives treatment
  • Y(0): the outcome if the unit does not receive treatment

The individual causal effect is Y(1) − Y(0). The fundamental problem is that you only observe one of these for each unit. SCMs and potential outcomes are complementary frameworks: SCMs provide the structural machinery, while potential outcomes provide a clean notation for defining effects.

Counterfactual queries

Counterfactual queries ask about outcomes under hypothetical interventions. In an SCM, you answer them by:

  1. Abduction: Use the observed data to infer the values of the exogenous variables (error terms)
  2. Action: Modify the structural equations to reflect the hypothetical intervention
  3. Prediction: Use the modified equations with the inferred exogenous values to compute the counterfactual outcome

For example, "What would this patient's blood pressure have been without the drug?" You first use the patient's actual data to pin down their individual error terms, then re-run the model with X = 0 (no drug) to predict the counterfactual blood pressure.

Note that this is different from the interventional query P(Y | do(X = 0)), which asks about a population-level effect. The counterfactual is specific to a particular individual with known characteristics.
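The three-step procedure can be traced by hand in a toy linear SCM. The structural equation Y := 2X + ϵ_Y and the observed values are invented for illustration:

```python
# Toy SCM (illustrative):  X := eps_X,  Y := 2*X + eps_Y

# Observed facts for one patient:
x_obs, y_obs = 1.0, 3.5   # patient took the drug (X = 1), we saw Y = 3.5

# 1. Abduction: infer this patient's error term from the observed data.
eps_y = y_obs - 2.0 * x_obs   # eps_Y = 1.5 for this individual

# 2. Action: replace the equation for X with X := 0 (no drug).
x_cf = 0.0

# 3. Prediction: recompute Y with the *same* exogenous value.
y_cf = 2.0 * x_cf + eps_y

print(y_cf)   # 1.5 — what Y would have been for this same patient
```

Holding ϵ_Y fixed across the two worlds is what makes this a counterfactual about this individual rather than a population-level interventional prediction.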

Twin networks

Twin networks are a graphical tool for computing counterfactual quantities. They work by creating two copies of the SCM:

  • The factual network represents what actually happened
  • The counterfactual network represents the hypothetical scenario

The two networks share the same exogenous variables, which ties the individual's background characteristics together across both worlds. You can then read off counterfactual quantities by comparing outcomes between the two networks.

For example, in a twin network for a drug study, the factual side shows the patient taking the drug and the observed outcome, while the counterfactual side shows the same patient not taking the drug, with the same exogenous factors.

Learning SCMs from data

Learning an SCM from data involves two tasks: discovering the causal structure (the DAG) and estimating the parameters of the structural equations. Both are challenging due to limited data, latent variables, and the fact that multiple DAGs can produce the same observed distribution.

Causal structure learning

Causal structure learning aims to infer the DAG from observational data. Methods fall into three categories:

  • Constraint-based methods (e.g., PC algorithm): use conditional independence tests to determine which edges belong in the DAG
  • Score-based methods (e.g., GES algorithm): search over possible DAGs to find the one that best fits the data according to a scoring function
  • Hybrid methods: combine both approaches

A key limitation: observational data alone can typically only identify the DAG up to its Markov equivalence class (a set of DAGs that encode the same conditional independencies). You may need additional assumptions or experimental data to distinguish between equivalent structures.

Constraint-based methods

Constraint-based methods rely on the causal Markov condition and faithfulness to infer the DAG. The PC algorithm is the most well-known:

  1. Start with a fully connected undirected graph
  2. For each pair of adjacent nodes, test whether they are conditionally independent given some subset of other variables
  3. Remove edges where conditional independence is found
  4. Orient edges using rules based on v-structures and acyclicity constraints

These methods are computationally efficient but sensitive to errors in independence tests, especially with small samples or violations of faithfulness.

Score-based methods

Score-based methods search for the DAG that optimizes a scoring function balancing data fit and model complexity. Common scores include the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC).

The Greedy Equivalence Search (GES) algorithm:

  1. Start with an empty graph (no edges)
  2. Forward phase: iteratively add edges that most improve the score
  3. Backward phase: iteratively remove edges that improve the score
  4. Return the highest-scoring DAG

Score-based methods are less sensitive to individual test errors than constraint-based methods, but the search space grows super-exponentially with the number of variables, making them computationally expensive for large systems.
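The scoring idea can be sketched under a linear-Gaussian assumption: each node contributes a regression log-likelihood minus a BIC complexity penalty, and the true structure should outscore a structure with a missing edge. The function below is an illustrative sketch, not a library API:

```python
import numpy as np

def bic_score(data, dag):
    """BIC of a linear-Gaussian DAG (higher is better here).
    data: dict name -> 1-D array; dag: dict name -> list of parent names."""
    n = len(next(iter(data.values())))
    score = 0.0
    for node, parents in dag.items():
        y = data[node]
        X = np.column_stack([data[p] for p in parents] + [np.ones(n)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / n                    # MLE of residual variance
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        k = len(parents) + 2                          # coefficients + intercept + variance
        score += loglik - 0.5 * k * np.log(n)
    return score

rng = np.random.default_rng(4)
n = 10_000
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)                      # true structure: X -> Y
data = {"X": x, "Y": y}

good = bic_score(data, {"X": [], "Y": ["X"]})
bad = bic_score(data, {"X": [], "Y": []})             # edge missing
print(good > bad)   # True: the true structure scores higher
```

Note that X → Y and Y → X are Markov equivalent in the linear-Gaussian case, so the score distinguishes the true DAG from the empty graph but not from its equivalence-class twin; this is the Markov equivalence limitation mentioned above.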

Hybrid methods

Hybrid methods combine constraint-based and score-based approaches to get the best of both worlds. A typical strategy:

  1. Use constraint-based methods to prune the search space (eliminate edges that are clearly absent)
  2. Use score-based methods to search over the remaining candidate structures

The Max-Min Hill-Climbing (MMHC) algorithm is a prominent example. It uses the MMPC algorithm to identify candidate parent-child relationships, then applies hill-climbing search to find the best-scoring DAG within that restricted space. This achieves a balance between computational efficiency and robustness.

Parameter estimation

Once you have the DAG, you need to estimate the parameters of the structural equations. The two main approaches:

  • Maximum likelihood estimation (MLE): find parameter values that maximize the probability of observing the data, given the DAG structure
  • Bayesian methods: specify prior distributions over parameters and update them with the data to get posterior distributions

Both approaches require assumptions about the functional form of the structural equations (e.g., linear vs. nonlinear) and the distribution of error terms (e.g., Gaussian vs. non-Gaussian). Getting these assumptions wrong can lead to poor estimates.
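Under a linear-Gaussian assumption, MLE for the structural equations reduces to an ordinary least squares fit, node by node, of each variable on its parents in the DAG. A sketch with invented coefficients:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Assumed DAG: X -> Y <- Z, with an illustrative linear-Gaussian equation
# Y := 2*X - 1*Z + 0.5 + eps_Y. Fitting Y's equation is one OLS regression.
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 2.0 * x - 1.0 * z + 0.5 + rng.normal(size=n)

design = np.column_stack([x, z, np.ones(n)])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(beta, 2))   # close to [2, -1, 0.5]
```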

Applications of SCMs

SCMs are used across many fields, including epidemiology, economics, social sciences, and AI. They provide a principled framework for several practical tasks.

Causal effect estimation

SCMs let you estimate causal effects from observational data by leveraging the assumed causal structure. For example, using electronic health records, you could estimate the causal effect of a medication on patient outcomes by identifying the right adjustment set from the DAG and applying the back-door adjustment formula.

Policy evaluation

By simulating interventions on an SCM, you can predict the effects of policies before implementing them. For instance, you could model the causal relationships between taxation, income inequality, and economic growth, then simulate different tax policies to compare their predicted outcomes.

Transportability

Transportability addresses whether causal effects estimated in one population generalize to another. SCMs provide formal tools for assessing this. For example, if a clinical trial estimates a drug's effect in one demographic group, transportability analysis can determine under what conditions that estimate applies to a different population with different demographics and comorbidities.

Causal discovery

SCMs also serve as the target for causal discovery: inferring the causal structure itself from observational data. This is valuable for generating hypotheses. For example, applying causal discovery algorithms to cohort study data might reveal previously unknown causal factors for a disease, guiding future experimental research.

Limitations and extensions of SCMs

SCMs are powerful but come with assumptions that don't always hold. Understanding these limitations helps you apply SCMs responsibly.

Latent confounding

Standard SCMs assume causal sufficiency (all common causes are measured). In practice, unmeasured confounders are common and can bias causal effect estimates. For example, in studying smoking and lung cancer, unmeasured genetic factors might influence both.

Extensions to handle this include:

  • Latent variable models that explicitly represent unmeasured confounders
  • Sensitivity analysis that quantifies how robust your conclusions are to potential unmeasured confounding
  • Bounds on causal effects when point identification isn't possible

Cyclic causal models

Standard SCMs require acyclicity, but many real-world systems involve feedback loops. Job satisfaction affects job performance, which in turn affects satisfaction. Cyclic causal models extend the SCM framework to handle such cases, though they require different assumptions and estimation techniques (e.g., equilibrium conditions or dynamic models).

Time-varying treatments

Standard SCMs typically model treatments as fixed at a single point in time. Many real-world treatments change over time (e.g., medication dosages adjusted based on patient response). Extensions like marginal structural models and structural nested models handle time-varying treatments by modeling the causal effects of treatment sequences rather than single treatment assignments. These methods use techniques like inverse probability weighting to account for time-varying confounding.