Structural causal models (SCMs)
Structural causal models (SCMs) combine graphs and equations to represent how variables influence each other in a system. They let you move beyond correlation to estimate causal effects, evaluate interventions, and reason about counterfactuals. An SCM encodes your assumptions about causal structure, then gives you the mathematical machinery to work with those assumptions rigorously.
Definition of SCMs
An SCM is a mathematical model describing the causal relationships between variables in a system. Formally, it consists of three pieces:
- A set of variables (both observed and unobserved)
- A directed acyclic graph (DAG) representing the causal structure, where edges indicate direct causal influences
- A set of structural equations specifying the functional relationship between each variable and its direct causes
Together, these components also imply a joint probability distribution over the variables. The DAG tells you the qualitative story (what causes what), and the structural equations tell you the quantitative story (how much, and in what way).
Endogenous vs. exogenous variables
SCMs distinguish between two types of variables:
- Endogenous variables are determined by other variables within the model. They have at least one incoming edge in the DAG.
- Exogenous variables are determined by factors outside the model. No other variable in the SCM causes them, so they appear as root nodes (no incoming edges) in the DAG.
Exogenous variables are typically assumed to be independently distributed. They represent the "inputs" to the system that the model takes as given rather than explaining.
Structural equations
Structural equations specify how each endogenous variable is generated from its direct causes plus an error term. The error term captures unobserved factors or randomness.
For example, if Y is caused by X:
Y = f(X, U)
Here f is some function, X is the direct cause, and U is the error term. A critical point: structural equations are not the same as regression equations. A regression equation describes a statistical association, while a structural equation claims a causal mechanism. Changing X in a structural equation changes Y; changing X in a regression equation doesn't necessarily mean anything causal.
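As a minimal sketch of a structural equation in code (the variables, coefficients, and noise distributions here are illustrative, not from the text), each variable is generated as a function of its parents plus its own independent noise term:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Exogenous noise terms (the U's): independent by assumption.
u_x = rng.normal(size=n)
u_y = rng.normal(size=n)

# Structural equations: each variable is a function of its
# parents plus its own noise term.
x = u_x                # X has no parents: X := U_X
y = 2.0 * x + u_y      # Y := f(X, U_Y) = 2*X + U_Y

# Because the model is correctly specified, the regression slope
# of Y on X recovers the structural coefficient (≈ 2.0).
slope = np.cov(x, y)[0, 1] / np.var(x)
```

Here the regression and structural coefficients happen to coincide only because X is exogenous; once confounding enters (see the intervention examples later), the two diverge.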
Directed acyclic graphs (DAGs)
DAGs are the graphical backbone of an SCM:
- Nodes represent variables
- Directed edges (arrows) represent direct causal influences
- The absence of an edge between two nodes means there is no direct causal relationship between them
The "acyclic" part means there are no directed cycles. You can't follow the arrows from a variable and loop back to it. This rules out feedback loops (though extensions exist for cyclic systems, covered later).
Causal Markov condition
The causal Markov condition is a core assumption linking the DAG to the probability distribution. It states:
A variable is independent of all its non-descendants, given its parents (direct causes) in the DAG.
This assumption allows you to factorize the joint distribution according to the DAG structure. Each variable's conditional distribution depends only on its parents.
For example, in the chain X → Y → Z, the causal Markov condition tells you that X and Z are independent given Y. Once you know Y, learning X gives you no additional information about Z.
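The chain example can be checked by simulation. In this sketch (linear-Gaussian equations with illustrative coefficients), X and Z are strongly correlated marginally, but their partial correlation given Y, computed by regressing Y out of both and correlating the residuals, is near zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Chain X -> Y -> Z (coefficients illustrative).
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)
z = 0.8 * y + rng.normal(size=n)

# Marginally, X and Z are clearly dependent...
r_xz = np.corrcoef(x, z)[0, 1]

# ...but regressing Y out of both and correlating the residuals
# (the partial correlation given Y) removes the dependence.
x_res = x - (np.cov(y, x)[0, 1] / np.var(y)) * y
z_res = z - (np.cov(y, z)[0, 1] / np.var(y)) * y
r_xz_given_y = np.corrcoef(x_res, z_res)[0, 1]   # ≈ 0
```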
Causal sufficiency
Causal sufficiency assumes that all common causes of the observed variables are included in the model. In other words, there are no unmeasured confounders affecting multiple observed variables.
This is a strong assumption. In practice, it often doesn't hold perfectly, and violations can lead to biased causal effect estimates. When you suspect unmeasured confounding, you'll need techniques like sensitivity analysis or instrumental variables (discussed below).
Causal faithfulness
Causal faithfulness assumes that the only independencies in the data are those implied by the DAG. There are no "accidental" independencies caused by parameters perfectly canceling each other out.
For example, if two causal paths from X to Y have effects that exactly cancel (one positive, one negative, same magnitude), X and Y would appear independent even though causal paths exist. Faithfulness rules this out. Such exact cancellations are considered rare in practice, but they can occur.
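A faithfulness violation is easy to construct by hand. In this sketch (all coefficients chosen for illustration), X affects Y through a direct path of +1 and through a mediator path of (-1) × 1 = -1, so the two contributions cancel and the observed correlation is near zero despite real causal influence:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.normal(size=n)
# Indirect path X -> M -> Y with total effect (-1) * 1 = -1.
m = -1.0 * x + rng.normal(size=n)
# Direct path X -> Y with effect +1: the paths cancel exactly.
y = 1.0 * x + 1.0 * m + rng.normal(size=n)

# X causally affects Y along both paths, yet the correlation is ~0,
# so an independence test would wrongly suggest no X-Y edge.
r_xy = np.corrcoef(x, y)[0, 1]
```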
Representing interventions with SCMs
Interventions are actions that force variables to take specific values, overriding their natural causes. SCMs give you a precise way to represent and reason about interventions, which is what separates causal reasoning from purely statistical reasoning.
Interventional distributions
An interventional distribution is the probability distribution of variables after an intervention. It's written using the do-operator:
P(Y | do(X = x))
This reads as "the probability of Y when we set X to value x." This is fundamentally different from the conditional distribution P(Y | X = x), which is what you'd observe by filtering data. The do-operator represents actively manipulating X, not passively observing it.
Graph mutilation
To derive an interventional distribution, you modify the DAG through graph mutilation:
- Identify the variable X being intervened on
- Remove all incoming edges to X (since the intervention overrides X's natural causes)
- Set X to the intervention value x
- The resulting "mutilated graph" represents the causal structure under the intervention
The rest of the DAG stays the same. Variables downstream of X are still affected by X, but X itself is no longer influenced by its former parents.
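Graph mutilation can be sketched directly in a simulator. In this hypothetical confounded model (Z causes both X and Y; all coefficients illustrative), intervening means replacing X's structural equation with a constant while leaving every other equation, including Z's, untouched:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

def sample(do_x=None):
    # Confounded model: Z -> X, Z -> Y, X -> Y.
    z = rng.normal(size=n)
    u_x = rng.normal(size=n)
    u_y = rng.normal(size=n)
    if do_x is None:
        x = 1.0 * z + u_x        # X's natural structural equation
    else:
        x = np.full(n, do_x)     # mutilation: X's incoming edge is cut
    y = 2.0 * x + 1.0 * z + u_y  # downstream equation is unchanged
    return x, y, z

# do(X = 1) vs do(X = 0): Z keeps its natural distribution, so the
# mean difference recovers the true structural effect of X on Y.
_, y_do1, _ = sample(do_x=1.0)
_, y_do0, _ = sample(do_x=0.0)
effect = y_do1.mean() - y_do0.mean()   # ≈ 2.0
```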
Truncated factorization
Truncated factorization is the mathematical counterpart of graph mutilation. Under the original DAG, the joint distribution factorizes as:
P(x1, ..., xn) = ∏i P(xi | pa(xi))
where pa(xi) are the parents of Xi. When you intervene on a variable Xi, setting it to xi, you drop the factor P(xi | pa(xi)) from the product (since Xi is now fixed, not generated by its parents) and substitute Xi = xi everywhere else. The result is the interventional distribution over the remaining variables.
Identification of causal effects
Identification asks: can you compute a causal effect from observational data alone, given your assumed DAG? If yes, the effect is "identified." If not, you need additional data or assumptions. SCMs provide graphical criteria to answer this question.
Back-door criterion
The back-door criterion is the most commonly used identification tool. A set of variables Z satisfies the back-door criterion relative to (X, Y) if:
- Z blocks all back-door paths from X to Y (paths that enter X through an incoming arrow)
- No variable in Z is a descendant of X
If such a Z exists, you can identify the causal effect by adjusting for Z:
P(y | do(x)) = Σz P(y | x, z) P(z)
For example, in the DAG with edges X ← Z → Y and X → Y, the confounder Z creates a back-door path X ← Z → Y. Conditioning on Z blocks it, satisfying the criterion.
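The adjustment formula can be computed by direct counting in a simulated binary model (the structure X ← Z → Y, X → Y with made-up probabilities). The naive contrast of Y across X values is biased by the confounder, while back-door adjustment recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

# Binary DAG: Z -> X, Z -> Y, X -> Y (probabilities illustrative).
z = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))
y = rng.binomial(1, 0.2 + 0.3 * x + 0.3 * z)   # true effect of X: 0.3

# Naive (confounded) contrast: mixes in Z's influence.
naive = y[x == 1].mean() - y[x == 0].mean()

# Back-door adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z).
def adjusted(x_val):
    total = 0.0
    for z_val in (0, 1):
        p_z = (z == z_val).mean()
        cell = (x == x_val) & (z == z_val)
        total += y[cell].mean() * p_z
    return total

ate = adjusted(1) - adjusted(0)   # ≈ 0.3, the true causal effect
```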
Front-door criterion
The front-door criterion applies when no set of variables satisfies the back-door criterion (e.g., because the confounder is unmeasured). A set M satisfies the front-door criterion relative to (X, Y) if:
- M intercepts all directed paths from X to Y
- There are no unblocked back-door paths from X to M
- All back-door paths from M to Y are blocked by X
A classic example: X → M → Y with an unmeasured confounder U affecting both X and Y. You can't adjust for U directly, but M (the mediator) satisfies the front-door criterion, letting you identify the causal effect through a two-step adjustment.
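The two-step adjustment can be sketched on a simulated binary model (all probabilities made up for illustration). The confounder u is generated but never used by the estimator; the front-door formula works from (x, m, y) alone:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

# UNMEASURED confounder u of X and Y; X affects Y only through M.
u = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, 0.3 + 0.4 * u)
m = rng.binomial(1, 0.2 + 0.5 * x)            # M depends on X only
y = rng.binomial(1, 0.1 + 0.4 * m + 0.3 * u)  # Y depends on M and u

# Front-door formula, using the observed variables only:
#   P(y | do(x)) = sum_m P(m | x) * sum_x' P(y | m, x') P(x')
def frontdoor(x_val):
    total = 0.0
    for m_val in (0, 1):
        p_m_given_x = (m[x == x_val] == m_val).mean()
        inner = 0.0
        for x_prime in (0, 1):
            cell = (m == m_val) & (x == x_prime)
            inner += y[cell].mean() * (x == x_prime).mean()
        total += p_m_given_x * inner
    return total

# True effect: X shifts P(m=1) by 0.5, and M shifts P(y=1) by 0.4,
# so the interventional contrast is 0.5 * 0.4 = 0.2.
ate = frontdoor(1) - frontdoor(0)
```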
Instrumental variables
Instrumental variables (IVs) provide another identification strategy when unmeasured confounders are present. A variable Z is a valid instrument for the effect of X on Y if:
- Z is associated with X (relevance)
- Z does not affect Y except through X (exclusion restriction)
- Z is independent of all confounders of the X-Y relationship (independence)
A well-known example: estimating the effect of education on income. A person's quarter of birth affects years of education (through compulsory schooling laws) but has no direct effect on income, making it a candidate instrument.
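A simple IV (Wald) estimator can be demonstrated on a simulated linear model with a hidden confounder (coefficients illustrative). The naive regression of Y on X is biased upward by the confounder, while the ratio of covariances through the instrument recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

u = rng.normal(size=n)   # unmeasured confounder of X and Y
z = rng.normal(size=n)   # instrument: affects X, has no other path to Y
x = 1.0 * z + 1.0 * u + rng.normal(size=n)
y = 2.0 * x + 1.0 * u + rng.normal(size=n)   # true effect of X is 2.0

# Naive regression slope of Y on X is biased by u.
naive = np.cov(x, y)[0, 1] / np.var(x)

# IV (Wald) estimator: cov(Z, Y) / cov(Z, X).
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # ≈ 2.0
```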
Mediation analysis
Mediation analysis decomposes the total causal effect of X on Y into:
- Direct effect: the effect of X on Y not passing through a mediator M
- Indirect effect: the effect of X on Y that operates through M
For example, a drug (X) might lower blood pressure (Y) both directly and indirectly by reducing heart rate (M). SCMs let you define and estimate these effects precisely, though mediation analysis requires assumptions about the absence of unmeasured confounders for both the X-M and M-Y relationships.
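In a linear SCM the decomposition is especially transparent: the indirect effect is the product of the path coefficients through the mediator, and the total effect is direct plus indirect. A sketch with made-up coefficients:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Linear SCM: X -> M -> Y plus a direct edge X -> Y.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)              # X -> M path: 0.5
y = 1.0 * x + 2.0 * m + rng.normal(size=n)    # direct 1.0, M -> Y: 2.0

# Estimate the paths by regressing each variable on its parents.
a = np.cov(x, m)[0, 1] / np.var(x)            # X -> M coefficient
design = np.column_stack([x, m])
b_direct, b_m = np.linalg.lstsq(design, y, rcond=None)[0]

indirect = a * b_m            # ≈ 0.5 * 2.0 = 1.0
total = b_direct + indirect   # ≈ 1.0 + 1.0 = 2.0
```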
Counterfactuals in SCMs
Counterfactuals ask "what would have happened if things had been different?" SCMs handle counterfactuals by using the structural equations to simulate alternative scenarios while holding the exogenous variables (the background context) fixed.
Potential outcomes framework
The potential outcomes framework (also called the Rubin Causal Model) defines causal effects in terms of hypothetical outcomes. For a binary treatment:
- Y(1): the outcome if the unit receives treatment
- Y(0): the outcome if the unit does not receive treatment
The individual causal effect is Y(1) - Y(0). The fundamental problem is that you only observe one of these for each unit. SCMs and potential outcomes are complementary frameworks: SCMs provide the structural machinery, while potential outcomes provide a clean notation for defining effects.
Counterfactual queries
Counterfactual queries ask about outcomes under hypothetical interventions. In an SCM, you answer them by:
- Abduction: Use the observed data to infer the values of the exogenous variables (error terms)
- Action: Modify the structural equations to reflect the hypothetical intervention
- Prediction: Use the modified equations with the inferred exogenous values to compute the counterfactual outcome
For example, "What would this patient's blood pressure have been without the drug?" You first use the patient's actual data to pin down their individual error terms, then re-run the model with the drug variable set to 0 (no drug) to predict the counterfactual blood pressure.
Note that this is different from the interventional query P(Y | do(X = x)), which asks about a population-level effect. The counterfactual is specific to a particular individual with known characteristics.
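The three steps can be walked through by hand in a tiny, hypothetical linear SCM (the equations and numbers below are invented for illustration, not taken from the text):

```python
# One-patient linear SCM (coefficients illustrative):
#   drug D := U_D
#   blood pressure B := 120 - 10*D + U_B
# Observed: the patient took the drug (D = 1) and had B = 112.
d_obs, b_obs = 1.0, 112.0

# Abduction: infer the patient's noise term from the observation.
u_b = b_obs - (120.0 - 10.0 * d_obs)   # u_b = 112 - 110 = 2

# Action: replace D's equation with the constant D := 0 (no drug).
d_cf = 0.0

# Prediction: re-run the modified model with the same noise term.
b_cf = 120.0 - 10.0 * d_cf + u_b       # = 122
```

Holding u_b fixed is what makes this an individual-level counterfactual rather than a population-level do-query: the patient's background state is carried over into the hypothetical world.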
Twin networks
Twin networks are a graphical tool for computing counterfactual quantities. They work by creating two copies of the SCM:
- The factual network represents what actually happened
- The counterfactual network represents the hypothetical scenario
The two networks share the same exogenous variables, which ties the individual's background characteristics together across both worlds. You can then read off counterfactual quantities by comparing outcomes between the two networks.
For example, in a twin network for a drug study, the factual side shows the patient taking the drug and the observed outcome, while the counterfactual side shows the same patient not taking the drug, with the same exogenous factors.
Learning SCMs from data
Learning an SCM from data involves two tasks: discovering the causal structure (the DAG) and estimating the parameters of the structural equations. Both are challenging due to limited data, latent variables, and the fact that multiple DAGs can produce the same observed distribution.
Causal structure learning
Causal structure learning aims to infer the DAG from observational data. Methods fall into three categories:
- Constraint-based methods (e.g., PC algorithm): use conditional independence tests to determine which edges belong in the DAG
- Score-based methods (e.g., GES algorithm): search over possible DAGs to find the one that best fits the data according to a scoring function
- Hybrid methods: combine both approaches
A key limitation: observational data alone can typically only identify the DAG up to its Markov equivalence class (a set of DAGs that encode the same conditional independencies). You may need additional assumptions or experimental data to distinguish between equivalent structures.
Constraint-based methods
Constraint-based methods rely on the causal Markov condition and faithfulness to infer the DAG. The PC algorithm is the most well-known:
- Start with a fully connected undirected graph
- For each pair of adjacent nodes, test whether they are conditionally independent given some subset of other variables
- Remove edges where conditional independence is found
- Orient edges using rules based on v-structures and acyclicity constraints
These methods are computationally efficient but sensitive to errors in independence tests, especially with small samples or violations of faithfulness.
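The skeleton phase of a PC-style procedure can be sketched for a three-variable linear-Gaussian system, using partial correlation as a stand-in for a proper conditional independence test (the data-generating chain and the threshold are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000

# Ground-truth structure: chain X -> Y -> Z.
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)
z = 1.0 * y + rng.normal(size=n)
data = {"X": x, "Y": y, "Z": z}

def partial_corr(a, b, cond):
    # Correlate a and b after regressing out the conditioning set.
    if cond:
        C = np.column_stack([data[c] for c in cond])
        a = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
        b = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return np.corrcoef(a, b)[0, 1]

# Skeleton phase: start fully connected; drop an edge whenever some
# conditioning set renders the pair (near-)independent.
edges = {("X", "Y"), ("X", "Z"), ("Y", "Z")}
threshold = 0.01
for pair in list(edges):
    a, b = pair
    others = [v for v in data if v not in pair]
    for cond in ([], others):
        if abs(partial_corr(data[a], data[b], cond)) < threshold:
            edges.discard(pair)   # e.g. X-Z is dropped given Y
            break
```

After the loop, only the X-Y and Y-Z edges survive, because X and Z are independent given Y; a full PC implementation would then orient edges using v-structures and acyclicity.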
Score-based methods
Score-based methods search for the DAG that optimizes a scoring function balancing data fit and model complexity. Common scores include the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC).
The Greedy Equivalence Search (GES) algorithm:
- Start with an empty graph (no edges)
- Forward phase: iteratively add edges that most improve the score
- Backward phase: iteratively remove edges that improve the score
- Return the highest-scoring DAG
Score-based methods are less sensitive to individual test errors than constraint-based methods, but the search space grows super-exponentially with the number of variables, making them computationally expensive for large systems.
Hybrid methods
Hybrid methods combine constraint-based and score-based approaches to get the best of both worlds. A typical strategy:
- Use constraint-based methods to prune the search space (eliminate edges that are clearly absent)
- Use score-based methods to search over the remaining candidate structures
The Max-Min Hill-Climbing (MMHC) algorithm is a prominent example. It uses the MMPC algorithm to identify candidate parent-child relationships, then applies hill-climbing search to find the best-scoring DAG within that restricted space. This achieves a balance between computational efficiency and robustness.
Parameter estimation
Once you have the DAG, you need to estimate the parameters of the structural equations. The two main approaches:
- Maximum likelihood estimation (MLE): find parameter values that maximize the probability of observing the data, given the DAG structure
- Bayesian methods: specify prior distributions over parameters and update them with the data to get posterior distributions
Both approaches require assumptions about the functional form of the structural equations (e.g., linear vs. nonlinear) and the distribution of error terms (e.g., Gaussian vs. non-Gaussian). Getting these assumptions wrong can lead to poor estimates.
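For the common linear-Gaussian case, maximum likelihood factorizes over the DAG, so estimation reduces to a least-squares fit of each variable on its parents. A sketch with an assumed known DAG and made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000

# Known DAG: X -> Y <- Z. Generate data from "true" parameters.
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.5 * x - 0.7 * z + rng.normal(size=n)

# MLE for a linear-Gaussian SCM: least squares of each variable
# on its parents; the mean squared residual estimates the noise
# variance of that variable's structural equation.
parents = np.column_stack([x, z])
coef, *_ = np.linalg.lstsq(parents, y, rcond=None)
residual_var = np.mean((y - parents @ coef) ** 2)   # ≈ 1.0
```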
Applications of SCMs
SCMs are used across many fields, including epidemiology, economics, social sciences, and AI. They provide a principled framework for several practical tasks.
Causal effect estimation
SCMs let you estimate causal effects from observational data by leveraging the assumed causal structure. For example, using electronic health records, you could estimate the causal effect of a medication on patient outcomes by identifying the right adjustment set from the DAG and applying the back-door adjustment formula.
Policy evaluation
By simulating interventions on an SCM, you can predict the effects of policies before implementing them. For instance, you could model the causal relationships between taxation, income inequality, and economic growth, then simulate different tax policies to compare their predicted outcomes.
Transportability
Transportability addresses whether causal effects estimated in one population generalize to another. SCMs provide formal tools for assessing this. For example, if a clinical trial estimates a drug's effect in one demographic group, transportability analysis can determine under what conditions that estimate applies to a different population with different demographics and comorbidities.
Causal discovery
SCMs also serve as the target for causal discovery: inferring the causal structure itself from observational data. This is valuable for generating hypotheses. For example, applying causal discovery algorithms to cohort study data might reveal previously unknown causal factors for a disease, guiding future experimental research.
Limitations and extensions of SCMs
SCMs are powerful but come with assumptions that don't always hold. Understanding these limitations helps you apply SCMs responsibly.
Latent confounding
Standard SCMs assume causal sufficiency (all common causes are measured). In practice, unmeasured confounders are common and can bias causal effect estimates. For example, in studying smoking and lung cancer, unmeasured genetic factors might influence both.
Extensions to handle this include:
- Latent variable models that explicitly represent unmeasured confounders
- Sensitivity analysis that quantifies how robust your conclusions are to potential unmeasured confounding
- Bounds on causal effects when point identification isn't possible
Cyclic causal models
Standard SCMs require acyclicity, but many real-world systems involve feedback loops. Job satisfaction affects job performance, which in turn affects satisfaction. Cyclic causal models extend the SCM framework to handle such cases, though they require different assumptions and estimation techniques (e.g., equilibrium conditions or dynamic models).
Time-varying treatments
Standard SCMs typically model treatments as fixed at a single point in time. Many real-world treatments change over time (e.g., medication dosages adjusted based on patient response). Extensions like marginal structural models and structural nested models handle time-varying treatments by modeling the causal effects of treatment sequences rather than single treatment assignments. These methods use techniques like inverse probability weighting to account for time-varying confounding.