Fiveable

📊Causal Inference Unit 9 Review


9.1 Directed acyclic graphs (DAGs)


Written by the Fiveable Content Team • Last updated August 2025

Definition of DAGs

A Directed Acyclic Graph (DAG) is a graphical model that represents causal relationships between variables. It consists of nodes (representing variables) and directed edges (arrows representing causal relationships). DAGs give you a visual way to lay out your causal assumptions so you can reason clearly about what causes what in a complex system.

Nodes and Edges

Nodes represent variables or factors in a causal system, such as age, treatment assignment, or health outcome. Directed edges are arrows that indicate the direction of causal influence from one node to another.

  • An edge from node A to node B (A → B) means A has a direct causal effect on B.
  • If there's no edge between two nodes, you're assuming there's no direct causal relationship between them.

That second point is easy to overlook but really matters: every missing edge in a DAG is itself a causal claim.
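As a minimal sketch, a DAG can be stored as an adjacency mapping from each node to its children. The node names (Age, Treatment, Outcome) are illustrative, not from any particular study:

```python
# Hypothetical three-node DAG: Age -> Treatment, Age -> Outcome,
# Treatment -> Outcome. Each key maps a node to its children.
dag = {
    "Age": ["Treatment", "Outcome"],
    "Treatment": ["Outcome"],
    "Outcome": [],  # no outgoing edges
}

def has_edge(dag, a, b):
    """True if the graph asserts a direct causal effect of a on b."""
    return b in dag.get(a, [])

# The absence of an edge is itself a claim: this graph asserts that
# Age -> Treatment exists but Treatment -> Age does not.
print(has_edge(dag, "Age", "Treatment"))   # True
print(has_edge(dag, "Treatment", "Age"))   # False
```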

Acyclic Property

The "acyclic" part of DAG means the graph contains no directed cycles. You can't start at a node, follow the arrows, and end up back where you started. In other words, a variable can never directly or indirectly cause itself.

This property is what gives DAGs a clear causal ordering. Without it, you'd run into paradoxes where A causes B causes A, and the whole framework for reasoning about "upstream" and "downstream" effects breaks down.
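Acyclicity can be checked mechanically. One standard approach (sketched here with the same adjacency-dict representation) is Kahn's algorithm: repeatedly peel off nodes with no incoming edges; if a cycle exists, some nodes can never be peeled:

```python
def is_acyclic(dag):
    """Kahn's algorithm: remove nodes with no incoming edges until
    none remain. A cycle leaves nodes that are never removed."""
    indegree = {node: 0 for node in dag}
    for children in dag.values():
        for c in children:
            indegree[c] += 1
    frontier = [n for n, d in indegree.items() if d == 0]
    removed = 0
    while frontier:
        node = frontier.pop()
        removed += 1
        for c in dag[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                frontier.append(c)
    return removed == len(dag)

acyclic = {"A": ["B"], "B": ["C"], "C": []}        # A -> B -> C
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}      # A -> B -> C -> A
print(is_acyclic(acyclic))  # True
print(is_acyclic(cyclic))   # False
```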

Directed Paths

A directed path is a sequence of nodes connected by arrows, all pointing in the same direction. If there's a directed path from A to B (A → ... → B), that means A has a causal effect on B, either directly or through intermediary variables.

Tracing directed paths is how you map out the full chain of causal influence. It also helps you spot where confounding might enter the picture, since confounders create non-causal paths between variables you care about.
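Checking whether a directed path exists is a plain graph reachability problem, which a depth-first search that only follows arrow directions solves directly (a sketch, using the same adjacency-dict representation as above):

```python
def has_directed_path(dag, start, end):
    """Depth-first search following edges only in their arrow direction."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == end:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(dag.get(node, []))
    return False

# A -> M -> B: A affects B through the intermediary M.
dag = {"A": ["M"], "M": ["B"], "B": []}
print(has_directed_path(dag, "A", "B"))  # True
print(has_directed_path(dag, "B", "A"))  # False: arrows only go one way
```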

Markov Properties of DAGs

The Markov properties connect a DAG's structure to probabilistic independence relationships among the variables. These properties are what make DAGs useful for computation and inference, not just for drawing pictures of causal stories.

Factorization According to the DAG

The joint probability distribution of all variables in a DAG can be broken into a product of simpler conditional probabilities. Each variable's distribution depends only on its parents (the nodes with arrows pointing into it):

P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))

Here, Pa(X_i) denotes the set of parents of node X_i.

This is a big deal practically. Instead of specifying one enormous joint distribution over all your variables, you only need to specify each variable's relationship with its direct causes. That's far more manageable.
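To make the savings concrete, here is a toy two-node DAG (a hypothetical Rain → WetGrass example): instead of storing a full joint table, we store P(Rain) and P(WetGrass | Rain) and reconstruct any joint probability as their product:

```python
# Hypothetical DAG: Rain -> WetGrass, both binary.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {
    True:  {True: 0.9, False: 0.1},   # P(WetGrass | Rain = True)
    False: {True: 0.1, False: 0.9},   # P(WetGrass | Rain = False)
}

def joint(rain, wet):
    """P(Rain = rain, WetGrass = wet) via the DAG factorization:
    P(Rain) * P(WetGrass | Rain)."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]

total = sum(joint(r, w) for r in (True, False) for w in (True, False))
print(round(joint(True, True), 2))  # 0.18  (= 0.2 * 0.9)
print(round(total, 2))              # 1.0   (a valid distribution)
```

With n binary variables, the full joint table needs 2^n - 1 numbers; the factorized form only needs one small table per variable.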

Conditional Independence

DAGs encode conditional independence relationships directly in their structure. Two variables X and Y are conditionally independent given a set Z if:

P(X, Y \mid Z) = P(X \mid Z) \, P(Y \mid Z)

Two key versions of this show up in DAG theory:

  • Local Markov property: Each node is independent of its non-descendants, given its parents.
  • Global Markov property: Any two sets of nodes that are d-separated (see below) by a third set are conditionally independent.

These properties let you read independence relationships straight off the graph, which is essential for identifying confounders and mediators.
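The local Markov property can be verified numerically on a toy example. In the hypothetical fork C → X, C → Y below (all variables binary, probabilities chosen arbitrarily), X and Y should be independent given C, and the check confirms it:

```python
from itertools import product

# Hypothetical fork C -> X, C -> Y; by the local Markov property,
# X and Y are independent given their common parent C.
p_c = {0: 0.5, 1: 0.5}
p_x_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
p_y_c = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

def p_joint(c, x, y):
    """Joint probability via the DAG factorization."""
    return p_c[c] * p_x_c[c][x] * p_y_c[c][y]

def independent_given_c(c):
    """Check P(X, Y | C=c) == P(X | C=c) * P(Y | C=c) for all x, y."""
    p_z = sum(p_joint(c, x, y) for x, y in product((0, 1), repeat=2))
    for x, y in product((0, 1), repeat=2):
        lhs = p_joint(c, x, y) / p_z
        p_x = sum(p_joint(c, x, yy) for yy in (0, 1)) / p_z
        p_y = sum(p_joint(c, xx, y) for xx in (0, 1)) / p_z
        if abs(lhs - p_x * p_y) > 1e-9:
            return False
    return True

print(all(independent_given_c(c) for c in (0, 1)))  # True
```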

D-Separation Criterion

D-separation (directed separation) is the graphical rule for determining whether two sets of nodes are conditionally independent given a third set. This is one of the most important tools in the entire DAG framework.

Two sets of nodes X and Y are d-separated by a conditioning set Z if every path between them is blocked. A path is blocked when it contains:

  • A non-collider (a node where arrows don't both point in, such as a chain A → M → B or a fork A ← C → B) that is in Z, or
  • A collider (a node with two arrows pointing into it, like A → C ← B) where C is not in Z and no descendant of C is in Z.

If X and Y are d-separated by Z, then X ⊥⊥ Y | Z in any distribution compatible with the DAG.

The collider rule trips people up the most: conditioning on a collider (or its descendant) opens a path that was previously blocked. This is the mechanism behind collider bias, sometimes called "explaining away."
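A small simulation makes collider bias tangible. In this sketch, X and Y are independent by construction and C = X + Y is their collider; selecting on C (conditioning on C > 1) induces a strong negative association between X and Y:

```python
import random

random.seed(0)
# Collider structure X -> C <- Y, with X and Y independent by construction.
n = 20000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]
cs = [x + y for x, y in zip(xs, ys)]

def corr(a, b):
    """Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / (va * vb) ** 0.5

# Marginally, X and Y are (nearly) uncorrelated.
print(round(corr(xs, ys), 2))

# Conditioning on the collider (selecting samples with C > 1) opens
# the path X -> C <- Y: among selected samples, a large X "explains
# away" the need for a large Y, producing a negative correlation.
sel = [(x, y) for x, y, c in zip(xs, ys, cs) if c > 1]
sx, sy = zip(*sel)
print(round(corr(list(sx), list(sy)), 2))  # clearly negative
```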

Causal Interpretation of DAGs

A DAG on its own is just a mathematical object encoding independence relationships. To use it for causal inference, you need to interpret the directed edges as genuine causal influences. That interpretation rests on three key assumptions.


Causal Markov Assumption

This assumption states that each variable is independent of all its non-descendants, given its direct causes (parents) in the DAG. In plain terms, once you know a variable's direct causes, knowing about other "upstream" or unrelated variables tells you nothing extra.

This is what lets the DAG's factorization correspond to the actual causal structure. If this assumption fails, the graph doesn't faithfully represent how the system generates data.

Causal Sufficiency Assumption

Causal sufficiency requires that all common causes of the observed variables are included in the DAG. There can be no unmeasured confounders lurking outside the graph that simultaneously affect two or more variables you've drawn.

This is a strong assumption and often the hardest to defend in practice. If it's violated, you can get spurious associations between variables that look causal but aren't, leading to biased effect estimates.

Causal Faithfulness Assumption

Faithfulness says that the only conditional independencies in the data are those implied by the DAG structure. There are no "accidental" independencies caused by causal effects that happen to perfectly cancel each other out.

For example, if A affects Y through two paths with effects of +3 and -3 that exactly cancel, faithfulness would be violated because A and Y would appear independent even though causal paths exist. Faithfulness rules out these knife-edge cancellations, which makes it possible to learn causal structure from observed independence patterns.
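This cancellation can be simulated directly. In the hypothetical linear model below, A affects Y through M with effect +3 and directly with effect -3; the paths are real, yet A and Y come out uncorrelated:

```python
import random

random.seed(1)
# Knife-edge cancellation: A -> M -> Y contributes +3 (1 * 3... here
# M = 3A and Y includes +M), the direct edge A -> Y contributes -3.
n = 20000
a = [random.gauss(0, 1) for _ in range(n)]
m = [3 * ai + random.gauss(0, 0.1) for ai in a]
y = [mi - 3 * ai + random.gauss(0, 1) for ai, mi in zip(a, m)]

def corr(u, v):
    """Pearson correlation."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (z - mv) for x, z in zip(u, v)) / n
    vu = sum((x - mu) ** 2 for x in u) / n
    vv = sum((z - mv) ** 2 for z in v) / n
    return cov / (vu * vv) ** 0.5

print(round(corr(a, y), 2))  # ≈ 0: A and Y look independent
print(round(corr(a, m), 2))  # ≈ 1: yet the causal paths clearly exist
```

A structure-learning algorithm that trusts faithfulness would wrongly conclude there is no path from A to Y, which is exactly why faithfulness must be assumed for discovery to work.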

DAGs vs. Other Graphical Models

Comparison with Bayesian Networks

Bayesian networks share the same mathematical structure as DAGs: directed edges, acyclicity, and the factorization property. The key difference is in interpretation. Bayesian networks represent probabilistic dependencies and are used for tasks like prediction and probabilistic reasoning. They don't necessarily carry causal meaning.

A DAG used for causal inference adds the causal assumptions described above, which lets you reason about interventions (what happens if you set a variable to a value) rather than just observations (what you'd predict given observed data).

Comparison with Structural Equation Models

Structural equation models (SEMs) pair a set of equations describing functional relationships between variables with a graphical representation (a path diagram). You can think of an SEM as a parametric version of a DAG: the DAG captures the qualitative causal structure (which variables affect which), while the SEM's equations specify the quantitative relationships (by how much).

DAGs and SEMs are closely related. A DAG often serves as the starting point for building an SEM, and the SEM fills in the numerical details. DAGs are more flexible in that they don't require you to commit to specific functional forms.

Constructing DAGs from Data

In many settings, you don't start with a known causal structure. Causal discovery methods try to learn the DAG from observed data by detecting statistical patterns of dependence and independence.

Constraint-Based Methods

Constraint-based methods use conditional independence tests to infer the graph structure. The most well-known is the PC algorithm. The process works roughly like this:

  1. Start with a fully connected undirected graph (an edge between every pair of nodes).
  2. Test pairs of variables for marginal independence. Remove edges where independence holds.
  3. Test pairs for conditional independence given one variable, then two variables, and so on. Remove edges accordingly.
  4. Orient edges by identifying collider patterns (v-structures) and applying orientation rules to avoid creating new colliders or cycles.

Common independence tests include the chi-square test (for categorical data) and partial correlation tests (for continuous/Gaussian data). The resulting graph represents the causal structure consistent with the observed conditional independencies.
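As a sketch of the core test in the Gaussian case, the snippet below simulates a chain X → M → Y and applies a partial-correlation test: X and Y are marginally dependent (so step 2 keeps the edge), but independent given M (so step 3 removes it):

```python
import random

random.seed(2)
# Hypothetical chain X -> M -> Y with linear Gaussian relationships.
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.8 * xi + random.gauss(0, 1) for xi in x]
y = [0.8 * mi + random.gauss(0, 1) for mi in m]

def corr(u, v):
    """Pearson correlation."""
    k = len(u)
    mu, mv = sum(u) / k, sum(v) / k
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / k
    vu = sum((a - mu) ** 2 for a in u) / k
    vv = sum((b - mv) ** 2 for b in v) / k
    return cov / (vu * vv) ** 0.5

def partial_corr(u, v, z):
    """Correlation of u and v after linearly removing z."""
    ruv, ruz, rvz = corr(u, v), corr(u, z), corr(v, z)
    return (ruv - ruz * rvz) / ((1 - ruz**2) * (1 - rvz**2)) ** 0.5

print(round(corr(x, y), 2))             # clearly nonzero: X - Y edge survives step 2
print(round(partial_corr(x, y, m), 2))  # ≈ 0: step 3 removes the X - Y edge
```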


Score-Based Methods

Score-based methods evaluate candidate DAG structures using a scoring function that balances model fit against complexity. The Greedy Equivalence Search (GES) algorithm is a prominent example.

  • Common scoring functions include the Bayesian Information Criterion (BIC) and the Bayesian Dirichlet equivalent (BDe) score.
  • The algorithm searches through the space of possible DAGs (or equivalence classes of DAGs) and selects the structure with the best score.
  • Because the space of possible DAGs grows super-exponentially with the number of variables, these methods use greedy search heuristics rather than exhaustive enumeration.

Hybrid Methods

Hybrid methods combine constraint-based and score-based approaches to get the best of both worlds. A well-known example is the Max-Min Hill-Climbing (MMHC) algorithm:

  1. Use a constraint-based method to identify a restricted set of candidate parents for each node (this narrows the search space).
  2. Apply a score-based search over this restricted space to find the best DAG.

Hybrid methods tend to be more robust and computationally efficient than using either approach alone, especially in higher-dimensional settings.

Applications of DAGs in Causal Inference

Identifying Causal Effects

DAGs help you determine whether a causal effect can be estimated from observational data and, if so, which variables to adjust for. By examining the paths between treatment and outcome, you can apply identification strategies:

  • Back-door adjustment: If you can block all back-door paths (non-causal paths) from treatment to outcome by conditioning on a set of variables, you can identify the causal effect. The back-door criterion tells you exactly which sets work.
  • Front-door adjustment: When back-door adjustment isn't possible (e.g., due to unmeasured confounders), you may still identify the effect by exploiting a mediator that satisfies the front-door criterion.

The DAG makes these strategies explicit and verifiable, rather than relying on intuition about "what to control for."
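Back-door adjustment can be demonstrated with a simulation. In this hypothetical structure Z → T, Z → Y, T → Y, the true treatment effect is 2.0; the naive treatment/control contrast is confounded by Z, while stratifying on Z (the back-door adjustment for this graph) recovers the truth:

```python
import random

random.seed(3)
# Hypothetical confounded structure: Z -> T, Z -> Y, T -> Y.
# True causal effect of T on Y is 2.0; Z adds +3.0 to Y and makes
# treatment far more likely (0.8 vs 0.2).
n = 50000
data = []
for _ in range(n):
    z = random.random() < 0.5
    t = random.random() < (0.8 if z else 0.2)
    y = 2.0 * t + 3.0 * z + random.gauss(0, 1)
    data.append((z, t, y))

def mean_y(rows):
    return sum(y for _, _, y in rows) / len(rows)

# Naive contrast: mixes the treatment effect with the effect of Z.
naive = (mean_y([r for r in data if r[1]])
         - mean_y([r for r in data if not r[1]]))

# Back-door adjustment: contrast within each stratum of Z, then
# average the strata weighted by P(Z).
adjusted = 0.0
for z in (False, True):
    stratum = [r for r in data if r[0] == z]
    effect_z = (mean_y([r for r in stratum if r[1]])
                - mean_y([r for r in stratum if not r[1]]))
    adjusted += (len(stratum) / n) * effect_z

print(round(naive, 1))     # ≈ 3.8: biased upward by confounding
print(round(adjusted, 1))  # ≈ 2.0: the true effect
```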

Dealing with Confounding

A confounder is a variable that causally influences both the treatment and the outcome, creating a spurious association between them. On a DAG, confounding shows up as an open back-door path from treatment to outcome.

DAGs provide a systematic way to identify the minimal sufficient adjustment set: the smallest set of variables you need to condition on to block all confounding paths. This is more reliable than the common but error-prone strategy of "controlling for everything you can measure," which can actually introduce bias if you condition on colliders or mediators.

Mediation Analysis

Mediation analysis investigates how a treatment affects an outcome by examining intermediate variables (mediators). On a DAG, you can trace:

  • The direct effect: the path from treatment to outcome that doesn't go through the mediator.
  • The indirect effect: the path from treatment to outcome that passes through the mediator.

The DAG guides you in selecting which variables to condition on for estimating these effects and warns you about situations where mediation analysis may not be valid (e.g., when there's a confounder of the mediator-outcome relationship that is itself affected by treatment).
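For the simplest linear case (no treatment-mediator or mediator-outcome confounding), the decomposition can be sketched with the classic product-of-coefficients approach. The model below is hypothetical: T → M with coefficient 1.5, M → Y with coefficient 2.0, and a direct edge T → Y with coefficient 0.5, so the indirect effect is 1.5 × 2.0 = 3.0:

```python
import random

random.seed(4)
# Hypothetical linear SEM: T -> M -> Y (indirect) plus T -> Y (direct).
n = 50000
t = [random.gauss(0, 1) for _ in range(n)]
m = [1.5 * ti + random.gauss(0, 1) for ti in t]
y = [0.5 * ti + 2.0 * mi + random.gauss(0, 1) for ti, mi in zip(t, m)]

def slope(x, yv):
    """OLS slope of yv regressed on x."""
    mx, my = sum(x) / len(x), sum(yv) / len(yv)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, yv))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def residuals(x, yv):
    """Residuals of yv after regressing on x."""
    b, mx, my = slope(x, yv), sum(x) / len(x), sum(yv) / len(yv)
    return [bi - (my + b * (ai - mx)) for ai, bi in zip(x, yv)]

a = slope(t, m)                              # T -> M coefficient
# M -> Y coefficient controlling for T, via the Frisch-Waugh trick:
b = slope(residuals(t, m), residuals(t, y))
total = slope(t, y)                          # total effect of T on Y

print(round(a * b, 1))          # ≈ 3.0: indirect effect through M
print(round(total - a * b, 1))  # ≈ 0.5: direct effect
```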

Limitations and Extensions of DAGs

Cyclic Graphs

Standard DAGs assume no feedback loops, but many real-world systems involve reciprocal causation (e.g., supply and demand, or anxiety and insomnia reinforcing each other). Directed cyclic graphs (DCGs) and equilibrium-based feedback models extend the framework to handle cycles, though they require different assumptions and more complex analysis techniques.

Time-Varying Treatments

Standard DAGs represent variables at a single time point. When treatments change over time and past outcomes influence future treatment decisions, you need models that capture this dynamic structure. Marginal structural models and g-methods extend the DAG framework to handle time-varying treatments and time-varying confounding, using techniques like inverse probability of treatment weighting.

Latent Variables and Selection Bias

DAGs typically assume all relevant variables are observed, but unmeasured (latent) variables are common in practice. Latent variable models and specialized causal discovery algorithms (like the FCI algorithm) extend the framework to account for hidden common causes.

Selection bias arises when the sample you observe isn't representative of the target population, often because some process determined who ended up in your dataset. DAGs can represent this by adding a selection node and modeling the selection mechanism explicitly. Addressing these issues requires additional assumptions (such as missing at random) and specialized methods like inverse probability weighting or multiple imputation.