Definition of DiD
Difference-in-differences (DiD) is a quasi-experimental research design used to estimate the causal effect of a treatment or intervention. It works by comparing how outcomes change over time between a group that received the treatment and a group that didn't. By looking at changes rather than levels, DiD controls for both time-invariant differences between groups and group-invariant changes over time.
The core logic: if both groups were on similar trajectories before the intervention, then any divergence afterward can be attributed to the treatment. This is what makes the parallel trends assumption so central to the whole approach.
Setup for DiD
DiD requires data on both treatment and control groups, observed both before and after the intervention. This can come from either panel data (same units tracked over time) or repeated cross-sections (different samples from the same populations at different times).
- The treatment group is exposed to the intervention at a specific point in time
- The control group remains unexposed throughout the study period
- The outcome variable is measured for both groups at baseline (pre-intervention) and follow-up (post-intervention)
Parallel Trends Assumption
This is the assumption that makes or breaks a DiD design. It states that, in the absence of treatment, the average change in the outcome would have been the same for both groups. In other words, the control group's trajectory serves as the counterfactual for what would have happened to the treatment group had they not been treated.
You can never directly test this assumption (because you can't observe the counterfactual), but you can assess its plausibility by checking whether the two groups followed similar trends in the pre-intervention period. If they were already diverging before the treatment, your DiD estimate is likely biased.
DiD Estimation
The DiD estimator isolates the treatment effect by taking a "difference of differences": the before-after change in the treatment group, minus the before-after change in the control group. This double-differencing removes both fixed group differences and common time shocks.
Simple Two-Period DiD
In the simplest case, you have two groups and two time periods. The estimator is:

$$\hat{\delta}_{\text{DiD}} = (\bar{Y}_{\text{treat,post}} - \bar{Y}_{\text{treat,pre}}) - (\bar{Y}_{\text{control,post}} - \bar{Y}_{\text{control,pre}})$$

where $\bar{Y}_{g,t}$ is the average outcome for each group-period cell.
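The arithmetic is simple enough to show directly. A minimal sketch using invented cell means (all numbers are hypothetical):

```python
# Hypothetical group-period cell means (numbers invented for illustration)
y_treat_pre, y_treat_post = 20.0, 27.0
y_control_pre, y_control_post = 18.0, 21.0

# Difference of differences: the before-after change in the treatment group,
# minus the before-after change in the control group
did = (y_treat_post - y_treat_pre) - (y_control_post - y_control_pre)
print(did)  # 4.0
```

The treatment group improved by 7, but 3 of that is the common trend picked up by the control group, leaving an estimated effect of 4.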
This same estimator falls out of a regression framework. You estimate:

$$Y_{it} = \beta_0 + \beta_1 \text{Treat}_i + \beta_2 \text{Post}_t + \beta_3 (\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}$$

- $\beta_1$ captures baseline differences between groups
- $\beta_2$ captures the common time trend
- $\beta_3$ is the DiD estimate of the treatment effect
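As a sanity check that the regression recovers the same quantity, here is a small simulation (the simulated effect sizes and variable names are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulate a 2x2 design: group indicator and period indicator
treat = rng.integers(0, 2, n)   # 1 = treatment group
post = rng.integers(0, 2, n)    # 1 = post-intervention period

# Outcome: baseline gap of 2, common time trend of 1, true effect of 3
y = 5 + 2 * treat + 1 * post + 3 * treat * post + rng.normal(0, 1, n)

# OLS via least squares: y ~ const + treat + post + treat:post
X = np.column_stack([np.ones(n), treat, post, treat * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[3])  # interaction coefficient: the DiD estimate, close to 3
```

The interaction coefficient recovers the true effect up to sampling noise, while the group and period coefficients absorb the baseline gap and the common trend.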
Multi-Period DiD
DiD extends naturally to settings with multiple pre- and post-intervention periods. With more time periods, you can:
- Examine whether the treatment effect grows, shrinks, or stays constant over time
- Provide stronger evidence on pre-trends (more pre-periods to check)
- Estimate more flexible time trend specifications
The basic logic is the same: compare the trajectory of the treatment group to that of the control group, now averaged across multiple periods.
Two-Way Fixed Effects Model
The standard regression implementation of DiD with panel data is the two-way fixed effects (TWFE) model:

$$Y_{it} = \alpha_i + \lambda_t + \delta D_{it} + \varepsilon_{it}$$

- $\alpha_i$ = unit fixed effects (absorb all time-invariant differences across units)
- $\lambda_t$ = time fixed effects (absorb all common shocks in each period)
- $D_{it}$ = treatment indicator (equals 1 for treated units in post-treatment periods)
- $\delta$ = the estimated treatment effect
TWFE can handle multiple treatment and control groups, as well as staggered adoption designs. However, recent econometrics research has shown that TWFE can produce misleading estimates under staggered adoption with heterogeneous treatment effects. This is covered further in the extensions section below.
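A TWFE regression can be sketched by hand with dummy variables, dropping one unit and one period dummy to avoid collinearity (the simulated effect size of 2 and the panel dimensions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 50, 6
treat_start = 3                 # treatment begins in period 3

unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)

# D_it = 1 for treated units (the first 25) in post-treatment periods
D = ((unit < 25) & (time >= treat_start)).astype(float)

# Outcome with unit effects, a common time trend, and a true effect of 2
alpha = rng.normal(0, 3, n_units)    # unit fixed effects
lam = np.linspace(0, 5, n_periods)   # common time shocks
y = alpha[unit] + lam[time] + 2.0 * D + rng.normal(0, 1, len(unit))

# TWFE via dummies (reference categories: unit 0 and period 0)
unit_d = (unit[:, None] == np.arange(1, n_units)).astype(float)
time_d = (time[:, None] == np.arange(1, n_periods)).astype(float)
X = np.column_stack([np.ones(len(y)), unit_d, time_d, D])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[-1])  # estimated delta, close to the true effect of 2
```

In practice one would use a dedicated fixed-effects routine (and clustered standard errors) rather than explicit dummies, but the explicit version makes the absorbed structure visible.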
Key Assumptions
The validity of DiD depends on several assumptions beyond parallel trends. Each one should be carefully evaluated in any application.
Common Trends Assumption
Already discussed above, but worth restating: the treatment and control groups must have been on parallel outcome trajectories prior to the intervention. Researchers typically assess this by:
- Plotting the outcome variable for both groups across pre-treatment periods
- Estimating an event study specification and checking that pre-treatment coefficients are close to zero and statistically insignificant
- Running formal statistical tests for differential pre-trends
Passing these checks doesn't prove parallel trends hold, but failing them is a strong warning sign.
Stable Unit Treatment Value Assumption (SUTVA)
SUTVA requires that one unit's treatment status doesn't affect another unit's outcomes. There should be no spillover effects or interference between units.
Violations are common in practice. For example, if a minimum wage increase in one state causes firms to relocate to a neighboring control state, the control group's outcomes are contaminated. Possible responses include redefining the unit of analysis to a broader geographic level or using methods that explicitly model spillovers.
No Anticipation Effects
Units should not change their behavior in anticipation of future treatment. If treated firms start adjusting their hiring before a policy takes effect, the pre-treatment data is contaminated, and the DiD estimate will be biased.
You can check for anticipation effects by looking at whether outcomes start shifting in the treatment group during the periods just before the intervention. Event study designs are particularly useful for this.
Interpreting DiD Estimates
Average Treatment Effect on the Treated (ATT)
Under the parallel trends assumption, DiD identifies the average treatment effect on the treated (ATT): the average causal effect for units that actually received the treatment. This is distinct from the average treatment effect (ATE), which is the average effect across the entire population.
- If treatment effects are homogeneous (the same for everyone), ATT = ATE
- If treatment effects are heterogeneous, ATT and ATE can differ. DiD tells you about the treated group specifically, not necessarily what would happen if you treated everyone
Heterogeneous Treatment Effects
Treatment effects often vary across subgroups. You can explore this by interacting the treatment indicator with subgroup indicators (e.g., age, income, region). This reveals which populations benefit most from the intervention and provides a richer picture of the policy's distributional impacts.
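One way to implement this is to fully interact the design with a subgroup dummy. A sketch with simulated data (the subgroup name and effect sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
treat = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
older = rng.integers(0, 2, n)   # hypothetical subgroup indicator

# True effects: 2 for the younger subgroup, 2 + 3 = 5 for the older subgroup
y = (1 + treat + post + 2 * treat * post
     + 3 * treat * post * older + rng.normal(0, 1, n))

# Saturated 2x2x2 regression: all main effects and interactions
did = treat * post
X = np.column_stack([np.ones(n), treat, post, older,
                     did, did * older, treat * older, post * older])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[4])  # DiD effect for the younger subgroup, near 2
print(beta[5])  # additional effect for the older subgroup, near 3
```

The coefficient on the triple interaction directly tests whether the treatment effect differs across subgroups.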
Robustness Checks
DiD estimates should always be stress-tested. Here are the most common approaches.

Testing for Pre-Trends
This is the single most important robustness check. Estimate a model that includes leads of the treatment indicator (interactions between the treatment group dummy and pre-intervention time period dummies).
- If these lead coefficients are close to zero and insignificant, that's reassuring
- If they're significant, it suggests the groups were already diverging before the intervention, casting doubt on the parallel trends assumption
An event study design plots these coefficients period by period, giving you a visual diagnostic of both pre-trends and the dynamic treatment effect.
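The mechanics can be sketched as follows, interacting the treatment-group dummy with every period dummy except the one just before treatment, which serves as the reference (panel dimensions, the treatment date, and the dynamic effect path are all invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, n_periods, event = 200, 8, 4   # treatment starts in period 4
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
treated = (unit < 100).astype(float)    # first 100 units are treated

# True dynamic effect: zero pre-treatment, then 1, 2, 3, 3 post-treatment
true_effect = np.array([0, 0, 0, 0, 1, 2, 3, 3], dtype=float)
y = (rng.normal(0, 2, n_units)[unit] + 0.5 * time
     + treated * true_effect[time] + rng.normal(0, 1, len(unit)))

# Leads and lags: treated x period dummies, omitting t = event - 1
event_periods = [t for t in range(n_periods) if t != event - 1]
leads_lags = np.column_stack([treated * (time == t) for t in event_periods])
unit_d = (unit[:, None] == np.arange(1, n_units)).astype(float)
time_d = (time[:, None] == np.arange(1, n_periods)).astype(float)
X = np.column_stack([np.ones(len(y)), unit_d, time_d, leads_lags])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
coefs = dict(zip(event_periods, beta[-len(event_periods):]))
# Pre-treatment coefficients (t < 3) should be near zero;
# post-treatment coefficients trace out the dynamic effect
print(coefs)
```

Plotting `coefs` against the period index gives the usual event study figure: flat around zero before the event, then the dynamic effect path afterward.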
Placebo Tests
Placebo tests check whether your design picks up effects where none should exist:
- Placebo outcomes: Estimate DiD on an outcome that shouldn't be affected by the treatment. A significant effect suggests confounding.
- Placebo treatments: Randomly reassign the treatment to different units or time periods. The DiD estimate should be approximately zero.
- Placebo timing: Move the treatment date to a pre-intervention period and re-estimate. A significant effect suggests pre-existing trends.
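The placebo-timing idea can be sketched by running the usual 2x2 DiD entirely within the pre-intervention window, with an invented treatment date (all simulated quantities are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3000
treat = rng.integers(0, 2, n)
period = rng.integers(0, 4, n)   # four pre-intervention periods

# Parallel trends and NO treatment effect in this window
y = 3 + 1.5 * treat + 0.8 * period + rng.normal(0, 1, n)

# Pretend treatment happened at period 2 and run the usual 2x2 DiD
fake_post = (period >= 2).astype(float)
X = np.column_stack([np.ones(n), treat, fake_post, treat * fake_post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[3])  # placebo estimate: should be approximately zero
```

A placebo estimate far from zero here would suggest differential pre-trends rather than a treatment effect.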
Triple Differences (DDD)
Triple differences adds a third layer of differencing by comparing DiD estimates across two subgroups that are differentially affected by the treatment. For example, if a policy targets older workers but not younger workers, you can compute DiD separately for each age group and then take the difference.
DDD helps control for time-varying confounders that affect both treatment and control groups equally, as long as those confounders don't differentially impact the two subgroups. The tradeoff is that DDD requires additional assumptions and a more complex data structure.
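The triple difference can be written out directly from eight cell means. A sketch with invented numbers, using the older/younger worker example from above:

```python
# Hypothetical cell means keyed by (group, subgroup, period); numbers invented.
# Subgroup "A" (e.g., older workers) is targeted by the policy; "B" is not.
y = {
    ("treat", "A", "pre"): 10.0,  ("treat", "A", "post"): 16.0,
    ("treat", "B", "pre"): 9.0,   ("treat", "B", "post"): 12.0,
    ("control", "A", "pre"): 8.0, ("control", "A", "post"): 10.0,
    ("control", "B", "pre"): 7.0, ("control", "B", "post"): 9.0,
}

def change(group, sub):
    """Before-after change for one group-subgroup cell."""
    return y[(group, sub, "post")] - y[(group, sub, "pre")]

# DiD within each subgroup, then difference across subgroups
did_A = change("treat", "A") - change("control", "A")  # 6 - 2 = 4
did_B = change("treat", "B") - change("control", "B")  # 3 - 2 = 1
ddd = did_A - did_B
print(ddd)  # 3.0
```

Here the untargeted subgroup's DiD of 1 measures a confound common to both subgroups, which the third difference nets out.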
Extensions of DiD
Staggered Adoption DiD
In many real-world settings, treatment rolls out at different times for different units (e.g., states adopting a policy in different years). The standard TWFE approach has traditionally been used here, but recent research (Goodman-Bacon 2021, Callaway and Sant'Anna 2021, Sun and Abraham 2021) has shown that TWFE can produce biased estimates when treatment effects vary across units or over time.
The problem: TWFE implicitly uses already-treated units as controls for later-treated units, and if treatment effects evolve over time, these comparisons are contaminated. Newer estimators address this by:
- Restricting comparisons to never-treated or not-yet-treated units as controls
- Estimating group-time specific treatment effects and then aggregating them
- Allowing for heterogeneous and dynamic treatment effects
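The logic of these estimators can be illustrated with a hand-rolled group-time comparison in the spirit of Callaway and Sant'Anna; this is a deliberately simplified sketch (the cohort structure, effect path, and `att_gt` helper are invented for illustration), not their actual estimator:

```python
import numpy as np

rng = np.random.default_rng(5)
n_units, n_periods = 300, 6
# Staggered adoption: cohorts first treated in period 2, 4, or never (coded 99)
cohort = rng.choice([2, 4, 99], size=n_units)
unit_eff = rng.normal(0, 2, n_units)

# Treatment effect grows with time since treatment: 1, 2, 3, ...
def outcome(i, t):
    exposure = t - cohort[i] + 1 if cohort[i] <= t else 0
    return unit_eff[i] + 0.5 * t + 1.0 * exposure + rng.normal(0, 1)

Y = np.array([[outcome(i, t) for t in range(n_periods)]
              for i in range(n_units)])

def att_gt(g, t):
    """ATT for cohort g at period t, using not-yet-treated units as controls
    and period g-1 as the pre-treatment baseline (simplified sketch)."""
    treated = cohort == g
    control = cohort > t               # not yet treated by period t
    d_treat = Y[treated, t] - Y[treated, g - 1]
    d_ctrl = Y[control, t] - Y[control, g - 1]
    return d_treat.mean() - d_ctrl.mean()

print(att_gt(2, 2))  # cohort-2 effect on impact, near 1
print(att_gt(2, 3))  # cohort-2 effect one period later, near 2
```

Because the control pool at each (group, period) cell contains only not-yet-treated units, the contaminated comparisons that bias naive TWFE under staggered adoption never enter; the group-time effects can then be averaged into summary parameters.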
Synthetic Control Methods vs. DiD
Synthetic control methods (SCM) construct a weighted combination of control units that best matches the treated unit's pre-intervention outcomes. The treatment effect is the gap between the treated unit and its synthetic counterpart after the intervention.
DiD works well with many treated and control units and assumes parallel trends. SCM is designed for settings with a single (or few) treated unit(s) and many potential controls, and it relaxes the parallel trends assumption by matching on pre-treatment outcomes directly.
SCM can be seen as a more flexible version of DiD, but it has its own limitations: it requires a good pre-treatment fit, and inference can be challenging.
Dynamic Treatment Effects
Treatment effects don't have to be constant over time. A dynamic DiD model interacts the treatment indicator with dummies for each post-treatment period, producing a separate treatment effect estimate for each period after the intervention.
This is closely related to the event study framework. The resulting plot shows:
- Pre-treatment coefficients (should be near zero if parallel trends hold)
- The treatment effect at each post-treatment period (showing whether effects grow, fade, or remain stable)
Limitations of DiD
Violations of Parallel Trends
If the treatment and control groups were on different trajectories before the intervention, DiD will attribute those pre-existing differences to the treatment. Common sources of violation include time-varying confounders that differentially affect the groups, or compositional changes in the groups over time.
Time-Varying Confounders
DiD eliminates time-invariant confounders by design, but it cannot handle confounders that change over time and affect the two groups differently. For instance, if a recession hits the treatment region harder than the control region at the same time as the policy change, DiD will conflate the recession's effect with the policy's effect.
Spillover Effects
When treatment affects untreated units (through economic linkages, social networks, or general equilibrium effects), SUTVA is violated and DiD estimates are biased. The direction of bias depends on whether spillovers push control group outcomes in the same or opposite direction as the treatment effect.
Applications of DiD
Policy Evaluation Examples
DiD is one of the most widely used methods in applied economics and policy evaluation. Classic examples include:
- Card and Krueger (1994): Estimated the effect of New Jersey's minimum wage increase on fast-food employment, using Pennsylvania as the control group. They found no significant negative employment effect, challenging the standard competitive model prediction.
- Finkelstein et al. (2012): Used Oregon's lottery-based Medicaid expansion (the Oregon Health Insurance Experiment) to study effects on health care utilization, financial strain, and health outcomes. Strictly speaking the lottery makes this a randomized design rather than a DiD, but it is frequently discussed alongside DiD policy evaluations.
- Chay and Greenstone (2003): Evaluated the effects of the Clean Air Act Amendments on air quality and infant health.
DiD with Panel Data
Panel data (repeated observations on the same units) is the natural setting for DiD. It allows you to include unit fixed effects, examine pre-trends, and estimate dynamic treatment effects. One concern specific to panel data is attrition bias: if units drop out of the sample over time, and attrition is correlated with treatment, your estimates can be biased.
DiD with Repeated Cross-Sections
When you can't track the same units over time, you can still apply DiD using repeated cross-sections (different samples from the same populations at different times). This approach requires an additional assumption: the composition of the treatment and control groups must remain stable over time, or any compositional changes must be unrelated to the treatment.
This is common in settings where researchers use survey data to evaluate population-level policy impacts, such as the effect of state-level policy changes on health or labor market outcomes.