📊 Causal Inference Unit 5 – Matching and propensity scores
Matching and propensity scores are powerful tools in causal inference, helping researchers estimate treatment effects by creating balanced groups. These methods pair treated and control units with similar characteristics, mimicking randomized experiments and reducing bias from confounding variables.
Propensity scores summarize multiple covariates into a single value, representing the probability of receiving treatment. Researchers use various matching techniques, assess balance between groups, and estimate causal effects. While powerful, these methods have limitations and rely on the key assumption that no important confounders remain unobserved.
Matching is a non-parametric method for estimating causal effects by pairing treated and control units with similar observed characteristics
Treatment group consists of units that receive the intervention or exposure of interest
Control group consists of units that do not receive the intervention or exposure and serve as a comparison
Confounding variables are factors that influence both treatment assignment and the outcome, potentially biasing causal estimates if not accounted for
Propensity score is the probability of receiving treatment given observed covariates, used to balance treatment and control groups
Common support refers to the overlap in propensity scores between treatment and control groups, ensuring comparability
Covariate balance assesses the similarity of observed characteristics between matched treatment and control units
Average treatment effect (ATE) is the expected difference between a unit's potential outcomes under treatment and under control, averaged over the entire population (written formally below, along with the propensity score)
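Two of these terms have compact formal definitions. In standard potential-outcomes notation (T is the treatment indicator, X the observed covariates, and Y(1), Y(0) the potential outcomes under treatment and control), they read:

```latex
e(x) = \Pr(T = 1 \mid X = x)                 % propensity score
\mathrm{ATE} = \mathbb{E}\,[\,Y(1) - Y(0)\,] % average treatment effect
```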
Theoretical Foundations
Matching methods rely on the assumption of unconfoundedness or ignorability, which states that treatment assignment is independent of the potential outcomes given observed covariates (stated formally after this list)
This assumption implies that there are no unobserved confounders influencing both treatment and outcome
Stable unit treatment value assumption (SUTVA) requires that the potential outcomes for each unit are unaffected by the treatment assignment of other units and that there is only one version of the treatment
Matching aims to mimic a randomized experiment by creating balanced treatment and control groups based on observed characteristics
Matching can be viewed as a form of data preprocessing before estimating causal effects, reducing model dependence and increasing robustness
Matching methods are particularly useful when covariate distributions differ between treatment and control groups, since restricting comparisons to the region of common support avoids extrapolation
The choice of matching variables should be guided by substantive knowledge and the assumed causal structure of the problem
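Written in the same potential-outcomes notation, the unconfoundedness assumption, together with the overlap (common support) condition it is usually paired with, is:

```latex
\{\,Y(1),\, Y(0)\,\} \;\perp\!\!\!\perp\; T \mid X
\qquad\text{and}\qquad
0 < \Pr(T = 1 \mid X = x) < 1 \ \text{ for all } x
```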
Types of Matching Methods
Exact matching pairs treatment and control units with identical values for all covariates, ensuring perfect balance but potentially discarding many units
Nearest neighbor matching selects the control unit with the smallest distance (e.g., Mahalanobis distance or propensity score difference) to each treated unit (see the sketch after this list)
Variants include 1:1 matching, k:1 matching, and matching with replacement or without replacement
Caliper matching imposes a maximum distance or tolerance for matching, preventing poor matches and improving balance
Stratification or subclassification divides the propensity score into strata and estimates causal effects within each stratum
Kernel matching uses a weighted average of all control units to construct the counterfactual outcome for each treated unit, with weights based on the distance between propensity scores
Coarsened exact matching (CEM) temporarily coarsens covariates into discrete categories, performs exact matching on the coarsened data, and then retains the original values for analysis
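As a concrete illustration of the nearest-neighbor variant, here is a minimal sketch of greedy 1:1 matching without replacement, using Euclidean distance on standardized covariates (rather than the Mahalanobis or propensity-score distance mentioned above); the DataFrame df and the column names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def nearest_neighbor_match(df, covariates, treat_col="treated"):
    """Greedy 1:1 nearest-neighbor matching without replacement."""
    # Standardize covariates so each contributes comparably to the distance
    X = (df[covariates] - df[covariates].mean()) / df[covariates].std()
    treated_idx = df.index[df[treat_col] == 1]
    control_idx = df.index[df[treat_col] == 0]
    dist = cdist(X.loc[treated_idx], X.loc[control_idx])  # treated x control distances

    pairs, used = [], set()
    for i, t_idx in enumerate(treated_idx):
        for j in np.argsort(dist[i]):          # closest admissible control first
            c_idx = control_idx[j]
            if c_idx not in used:
                used.add(c_idx)
                pairs.append((t_idx, c_idx))
                break
    return pairs  # list of (treated index, matched control index) pairs
```

Greedy matching of this kind is order-dependent; optimal matching or matching with replacement avoids that dependence at the cost of more computation or of reusing control units.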
Propensity Score Basics
Propensity score is a balancing score that summarizes the information from multiple covariates into a single scalar value
Propensity scores are typically estimated using logistic regression, with treatment as the dependent variable and covariates as predictors
The estimated propensity score is the predicted probability of receiving treatment given the observed covariates
Propensity score matching pairs treated and control units with similar propensity scores, creating balanced groups
Propensity score stratification divides the propensity score into subclasses and estimates causal effects within each subclass
Inverse probability of treatment weighting (IPTW) uses the propensity score to weight observations and create a pseudo-population in which treatment assignment is independent of the observed covariates (see the sketch after this list)
Propensity score methods assume that the propensity score model is correctly specified and includes all relevant confounders
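A minimal sketch of propensity score estimation and IPTW, assuming a pandas DataFrame df with a binary treated column, an outcome column y, and a list of confounder columns (all hypothetical names); note that scikit-learn's LogisticRegression applies L2 regularization by default, which may or may not be wanted for a propensity model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iptw_ate(df, covariates, treat_col="treated", outcome_col="y"):
    """Estimate the ATE by inverse probability of treatment weighting."""
    X = df[covariates].values
    t = df[treat_col].values
    y = df[outcome_col].values

    # Propensity score: predicted probability of treatment given covariates
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)  # trim extreme scores to stabilize weights

    # Weights: 1/e(x) for treated units, 1/(1 - e(x)) for controls
    w = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))

    # Weighted mean outcomes in the pseudo-population, then their difference
    mu1 = np.sum(w * t * y) / np.sum(w * t)
    mu0 = np.sum(w * (1 - t) * y) / np.sum(w * (1 - t))
    return mu1 - mu0
```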
Implementing Matching Techniques
Select the relevant covariates to include in the matching procedure based on substantive knowledge and the assumed causal structure
Estimate the propensity score using logistic regression or other suitable methods
Choose the appropriate matching method (e.g., nearest neighbor, caliper, stratification) based on the data and research question
Specify the matching parameters, such as the distance metric, caliper width, or number of strata
Perform the matching procedure and assess the resulting covariate balance between matched treatment and control groups
Estimate the causal effect on the matched sample using appropriate methods (e.g., difference in means, regression adjustment), as sketched after this list
Conduct sensitivity analyses to assess the robustness of the results to unobserved confounding or alternative matching specifications
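A minimal sketch of the matching-and-estimation steps, here with greedy 1:1 caliper matching on the logit of the propensity score (the 0.2-standard-deviation caliper is a widely cited rule of thumb, not a requirement); the arrays ps, t, and y continue the hypothetical data from the previous sketch.

```python
import numpy as np

def caliper_match(ps, t, caliper_sd=0.2):
    """Greedy 1:1 propensity-score matching within a caliper, no replacement."""
    logit = np.log(ps / (1 - ps))
    caliper = caliper_sd * logit.std()

    treated = np.where(t == 1)[0]
    controls = np.where(t == 0)[0]
    pairs, used = [], set()
    for i in treated:
        d = np.abs(logit[i] - logit[controls])
        for j in np.argsort(d):
            if d[j] > caliper:
                break                     # no control within the caliper: drop this unit
            if controls[j] not in used:
                used.add(controls[j])
                pairs.append((i, controls[j]))
                break
    return pairs

# Difference in means on the matched sample (an effect for the treated, ATT)
# pairs = caliper_match(ps, t)
# att = np.mean([y[i] - y[j] for i, j in pairs])
```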
Assessing Balance and Diagnostics
Standardized mean differences (SMD) compare the means of each covariate between treatment and control groups, with values close to zero indicating good balance
SMD is calculated as the difference in means divided by the pooled standard deviation (see the sketch after this list)
Visual diagnostics, such as propensity score distributions, histograms, or jitter plots, can help assess the overlap and balance of propensity scores
Kolmogorov-Smirnov test or other statistical tests can be used to assess the equality of covariate distributions between matched groups
Variance ratios compare the variances of each covariate between treatment and control groups, with values close to one indicating good balance
Absolute standardized mean differences (ASMD) provide a standardized measure of covariate balance, with values below 0.1 or 0.2 often considered acceptable
Diagnostic plots, such as love plots or cobweb plots, can summarize the balance of multiple covariates simultaneously
Assessing balance helps determine the success of the matching procedure and the credibility of the causal estimates
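A minimal sketch of the standardized-mean-difference check described above, assuming treated and control covariate values are passed as numeric arrays or pandas Series (the 0.1 threshold in the comment is the common rule of thumb, not a hard cutoff):

```python
import numpy as np

def smd(x_treated, x_control):
    """Standardized mean difference: difference in means over the pooled SD."""
    diff = np.mean(x_treated) - np.mean(x_control)
    pooled_sd = np.sqrt((np.var(x_treated, ddof=1) + np.var(x_control, ddof=1)) / 2)
    return diff / pooled_sd

# Compare balance before and after matching (df_matched = matched subset of df)
# for cov in covariates:
#     before = smd(df.loc[df.treated == 1, cov], df.loc[df.treated == 0, cov])
#     after = smd(df_matched.loc[df_matched.treated == 1, cov],
#                 df_matched.loc[df_matched.treated == 0, cov])
#     print(f"{cov}: |SMD| before={abs(before):.3f}, after={abs(after):.3f} (target < 0.1)")
```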
Limitations and Challenges
Matching methods rely on the assumption of unconfoundedness, which is untestable and may not hold in practice if important confounders are unmeasured
The choice of matching variables and the specification of the propensity score model can impact the results and should be carefully considered
Matching can lead to reduced sample size and loss of statistical power, especially when using exact matching or strict caliper widths
The estimated causal effects may be sensitive to the choice of matching method and parameters, requiring sensitivity analyses
Matching methods may not perform well when there is limited overlap in covariate distributions between treatment and control groups
Matching does not account for unobserved confounding, and the results may be biased if important confounders are omitted
The interpretation of causal effects from matching methods is limited to the matched sample, which typically corresponds to an effect for the treated (ATT) rather than the population-wide ATE, and may not generalize to the entire population
Advanced Topics and Extensions
Doubly robust estimation combines propensity score methods with outcome regression to provide consistent estimates if either the propensity score model or the outcome model is correctly specified (see the sketch at the end of this section)
Matching methods can be extended to handle multiple treatments, continuous treatments, or time-varying treatments
Genetic matching uses a genetic algorithm to optimize the balance of covariates between treatment and control groups
Prognostic score matching uses the predicted outcome under the control condition as a balancing score, similar to the propensity score
Matching can be combined with other causal inference methods, such as instrumental variables or regression discontinuity designs, to strengthen causal claims
Sensitivity analysis techniques, such as Rosenbaum bounds or simulation-based approaches, can assess the robustness of the results to unobserved confounding
Machine learning methods, such as decision trees or random forests, can be used to estimate propensity scores or improve covariate balance
Matching methods can be applied to longitudinal or clustered data, accounting for the dependence structure of the observations
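To make the doubly robust idea concrete, here is a minimal sketch of the augmented IPW (AIPW) estimator of the ATE; the linear and logistic models are illustrative stand-ins, and X, t, y are hypothetical numpy arrays (covariate matrix, treatment indicator, outcome) as in the earlier sketches.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ate(X, t, y):
    """Augmented IPW (doubly robust) estimate of the average treatment effect."""
    # Propensity model
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)

    # Outcome models fit separately to treated and control units, predicted for everyone
    mu1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)
    mu0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)

    # Outcome-regression contrast plus IPW correction terms for each arm
    psi = (mu1 - mu0
           + t * (y - mu1) / ps
           - (1 - t) * (y - mu0) / (1 - ps))
    return psi.mean()
```

The estimate remains consistent if either the outcome models or the propensity model is correctly specified, which is the double-robustness property noted above.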