
Statistical Inference

Effect Size Measures


Why This Matters

Statistical inference isn't just about determining whether an effect exists; it's about understanding how big that effect actually is. While p-values tell you if results are statistically significant, effect size measures tell you if those results are practically meaningful. This distinction is critical: a study with thousands of participants might find a "significant" difference that's so tiny it has no real-world importance. You're being tested on your ability to choose the right effect size measure for different study designs, interpret what the values mean, and explain why effect sizes matter for evidence-based decision-making, meta-analysis, and replication studies.

Effect size measures fall into distinct families based on what they quantify: standardized differences between groups, variance explained, and measures of association for categorical outcomes. Understanding these categories helps you quickly identify which measure fits which research scenario. Don't just memorize formulas: know what type of data each measure handles and when you'd choose one over another.


Standardized Mean Differences

These measures express the difference between group means in standard deviation units, making them ideal for comparing results across studies that use different measurement scales. The core principle: divide the mean difference by a measure of variability to create a unit-free comparison.

Cohen's d

  • Most widely used effect size for two-group comparisons; calculated as the difference between means divided by the pooled standard deviation: $d = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}$ (a short computational sketch follows this list)
  • Benchmark interpretations of small (0.2), medium (0.5), and large (0.8) come from Cohen's original guidelines, though context matters
  • Best applied when both groups have similar variances and sample sizes; assumes equal standard deviations across groups
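
To make the formula concrete, here is a minimal Python sketch, assuming NumPy is available; the scores below are hypothetical and exist only to illustrate the calculation:

```python
import numpy as np

def cohens_d(x1, x2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    # Pool the sample variances, weighting each by its degrees of freedom.
    s_pooled = np.sqrt(((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1))
                       / (n1 + n2 - 2))
    return (x1.mean() - x2.mean()) / s_pooled

# Hypothetical scores for a treatment group and a control group.
treatment = [24, 27, 22, 30, 26, 28]
control = [20, 23, 19, 25, 21, 22]
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```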

Hedges' g

  • Corrects Cohen's d for small-sample bias by applying a correction factor that becomes negligible with larger samples (typically $n > 20$)
  • Preferred in meta-analyses because it provides approximately unbiased estimates when combining studies with varying sample sizes
  • Interpretation identical to Cohen's d; the same benchmarks apply, making results directly comparable (see the sketch after this list)
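
A minimal sketch of the correction, again assuming Python with NumPy and hypothetical data, using the common approximation $J \approx 1 - \frac{3}{4\,df - 1}$ applied to Cohen's d:

```python
import numpy as np

def hedges_g(x1, x2):
    """Hedges' g: Cohen's d multiplied by a small-sample correction factor."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    s_pooled = np.sqrt(((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / df)
    d = (x1.mean() - x2.mean()) / s_pooled
    j = 1 - 3 / (4 * df - 1)  # correction factor; approaches 1 as df grows
    return j * d

# With small hypothetical samples, g comes out noticeably smaller than d.
print(f"g = {hedges_g([24, 27, 22, 30, 26, 28], [20, 23, 19, 25, 21, 22]):.2f}")
```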

Glass's Delta

  • Uses only the control group's standard deviation for standardization: $\Delta = \frac{\bar{X}_1 - \bar{X}_2}{s_{control}}$
  • Ideal when treatment affects variability: if an intervention changes not just the mean but also the spread of scores, pooling standard deviations would be misleading (see the sketch after this list)
  • Common in experimental designs where you want to express treatment effects relative to baseline variability
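
A short sketch under the same assumptions (Python, NumPy, made-up scores), standardizing the mean difference by the control group's standard deviation only:

```python
import numpy as np

def glass_delta(treatment, control):
    """Glass's delta: mean difference divided by the control group's SD alone."""
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    return (t.mean() - c.mean()) / c.std(ddof=1)

# Hypothetical data where the intervention widens the spread of scores,
# so pooling the standard deviations would be misleading.
treatment = [18, 25, 31, 22, 35, 27]
control = [21, 22, 20, 23, 21, 22]
print(f"Glass's delta = {glass_delta(treatment, control):.2f}")
```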

Compare: Cohen's d vs. Hedges' g vs. Glass's delta. All three measure standardized mean differences, but they differ in which standard deviation they use and whether they correct for small-sample bias. If an FRQ asks you to justify your choice of effect size, explain whether groups have equal variances and whether sample sizes are small.

Standardized Mean Difference (SMD)

  • Umbrella term encompassing Cohen's d, Hedges' g, and Glass's delta; refers to any effect size that standardizes group differences
  • Essential for meta-analysis because it allows combining results from studies using different measurement instruments
  • Watch the terminology: some software outputs "SMD" generically, so always check which specific formula was applied

Variance-Explained Measures

These measures tell you what proportion of the outcome's variability can be attributed to your predictor(s). The underlying logic: partition total variance into explained and unexplained components.

R-squared (Rยฒ)

  • Proportion of variance explained by the regression model; ranges from 0 to 1, where $R^2 = 1 - \frac{SS_{residual}}{SS_{total}}$
  • Interpretation is context-dependent: an $R^2$ of 0.30 might be excellent in psychology but weak in physics; always consider the field's standards
  • Limitation: it always increases when you add predictors, even useless ones, which is why adjusted $R^2$ exists for multiple regression (a computational sketch follows this list)
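
A minimal Python sketch (hypothetical x and y values, NumPy assumed) that fits a simple linear regression and computes $R^2$ from the residual and total sums of squares:

```python
import numpy as np

# Hypothetical predictor and outcome values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3])

# Least-squares fit of a simple linear regression.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

ss_residual = np.sum((y - y_hat) ** 2)   # variation the model fails to explain
ss_total = np.sum((y - y.mean()) ** 2)   # total variation around the mean
r_squared = 1 - ss_residual / ss_total
print(f"R^2 = {r_squared:.3f}")
```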

Eta-squared (ฮทยฒ)

  • ANOVA's version of $R^2$; represents the proportion of total variance in the dependent variable explained by the factor: $\eta^2 = \frac{SS_{between}}{SS_{total}}$
  • Tends to overestimate population effect sizes, especially with small samples or multiple factors
  • Quick benchmarks: small (0.01), medium (0.06), large (0.14), though these are rough guidelines, not rigid cutoffs

Partial Eta-squared

  • Controls for other factors in the model; shows the unique variance explained by one factor after removing variance explained by others: $\eta^2_{partial} = \frac{SS_{effect}}{SS_{effect} + SS_{error}}$
  • Standard output in factorial ANOVA; most statistical software reports partial $\eta^2$ by default rather than eta-squared
  • Not directly comparable to eta-squared because the denominators differ; partial values are typically larger for the same effect (see the sketch after this list)
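
The difference in denominators is easiest to see with numbers. The sketch below uses made-up sums of squares from a hypothetical two-factor ANOVA (no interaction term, for simplicity) to show how partial $\eta^2$ comes out larger than $\eta^2$ for the same factor:

```python
# Hypothetical sums of squares from a two-factor ANOVA (no interaction term).
ss_factor_a = 40.0
ss_factor_b = 25.0
ss_error = 135.0
ss_total = ss_factor_a + ss_factor_b + ss_error

# Eta-squared: factor A's share of the *total* variance.
eta_sq_a = ss_factor_a / ss_total                          # 40 / 200 = 0.20

# Partial eta-squared: factor A's share after factor B's variance is removed.
partial_eta_sq_a = ss_factor_a / (ss_factor_a + ss_error)  # 40 / 175 ≈ 0.23

print(f"eta^2 = {eta_sq_a:.3f}, partial eta^2 = {partial_eta_sq_a:.3f}")
```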

Compare: $R^2$ vs. $\eta^2$. Both measure variance explained, but $R^2$ is used in regression (continuous predictors) while $\eta^2$ is used in ANOVA (categorical predictors). On exams, match the measure to the analysis type.


Correlation-Based Measures

Correlation coefficients quantify the strength and direction of relationships between variables. The key insight: these measures are already standardized, making them natural effect sizes.

Pearson's Correlation Coefficient (r)

  • Measures linear association between two continuous variables; ranges from $-1$ (perfect negative) to $+1$ (perfect positive), with 0 indicating no linear relationship
  • Effect size benchmarks: small (0.10), medium (0.30), large (0.50); note these differ from Cohen's d benchmarks
  • Squaring gives variance explained: $r^2$ tells you the proportion of variance shared between the variables, directly connecting correlation to regression (see the sketch after this list)
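
A short sketch (the same hypothetical x and y values as the $R^2$ example above, NumPy assumed) computing r and confirming that squaring it gives the variance explained:

```python
import numpy as np

# Hypothetical paired observations (same values as the R^2 sketch above).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3])

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
# In simple linear regression with one predictor, r**2 equals R^2.
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")
```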

Compare: Pearson's r vs. $R^2$. Pearson's r captures the direction and strength of a bivariate relationship, while $R^2$ in multiple regression captures the total variance explained by all predictors combined. In simple linear regression with one predictor, $R^2 = r^2$.


Measures for Categorical Outcomes

When your outcome is binary (yes/no, disease/no disease), you need effect sizes designed for proportions and odds. These measures compare event rates or odds between groups rather than means.

Odds Ratio

  • Compares odds between groups; calculated as $OR = \frac{odds_{group1}}{odds_{group2}}$, where odds = probability of event / probability of no event
  • Interpretation is anchored at 1.0; values above 1 indicate higher odds in the numerator group, values below 1 indicate lower odds
  • Standard in logistic regression and case-control studies; when you can't calculate risk directly (retrospective designs), odds ratios are your go-to measure

Risk Ratio (Relative Risk)

  • Compares probabilities directly; calculated as $RR = \frac{P(event \mid exposed)}{P(event \mid unexposed)}$
  • More intuitive than odds ratios for most audiences; "twice the risk" is easier to grasp than "twice the odds"
  • Requires prospective data; only valid in cohort studies or RCTs where you can calculate actual incidence rates

Compare: Odds ratio vs. risk ratio. Both measure association for categorical data, but odds ratios work in any design while risk ratios require prospective data. When the outcome is rare (< 10%), odds ratios approximate risk ratios closely. FRQs often ask when each is appropriate.
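
To see both measures side by side, here is a minimal Python sketch using a hypothetical 2x2 table from a prospective study (all counts are made up); because the outcome here is fairly common, the odds ratio ends up further from 1 than the risk ratio:

```python
# Hypothetical 2x2 table from a prospective (cohort) study:
#                 event   no event
#   exposed         30       70
#   unexposed       15       85
a, b = 30, 70   # exposed group
c, d = 15, 85   # unexposed group

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
risk_ratio = risk_exposed / risk_unexposed    # valid because incidence is observed directly

odds_exposed = a / b
odds_unexposed = c / d
odds_ratio = odds_exposed / odds_unexposed    # also usable in case-control designs

print(f"RR = {risk_ratio:.2f}, OR = {odds_ratio:.2f}")  # RR = 2.00, OR ≈ 2.43
```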


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Standardized group differences | Cohen's d, Hedges' g, Glass's delta |
| Small-sample correction | Hedges' g |
| Variance explained (regression) | $R^2$ |
| Variance explained (ANOVA) | Eta-squared, Partial eta-squared |
| Correlation strength | Pearson's r |
| Categorical outcomes (any design) | Odds ratio |
| Categorical outcomes (prospective) | Risk ratio |
| Meta-analysis applications | Hedges' g, SMD, odds ratio |

Self-Check Questions

  1. You're comparing treatment effects across three studies that used different depression scales. Which effect size measure would allow valid comparisons, and why?

  2. A researcher reports $\eta^2 = 0.08$ from a one-way ANOVA. How would you interpret this value, and what benchmark category does it fall into?

  3. Compare and contrast the odds ratio and risk ratio: In what study designs is each appropriate, and when do their values converge?

  4. Why might a researcher choose Glass's delta over Cohen's d when evaluating an educational intervention? What assumption about the data motivates this choice?

  5. An FRQ presents a multiple regression with $R^2 = 0.45$ and asks whether adding another predictor improved the model. Why is $R^2$ alone insufficient to answer this question, and what would you need to know?