
🥖Linear Modeling Theory Unit 12 Review


12.1 ANCOVA Model and Assumptions


Written by the Fiveable Content Team • Last updated August 2025

ANCOVA in linear modeling

ANCOVA (Analysis of Covariance) combines the group-comparison logic of ANOVA with regression on one or more continuous covariates. The result is a model that compares group means on an outcome variable after adjusting for covariates that might otherwise confound the comparison. This matters most when random assignment isn't possible and groups already differ on characteristics that influence the dependent variable.

Purpose and applications of ANCOVA

ANCOVA serves two related goals: it removes bias from group comparisons by statistically controlling for covariates, and it increases statistical power by explaining away within-group variability that would otherwise inflate the error term.

  • When the covariate accounts for a meaningful share of within-group variance, the residual error shrinks, making it easier to detect true group differences.
  • ANCOVA is especially valuable in quasi-experimental designs where intact groups (classrooms, clinics, regions) are compared and pre-existing differences are likely.

Common applications include:

  • Comparing treatment effects while controlling for baseline scores (e.g., adjusting post-test means for pre-test performance)
  • Examining group differences while accounting for demographic confounds such as age or years of education
  • Increasing power in randomized experiments where a strong covariate is available (e.g., using baseline blood pressure in a drug trial)

Advantages of using ANCOVA

  • Adjusted group comparisons. ANCOVA shifts each group's mean to reflect what it would be if all groups had the same covariate value, producing a fairer comparison.
  • Greater statistical power. Removing covariate-related variance from the error term lowers $MS_E$, which increases the $F$-ratio for the group effect.
  • Flexibility. You can study the effect of a categorical independent variable while simultaneously accounting for one or more continuous covariates, all within a single linear model.
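Here is a minimal standard-library sketch that makes the power argument concrete. It computes $MS_E$ and the group $F$-ratio twice on the same hypothetical two-group data set: once as a one-way ANOVA and once as an ANCOVA with one covariate. The data values and the helper name `sums` are invented for illustration. The ANCOVA $F$ uses the usual nested-model comparison, setting a covariate-only model against a covariate-plus-group model.

```python
from statistics import fmean

# Hypothetical toy data: group -> (covariate C, outcome Y)
data = {
    "control":   ([10, 12, 14, 16, 18], [20, 25, 26, 31, 33]),
    "treatment": ([11, 13, 15, 17, 19], [25, 27, 32, 34, 37]),
}

def sums(cs, ys):
    """Centered sums of squares and cross-products (Sxx, Sxy, Syy)."""
    mc, my = fmean(cs), fmean(ys)
    sxx = sum((c - mc) ** 2 for c in cs)
    sxy = sum((c - mc) * (y - my) for c, y in zip(cs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy

k = len(data)
n_total = sum(len(ys) for _, ys in data.values())
all_c = [c for cs, _ in data.values() for c in cs]
all_y = [y for _, ys in data.values() for y in ys]

# Pooled within-group sums, and total sums that ignore group membership
wxx, wxy, wyy = (sum(v) for v in zip(*(sums(cs, ys) for cs, ys in data.values())))
txx, txy, tyy = sums(all_c, all_y)

# One-way ANOVA: the error term is the within-group SS of Y
ms_e_anova = wyy / (n_total - k)
f_anova = ((tyy - wyy) / (k - 1)) / ms_e_anova

# ANCOVA: the error term is what is left after removing the pooled covariate slope
sse_full = wyy - wxy ** 2 / wxx             # model: group + C
sse_cov_only = tyy - txy ** 2 / txx         # model: C only
ms_e_ancova = sse_full / (n_total - k - 1)  # one extra df is spent on the slope
f_ancova = ((sse_cov_only - sse_full) / (k - 1)) / ms_e_ancova

print(f"ANOVA : MS_E = {ms_e_anova:.2f}, F = {f_anova:.2f}")    # MS_E = 25.50, F = 1.57
print(f"ANCOVA: MS_E = {ms_e_ancova:.2f}, F = {f_ancova:.2f}")  # MS_E = 0.79, F = 17.98
```

In these made-up numbers the covariate absorbs almost all of the within-group variation. Real data will rarely show so dramatic a change, because the gain depends on how strongly the covariate correlates with $Y$ within groups.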

Components of ANCOVA

Variables in the ANCOVA model

  • Dependent variable (Y): The continuous outcome you want to compare across groups (e.g., exam score, systolic blood pressure).
  • Independent variable (X): The categorical grouping factor with two or more levels (e.g., treatment vs. control, three different curricula). In the linear model this is represented by dummy or effect codes.
  • Covariate (C): A continuous variable related to $Y$ but not of primary interest. Pre-test scores and age are classic examples. The covariate is included so the model can adjust each group's mean to a common covariate value (typically the grand mean of $C$).
Purpose and applications of ANCOVA, Frontiers | A cautionary note on the use of the Analysis of Covariance (ANCOVA) in ...

ANCOVA model equation and parameters

For a single-factor design with one covariate, the model is:

$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \beta_2 C_{ij} + \varepsilon_{ij}$$

  • $\beta_0$ is the intercept: the predicted $Y$ when $X$ is at its reference level and $C = 0$ (or when $C$ equals its grand mean, if $C$ has been mean-centered).
  • $\beta_1$ is the effect of the grouping variable. It represents the adjusted difference between groups after controlling for the covariate.
  • $\beta_2$ is the regression slope for the covariate. It captures how much $Y$ changes per one-unit increase in $C$, pooled across groups.
  • $\varepsilon_{ij}$ is the residual error, assumed independent across observations with $\varepsilon_{ij} \sim N(0, \sigma^2)$.
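This sketch fits the two-group version of the equation by ordinary least squares. It solves the normal equations $X^\top X\,b = X^\top y$ with a small hand-written Gauss-Jordan routine, so only the standard library is needed. The data are hypothetical and noise-free, generated from $Y = 5 + 3X + 2C$, so the recovered estimates can be checked by eye. The helpers `solve` and `ols` are illustrative only.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for j in range(col, n + 1):
                    m[r][j] -= factor * m[col][j]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical noise-free data generated from Y = 5 + 3*X + 2*C
x = [0, 0, 0, 0, 1, 1, 1, 1]          # dummy code: 0 = control, 1 = treatment
c = [8, 10, 12, 14, 9, 11, 13, 15]    # covariate, e.g. a pre-test score
y = [21, 25, 29, 33, 26, 30, 34, 38]  # outcome, e.g. a post-test score

design = [[1.0, xi, ci] for xi, ci in zip(x, c)]  # columns: intercept, X, C
b0, b1, b2 = ols(design, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")  # b0 = 5.000, b1 = 3.000, b2 = 2.000
```

Solving the normal equations directly is fine for a toy example. Statistical software typically uses more numerically stable decompositions (such as QR) for real data.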

With more than two groups, $\beta_1 X$ expands into a set of dummy-coded (or effect-coded) terms: $k - 1$ terms for $k$ groups, one for each degree of freedom among the groups.
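As an illustration of that expansion, this sketch turns a hypothetical three-level factor into $k - 1 = 2$ design-matrix columns. It does this twice: once with dummy (0/1) coding against a reference level, and once with effect (sum-to-zero) coding, where the reference level is coded $-1$ in every column. The function names and group labels are invented for the example.

```python
def dummy_code(labels, reference):
    """One 0/1 column per non-reference level; the reference level is all zeros."""
    levels = [lv for lv in sorted(set(labels)) if lv != reference]
    return levels, [[1 if lab == lv else 0 for lv in levels] for lab in labels]

def effect_code(labels, reference):
    """Like dummy coding, but the reference level is coded -1 in every column."""
    levels = [lv for lv in sorted(set(labels)) if lv != reference]
    return levels, [[-1 if lab == reference else int(lab == lv) for lv in levels]
                    for lab in labels]

groups = ["ctrl", "drugA", "drugB", "ctrl", "drugB", "drugA"]  # hypothetical labels
levels, dummies = dummy_code(groups, reference="ctrl")
_, effects = effect_code(groups, reference="ctrl")
print("columns:", levels)  # columns: ['drugA', 'drugB']
for g, d, e in zip(groups, dummies, effects):
    print(f"{g:6s} dummy={d} effect={e}")
```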

Adjusted means (also called least-squares means or estimated marginal means) are the predicted values of $Y$ for each group when the covariate is held at its grand mean. These are the quantities you actually compare in ANCOVA, not the raw group means.
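For a one-covariate model, the adjusted mean of group $j$ can be written as $\bar{Y}_j^{\text{adj}} = \bar{Y}_j - b_w(\bar{C}_j - \bar{C})$. Here $b_w$ is the pooled within-group slope (the estimate of $\beta_2$) and $\bar{C}$ is the grand mean of the covariate. The sketch below applies that formula to hypothetical noise-free data; the function name `adjusted_means` is illustrative.

```python
from statistics import fmean

def adjusted_means(data):
    """data: group -> (C values, Y values). Returns (b_w, {group: adjusted mean of Y})."""
    grand_c = fmean([ci for cs, _ in data.values() for ci in cs])
    # Pooled within-group slope: within-group cross-products / within-group SS of C
    sxy = sxx = 0.0
    for cs, ys in data.values():
        mc, my = fmean(cs), fmean(ys)
        sxy += sum((ci - mc) * (yi - my) for ci, yi in zip(cs, ys))
        sxx += sum((ci - mc) ** 2 for ci in cs)
    b_w = sxy / sxx
    adj = {g: fmean(ys) - b_w * (fmean(cs) - grand_c) for g, (cs, ys) in data.items()}
    return b_w, adj

# Hypothetical noise-free data generated from Y = 5 + 3*treatment + 2*C
data = {
    "control":   ([8, 10, 12, 14], [21, 25, 29, 33]),
    "treatment": ([9, 11, 13, 15], [26, 30, 34, 38]),
}
b_w, adj = adjusted_means(data)
print("raw means:     ", {g: fmean(ys) for g, (_, ys) in data.items()})  # 27.0 and 32.0
print("pooled slope:  ", b_w)                                          # 2.0
print("adjusted means:", adj)                                          # 28.0 and 31.0
```

The raw means differ by 5, but the treatment group also started 1 unit higher on $C$. Removing $b_w \times 1 = 2$ of that gap leaves an adjusted difference of 3, which equals the treatment coefficient used to generate the data.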

Assumptions of ANCOVA

ANCOVA inherits the standard linear-model assumptions and adds one that is unique to the covariate-by-group structure.

Independence and normality assumptions

  • Independence of observations. Each observation must be independent of every other observation. Clustering (e.g., students nested in classrooms) violates this assumption and inflates Type I error because standard errors become too small. If clustering is present, multilevel modeling is a better choice.
  • Normality of residuals. The residuals $\varepsilon_{ij}$ should be approximately normally distributed. You can check this with a Q-Q plot or a Shapiro-Wilk test on the residuals. With large samples the $F$-test is fairly robust to moderate non-normality, but with small samples violations can distort $p$-values and confidence intervals.
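A Q-Q plot needs only two columns of numbers: the sorted residuals and the matching standard normal quantiles. This standard-library sketch computes both for the residuals of a one-covariate ANCOVA fit on hypothetical data. It uses the plotting position $(i - 0.5)/n$, which is one common choice among several. It prints the pairs rather than drawing them; a Shapiro-Wilk test would need a statistics package such as SciPy (`scipy.stats.shapiro`).

```python
from statistics import NormalDist, fmean

# Hypothetical toy data: group -> (covariate C, outcome Y)
data = {
    "control":   ([10, 12, 14, 16, 18], [20, 25, 26, 31, 33]),
    "treatment": ([11, 13, 15, 17, 19], [25, 27, 32, 34, 37]),
}

# Pooled within-group slope b_w (the ANCOVA estimate of beta_2)
num = den = 0.0
for cs, ys in data.values():
    mc, my = fmean(cs), fmean(ys)
    num += sum((c - mc) * (y - my) for c, y in zip(cs, ys))
    den += sum((c - mc) ** 2 for c in cs)
b_w = num / den

# Residuals: distance from each group's fitted line, which uses the common slope b_w
resid = []
for cs, ys in data.values():
    mc, my = fmean(cs), fmean(ys)
    resid += [(y - my) - b_w * (c - mc) for c, y in zip(cs, ys)]

# Q-Q coordinates: standard normal quantiles at (i - 0.5)/n vs. sorted residuals
n = len(resid)
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
for q, r in zip(theoretical, sorted(resid)):
    print(f"{q:6.2f}  {r:6.2f}")
```

If the printed pairs fall close to a straight line, the residuals are consistent with approximate normality. With only ten points, as here, the plot can say very little.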

Homogeneity and linearity assumptions

  • Homogeneity of variance (homoscedasticity). The variance of the residuals should be roughly equal across all groups. Levene's test or a residuals-vs.-fitted plot can diagnose this. Heteroscedasticity biases standard errors, which in turn makes $F$-tests and confidence intervals unreliable.
  • Linearity. The relationship between the covariate and the dependent variable must be linear within each group. A scatterplot of $Y$ vs. $C$ (color-coded by group) is the simplest diagnostic. If the relationship is curved, the model will mis-estimate the adjusted means and can even reverse the direction of group differences.
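Levene's test works on absolute deviations from each group's center. The sketch below computes its statistic $W$ by hand, using the median as the center. That is the Brown-Forsythe variant, which is also the default in `scipy.stats.levene`; passing `center=fmean` gives Levene's original mean-centered version. The residual values are hypothetical. $W$ would be compared with an $F(k-1, N-k)$ critical value, which the standard library does not supply.

```python
from statistics import fmean, median

def levene_w(groups, center=median):
    """Levene's W; center=median is the Brown-Forsythe variant, center=fmean the original."""
    z = []
    for g in groups:
        m = center(g)
        z.append([abs(v - m) for v in g])  # absolute deviations from the group center
    k = len(z)
    n_total = sum(len(zj) for zj in z)
    means = [fmean(zj) for zj in z]
    grand = fmean([v for zj in z for v in zj])
    between = sum(len(zj) * (mj - grand) ** 2 for zj, mj in zip(z, means))
    within = sum((v - mj) ** 2 for zj, mj in zip(z, means) for v in zj)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical ANCOVA residuals, split by group
control = [-0.7, 1.15, -1.0, 0.85, -0.3]
treatment = [0.3, -0.85, 1.0, -0.15, -0.3]
w = levene_w([control, treatment])
print(f"W = {w:.3f}")  # W = 0.575
```

Here $W$ is far below the 5% critical value of $F(1, 8)$, which is about 5.32. These made-up residuals therefore show no evidence of unequal variances.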

Additional assumptions and considerations

  • Homogeneity of regression slopes. This is the assumption unique to ANCOVA. The slope relating the covariate to the outcome must be the same in every group. In model terms, there should be no $X \times C$ interaction. If the slopes differ across groups, a single pooled slope cannot correctly adjust the means, and the adjusted group differences will depend on where along the covariate you evaluate them. You can test this by fitting a model that includes the interaction term and checking whether it is significant.

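If the homogeneity-of-slopes assumption fails, standard ANCOVA is not appropriate. Instead, keep the $X \times C$ interaction in the model and report group differences at specific covariate values. The Johnson-Neyman technique identifies the range of covariate values over which the group difference is statistically significant. Alternatively, fit separate regression lines for each group.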
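The full interaction model is equivalent to fitting a separate regression line in each group. That means the homogeneity-of-slopes test can be computed from per-group sums of squares. Compare the error SS of the common-slope ANCOVA with the summed error SS of the separate lines:

$$F = \frac{(SSE_{\text{common}} - SSE_{\text{separate}})/(k-1)}{SSE_{\text{separate}}/(N-2k)}$$

The sketch below does this for hypothetical data; the function and variable names are illustrative.

```python
from statistics import fmean

def slopes_test(data):
    """F-test of the group-by-covariate interaction. data: group -> (C values, Y values)."""
    sep_sse = 0.0          # error SS with a separate slope per group
    wxx = wxy = wyy = 0.0  # pooled within-group sums for the common slope
    slopes = {}
    for g, (cs, ys) in data.items():
        mc, my = fmean(cs), fmean(ys)
        sxx = sum((c - mc) ** 2 for c in cs)
        sxy = sum((c - mc) * (y - my) for c, y in zip(cs, ys))
        syy = sum((y - my) ** 2 for y in ys)
        slopes[g] = sxy / sxx
        sep_sse += syy - sxy ** 2 / sxx
        wxx, wxy, wyy = wxx + sxx, wxy + sxy, wyy + syy
    common_sse = wyy - wxy ** 2 / wxx
    k = len(data)
    n_total = sum(len(ys) for _, ys in data.values())
    df1, df2 = k - 1, n_total - 2 * k
    f = ((common_sse - sep_sse) / df1) / (sep_sse / df2)
    return slopes, f, (df1, df2)

data = {  # hypothetical toy data: group -> (covariate C, outcome Y)
    "control":   ([10, 12, 14, 16, 18], [20, 25, 26, 31, 33]),
    "treatment": ([11, 13, 15, 17, 19], [25, 27, 32, 34, 37]),
}
slopes, f, dfs = slopes_test(data)
print(slopes)               # {'control': 1.6, 'treatment': 1.55}
print(f"F{dfs} = {f:.3f}")  # F(1, 6) = 0.055
```

An $F$ this small, far below the 5% critical value of about 5.99 for $F(1, 6)$, gives no evidence against equal slopes. In statsmodels, the same comparison should appear as the interaction row of `anova_lm` for a model fitted with a formula such as `y ~ C(group) * pre`.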

  • Reliability of the covariate. Measurement error in the covariate attenuates its slope ($\beta_2$ is biased toward zero), which means the adjustment is incomplete. The group effect estimate then absorbs leftover covariate-related variance, leading to biased adjusted means. Using a highly reliable measure for the covariate (or correcting for attenuation) reduces this problem.
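A quick simulation shows the attenuation described above. The assumptions are coded below: a true covariate with variance 100 and measurement error with variance 25, so reliability $= 100/125 = 0.8$. Under these assumptions, the slope estimated from the error-laden covariate should land near $0.8\,\beta_2$. The seed, sample size and variances are arbitrary choices. Dividing by the reliability is the classical correction for this one-predictor setup; in a full ANCOVA the relevant quantity is usually the within-group reliability.

```python
import random
from statistics import fmean

def slope(xs, ys):
    """Simple least-squares slope of ys on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

rng = random.Random(2025)  # arbitrary seed for reproducibility
n, beta2 = 20_000, 2.0
true_c = [rng.gauss(50, 10) for _ in range(n)]     # true covariate: variance 100
y = [beta2 * c + rng.gauss(0, 5) for c in true_c]  # outcome depends on the true C
obs_c = [c + rng.gauss(0, 5) for c in true_c]      # measured C: error variance 25

reliability = 100 / (100 + 25)  # 0.8 by construction
b_obs = slope(obs_c, y)
print(f"observed slope {b_obs:.3f} (expected near {beta2 * reliability:.1f})")
print(f"corrected slope {b_obs / reliability:.3f} (true value {beta2})")
```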

ANCOVA appropriateness

Research question and data requirements

Before choosing ANCOVA, confirm three things:

  1. Your research question asks whether group means on a continuous outcome differ after controlling for one or more continuous covariates.
  2. The independent variable is categorical (two or more groups) and the dependent variable is continuous.
  3. You have identified at least one covariate that is theoretically related to the outcome and measured on a continuous scale. The covariate should be measured before the treatment or at least not be affected by it; otherwise the adjustment can remove part of the treatment effect itself.

Checking assumptions and considering alternatives

Work through the assumptions in a logical order:

  1. Independence: consider the study design; no statistical test can fully verify this.
  2. Linearity: plot $Y$ vs. $C$ within each group.
  3. Homogeneity of regression slopes: fit the interaction model and test the $X \times C$ term.
  4. Normality of residuals: inspect a Q-Q plot of residuals from the ANCOVA model.
  5. Homogeneity of variance: check a residuals-vs.-fitted plot or run Levene's test.

If assumptions are severely violated, alternatives include:

  • Multiple regression with interaction terms when slopes are unequal across groups.
  • Robust or nonparametric methods when normality or homoscedasticity fails badly.
  • Multilevel (mixed) models when observations are clustered.

Sample size and power considerations

Statistical power in ANCOVA depends on several factors:

  • Number of groups and covariate strength. A covariate that correlates strongly with $Y$ removes more error variance, boosting power. The effective error variance is approximately $\sigma^2(1 - r^2_{YC})$, where $r_{YC}$ is the within-group correlation between the covariate and the outcome.
  • Sample size per group. Larger groups increase power, but each additional covariate consumes one error degree of freedom. Adding weak covariates can therefore reduce power, especially in small samples. Include only covariates with a meaningful relationship to $Y$.
  • Effect size. Consider both statistical significance and practical significance when planning sample size. A statistically significant but trivially small adjusted mean difference may not be meaningful.
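The approximation $\sigma^2(1 - r^2_{YC})$ is easy to tabulate. This short sketch assumes a hypothetical within-group variance of 100 for $Y$ and shows how quickly the gain grows with the within-group correlation. It also shows why a weak covariate may not be worth the error degree of freedom it costs: at $r_{YC} = 0.1$ it removes only 1% of the error variance.

```python
def effective_error_variance(sigma2, r_yc):
    """Approximate ANCOVA error variance: sigma^2 * (1 - r_YC^2)."""
    return sigma2 * (1 - r_yc ** 2)

sigma2 = 100.0  # hypothetical within-group variance of Y without the covariate
for r in (0.1, 0.3, 0.5, 0.7, 0.9):
    v = effective_error_variance(sigma2, r)
    print(f"r_YC = {r:.1f}: error variance {v:5.1f}, {100 * (1 - v / sigma2):4.1f}% removed")
```

The standard error of an adjusted mean difference scales with the square root of the error variance. As a result, roughly $(1 - r^2_{YC})\,n$ observations with the covariate give about the same precision as $n$ observations without it, ignoring the lost degree of freedom.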

A formal power analysis (using software such as G*Power or simulation) that accounts for the expected $r_{YC}$, the number of groups, and the target effect size is the best way to determine the required sample size before data collection.
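For a back-of-the-envelope figure in the two-group case, use the usual normal-approximation formula with the error variance shrunk by $1 - r^2_{YC}$. The required sample size per group is approximately $n \approx 2(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2(1 - r^2_{YC})/\delta^2$, where $\delta$ is the adjusted mean difference you want to detect. The function below is a hypothetical helper, not a substitute for a formal power analysis. It ignores the degrees of freedom spent on the covariate and uses $z$ rather than $t$ quantiles, so it slightly understates the required $n$.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, r_yc=0.0, alpha=0.05, power=0.80):
    """Approximate n per group for a two-group comparison of adjusted means (normal approx.)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_b = z.inv_cdf(power)           # quantile for the desired power
    var_eff = sigma ** 2 * (1 - r_yc ** 2)
    return math.ceil(2 * (z_a + z_b) ** 2 * var_eff / delta ** 2)

print(n_per_group(delta=5, sigma=10))            # 63: no covariate
print(n_per_group(delta=5, sigma=10, r_yc=0.6))  # 41: covariate with within-group r = 0.6
```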