Two-way ANOVA: Concept and Purpose
Understanding the Basics
Two-way ANOVA examines how two categorical independent variables (factors) jointly affect a continuous dependent variable. Where one-way ANOVA tests the effect of a single factor, two-way ANOVA handles two factors at once and, critically, tests whether those factors interact with each other.
There are three effects you're testing in every two-way ANOVA:
- Main effect of Factor A: the average effect of one factor on the dependent variable, collapsing across levels of the other factor. For example, the effect of soil type on plant growth, averaging over all fertilizer types.
- Main effect of Factor B: the same idea for the second factor. For example, the effect of fertilizer type on plant growth, averaging over all soil types.
- Interaction effect (A × B): whether the effect of one factor depends on the level of the other factor. For example, sandy soil might boost growth with organic fertilizer but hurt growth with synthetic fertilizer. That pattern, where the effect of one factor changes depending on the other, is an interaction.
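To see what an interaction looks like numerically, here is a quick check on hypothetical cell means for the soil-by-fertilizer example (all numbers invented for illustration):

```python
# Hypothetical mean plant growth (cm) for a 2x2 soil-by-fertilizer design.
# These values are invented purely to illustrate an interaction.
cell_means = {
    ("sandy", "organic"): 12.0,
    ("sandy", "synthetic"): 6.0,
    ("clay", "organic"): 9.0,
    ("clay", "synthetic"): 10.0,
}

# Simple effect of fertilizer (organic minus synthetic) within each soil type:
effect_sandy = cell_means[("sandy", "organic")] - cell_means[("sandy", "synthetic")]
effect_clay = cell_means[("clay", "organic")] - cell_means[("clay", "synthetic")]

# The fertilizer effect is +6.0 in sandy soil but -1.0 in clay soil:
# it changes sign depending on the other factor, which is an interaction.
print(effect_sandy, effect_clay)
```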
Determining Significant Differences
The model tests whether the means of the dependent variable differ significantly across levels of each factor and across their combinations.
Consider comparing average test scores based on teaching method (lecture vs. discussion) and student background (science vs. humanities). Two-way ANOVA lets you ask all three questions at once: Does teaching method matter? Does background matter? Does the advantage of one teaching method change depending on background?
One-way ANOVA could only answer one of those questions at a time, say, whether teaching method alone affects scores. Two-way ANOVA handles the full picture in a single analysis.

Two-way ANOVA: Mathematical Model
Model Components
The two-way ANOVA model decomposes each observation into additive components:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$$

Each term represents a distinct source of variation:
- $Y_{ijk}$: the observed value for the $k$-th observation in level $i$ of Factor A and level $j$ of Factor B (e.g., the test score of the $k$-th student in the $i$-th teaching method and $j$-th background group)
- $\mu$: the grand mean of the dependent variable across all observations
- $\alpha_i$: the main effect of Factor A at level $i$, representing how much level $i$ of Factor A deviates from the grand mean
- $\beta_j$: the main effect of Factor B at level $j$, representing how much level $j$ of Factor B deviates from the grand mean
- $(\alpha\beta)_{ij}$: the interaction effect for the specific combination of level $i$ of A and level $j$ of B. This captures any deviation in the cell mean that isn't explained by the two main effects alone.
- $\epsilon_{ijk}$: the random error term, assumed to be independently and identically distributed as $N(0, \sigma^2)$
The model is subject to the constraints $\sum_{i=1}^{a} \alpha_i = 0$, $\sum_{j=1}^{b} \beta_j = 0$, $\sum_{i=1}^{a} (\alpha\beta)_{ij} = 0$ for all $j$, and $\sum_{j=1}^{b} (\alpha\beta)_{ij} = 0$ for all $i$. These sum-to-zero constraints ensure the parameters are identifiable and that the effects represent deviations from the grand mean.
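A small sketch can verify that this decomposition and its constraints hold mechanically; the cell means below are invented for a balanced 2 × 2 design:

```python
# Recover mu, alpha_i, beta_j, and (alpha*beta)_ij from the cell means of a
# balanced 2x2 design. The cell means are invented for illustration.
cell = [[12.0, 6.0],   # Factor A level 1 (rows) x Factor B levels (columns)
        [9.0, 10.0]]   # Factor A level 2

a, b = len(cell), len(cell[0])
mu = sum(sum(row) for row in cell) / (a * b)          # grand mean
alpha = [sum(row) / b - mu for row in cell]           # main effects of A
beta = [sum(cell[i][j] for i in range(a)) / a - mu for j in range(b)]  # main effects of B
# Interaction: whatever remains in each cell after mu and the main effects
ab = [[cell[i][j] - mu - alpha[i] - beta[j] for j in range(b)] for i in range(a)]

# The sum-to-zero constraints hold by construction:
assert abs(sum(alpha)) < 1e-9 and abs(sum(beta)) < 1e-9
assert all(abs(sum(ab[i][j] for i in range(a))) < 1e-9 for j in range(b))
```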

Hypothesis Testing and Interpretation
Two-way ANOVA involves three separate null hypotheses, each tested with its own F-statistic:
- $H_{0A}$: $\alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ (no main effect of Factor A)
- $H_{0B}$: $\beta_1 = \beta_2 = \cdots = \beta_b = 0$ (no main effect of Factor B)
- $H_{0AB}$: $(\alpha\beta)_{ij} = 0$ for all $i, j$ (no interaction effect)
Each alternative hypothesis states that at least one of the respective effects is nonzero.
The F-test for each effect compares the variance explained by that term (its mean square) to the unexplained variance (mean square error). A large F-ratio suggests the effect explains more variability than you'd expect from random noise alone.
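As a sketch of where those F-ratios come from, the sums of squares for a balanced design can be computed directly in plain Python; the scores below are invented for the teaching-method example (a real analysis would typically use a statistics package instead):

```python
# Hypothetical test scores: 2 teaching methods x 2 backgrounds, n = 2 per cell.
data = {
    ("lecture", "science"): [80, 84],
    ("lecture", "humanities"): [70, 74],
    ("discussion", "science"): [76, 80],
    ("discussion", "humanities"): [78, 82],
}

def mean(xs):
    return sum(xs) / len(xs)

A = sorted({k[0] for k in data})            # levels of Factor A
B = sorted({k[1] for k in data})            # levels of Factor B
n = len(next(iter(data.values())))          # observations per cell (balanced)

grand = mean([y for ys in data.values() for y in ys])
a_mean = {i: mean([y for (ai, _), ys in data.items() if ai == i for y in ys]) for i in A}
b_mean = {j: mean([y for (_, bj), ys in data.items() if bj == j for y in ys]) for j in B}
cell_mean = {k: mean(ys) for k, ys in data.items()}

# Sums of squares for each source of variation
ss_a = n * len(B) * sum((a_mean[i] - grand) ** 2 for i in A)
ss_b = n * len(A) * sum((b_mean[j] - grand) ** 2 for j in B)
ss_ab = n * sum((cell_mean[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
                for i in A for j in B)
ss_e = sum((y - cell_mean[k]) ** 2 for k, ys in data.items() for y in ys)

# Degrees of freedom, mean squares, and the three F-ratios
df_a, df_b = len(A) - 1, len(B) - 1
df_ab = df_a * df_b
df_e = len(A) * len(B) * (n - 1)
ms_e = ss_e / df_e

f_a = (ss_a / df_a) / ms_e      # F for the main effect of A
f_b = (ss_b / df_b) / ms_e      # F for the main effect of B
f_ab = (ss_ab / df_ab) / ms_e   # F for the interaction
```

Each F-ratio would then be compared against an F distribution with the corresponding numerator degrees of freedom and the error degrees of freedom to obtain a p-value.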
Interpretation depends on which effects are significant:
- If a main effect is significant but the interaction is not, you can interpret that main effect straightforwardly. For instance, if teaching method is significant (lecture scores higher than discussion) and there's no interaction, lecture outperforms discussion regardless of student background.
- If the interaction is significant, you need to be cautious interpreting main effects in isolation. The interaction tells you that the effect of one factor changes across levels of the other, so reporting a single "main effect" can be misleading. In that case, examine the cell means or simple effects to understand the pattern.
Assumptions of Two-way ANOVA
Key Assumptions
Two-way ANOVA relies on the same core assumptions as other linear models, applied to the cell structure of the design:
- Independence: Observations within and across all cells must be independent. One student's test score should not influence another's. This is primarily a design issue, not something you can fix statistically after the fact.
- Normality: The residuals ($\epsilon_{ijk}$) should be approximately normally distributed within each cell. With balanced designs and reasonable sample sizes, the F-test is fairly robust to moderate departures from normality.
- Homogeneity of variances (homoscedasticity): The error variance should be the same across all cells. If the variability of test scores in the lecture/science group is much larger than in the discussion/humanities group, this assumption is violated.
- No influential outliers: Extreme values in any cell can distort the group means and inflate or mask effects.
- Fixed effects: The levels of both factors are specifically chosen by the researcher, not randomly sampled from a larger population. If levels are randomly sampled, you'd need a random-effects or mixed-effects model instead.
Assessing Assumption Validity
Each assumption can be checked with specific tools:
- Independence is ensured through proper experimental design, particularly random assignment of subjects to conditions. No statistical test can fully verify it after data collection.
- Normality can be assessed by examining residuals with Q-Q plots, histograms, or formal tests like the Shapiro-Wilk test. Check residuals within each cell if sample sizes permit, or check the overall residual distribution.
- Homogeneity of variances can be tested with Levene's test or by visually inspecting a residual-vs.-fitted-values plot for a consistent spread across groups.
- Outliers can be spotted using boxplots of residuals by cell, standardized residuals (values beyond about $\pm 3$ are suspect), or Cook's distance for influential points.
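As one illustration of these checks, Levene's statistic (the mean-centered version) can be written out from scratch; in practice a library routine such as `scipy.stats.levene` would be used, and the cell values below are invented:

```python
# A from-scratch sketch of Levene's test statistic for equal variances across
# groups (cells). Large W values suggest heteroscedasticity; W is compared
# against an F(k - 1, N - k) critical value to decide.
def levene_statistic(groups):
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its group mean
    z = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    z_group = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(v for zi in z for v in zi) / n_total
    between = sum(len(zi) * (zg - z_grand) ** 2 for zi, zg in zip(z, z_group))
    within = sum((v - zg) ** 2 for zi, zg in zip(z, z_group) for v in zi)
    return ((n_total - k) / (k - 1)) * between / within

# Invented scores for three cells; the middle cell is far more variable.
cells = [[80, 84, 79], [70, 90, 74], [76, 80, 78]]
w = levene_statistic(cells)
```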
Violations of these assumptions can lead to inflated or deflated Type I error rates, meaning your p-values may not be trustworthy. Mild violations, especially of normality, are often tolerable with balanced designs. Serious heteroscedasticity or non-independence is more problematic and may require data transformations, robust methods, or a different modeling approach.