13.5 Lab: One-Way ANOVA
One-Way ANOVA lets you test whether three or more group means are significantly different from each other. This lab walks you through performing the test step by step, from setting up your hypotheses to interpreting the output.
Setting Up the Problem
Before running any calculations, you need a clear framework:
1. Identify your groups. These are the categories of your independent variable (e.g., three different teaching methods, four brands of batteries).
2. Identify your response variable. This is the quantitative outcome you're measuring for each group (e.g., test scores, battery life in hours).
3. State your hypotheses.
   - $H_0$: All group means are equal ($\mu_1 = \mu_2 = \cdots = \mu_k$)
   - $H_A$: At least one group mean is different
4. Choose your significance level. Typically $\alpha = 0.05$ unless told otherwise.
Checking Assumptions
ANOVA results are only trustworthy if three conditions hold:
- Independence. Observations within and across groups should be independent of each other. Random sampling or random assignment helps satisfy this.
- Normality. The data in each group should be approximately normally distributed. With small samples, check histograms or normal probability plots for each group. With larger samples (roughly 30+ per group), the Central Limit Theorem makes this less of a concern.
- Equal variances (homoscedasticity). The spread of data should be similar across groups. A common rule of thumb: if the largest group standard deviation is no more than twice the smallest, you're generally fine.
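The assumption checks above can be sketched in a few lines of Python. This is a minimal illustration, assuming scores for three groups are stored in plain lists; the group names and data here are hypothetical, and scipy's Shapiro-Wilk test stands in for a normal probability plot.

```python
# Sketch of assumption checks for one-way ANOVA.
# The group names and scores below are hypothetical example data.
import statistics
from scipy import stats

method_a = [72, 81, 78, 75, 83, 79, 70, 77, 82, 76]
method_b = [85, 90, 82, 88, 79, 84, 91, 83, 86, 87]
method_c = [80, 74, 83, 79, 85, 77, 81, 78, 84, 79]
groups = [method_a, method_b, method_c]

# Normality: Shapiro-Wilk test per group (useful with small samples).
for g in groups:
    _, p = stats.shapiro(g)
    print(f"Shapiro-Wilk p = {p:.3f}")  # large p: no strong evidence against normality

# Equal variances: rule of thumb -- largest SD no more than twice the smallest.
sds = [statistics.stdev(g) for g in groups]
print(f"SD ratio = {max(sds) / min(sds):.2f}")  # under 2 is generally fine
```

A formal equal-variance test (e.g., Levene's) is an alternative, but the SD-ratio rule of thumb matches the check described above.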
Calculating the F Statistic
The F statistic compares variation between groups to variation within groups. Here's the process:
- Find each group mean ($\bar{x}_i$) and the overall mean ($\bar{x}$) of all observations combined.
- Calculate the between-group variation (MSG). This measures how far each group mean is from the overall mean, weighted by group size:
  $$MSG = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2}{k - 1}$$
  where $k$ is the number of groups and $n_i$ is the size of group $i$.
- Calculate the within-group variation (MSE). This measures the average spread of observations around their own group mean:
  $$MSE = \frac{\sum_{i=1}^{k} (n_i - 1)s_i^2}{n - k}$$
  where $s_i^2$ is the variance of group $i$ and $n$ is the total number of observations.
- Compute the F statistic:
  $$F = \frac{MSG}{MSE}$$
A large F value means the group means differ more than you'd expect from random variation alone.
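The calculation above can be translated directly into code. This sketch assumes raw scores are available for each group (the data are hypothetical) and uses scipy's `f_oneway` only as a cross-check on the hand-rolled formulas.

```python
# Computing the F statistic from the MSG/MSE formulas, cross-checked
# against scipy.stats.f_oneway. The scores below are hypothetical.
from scipy.stats import f_oneway

group_a = [78, 82, 75, 80, 77]
group_b = [85, 88, 83, 86, 90]
group_c = [79, 81, 76, 84, 80]
groups = [group_a, group_b, group_c]

k = len(groups)                      # number of groups
n = sum(len(g) for g in groups)      # total number of observations
grand_mean = sum(sum(g) for g in groups) / n

# MSG = sum of n_i * (xbar_i - xbar)^2, divided by (k - 1)
msg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups) / (k - 1)

# MSE = sum of (n_i - 1) * s_i^2, divided by (n - k)
def sample_var(g):
    m = sum(g) / len(g)
    return sum((x - m) ** 2 for x in g) / (len(g) - 1)

mse = sum((len(g) - 1) * sample_var(g) for g in groups) / (n - k)

f_stat = msg / mse
f_check, p_value = f_oneway(*groups)
print(f"F = {f_stat:.3f} (scipy: {f_check:.3f}), p = {p_value:.4f}")
```

Both routes must agree: `f_oneway` implements exactly this between-over-within ratio.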
Reading the ANOVA Table
Most software outputs an ANOVA summary table. Here's what each column means:
| Source | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Between (Factor) | $k-1$ | SSB | MSG = SSB / $(k-1)$ | MSG / MSE | from the $F_{k-1,\,n-k}$ distribution |
| Within (Error) | $n-k$ | SSW | MSE = SSW / $(n-k)$ | | |
| Total | $n-1$ | SST | | | |
- df = degrees of freedom
- SS = sum of squares (total variation attributed to that source)
- MS = mean square (SS divided by its df)
- The p-value tells you the probability of getting an F statistic this large (or larger) if $H_0$ were true.
Making a Decision
Compare the p-value to your significance level:
- If $p\text{-value} \le \alpha$: Reject $H_0$. There is statistically significant evidence that at least one group mean differs.
- If $p\text{-value} > \alpha$: Fail to reject $H_0$. You don't have enough evidence to conclude the means differ.
Keep in mind that rejecting $H_0$ does not tell you which specific groups differ. It only tells you that not all means are equal.
Example Walkthrough
Suppose you're comparing average exam scores across three study methods, with 10 students per group.
| Group | Mean | Std Dev |
|---|---|---|
| Method A | 78.2 | 6.1 |
| Method B | 84.5 | 5.8 |
| Method C | 80.0 | 6.4 |
1. Hypotheses: $H_0: \mu_A = \mu_B = \mu_C$ vs. $H_A$: at least one mean differs.
2. Check assumptions: Groups are independent (randomly assigned). Sample sizes are small, so you'd check normality plots. Standard deviations are similar (largest is 6.4, smallest is 5.8; ratio well under 2).
3. Degrees of freedom: Between: $k - 1 = 3 - 1 = 2$. Within: $n - k = 30 - 3 = 27$.
4. Test statistic: The overall mean is $(78.2 + 84.5 + 80.0)/3 = 80.9$, so $MSG = \frac{10(78.2 - 80.9)^2 + 10(84.5 - 80.9)^2 + 10(80.0 - 80.9)^2}{2} = 105.3$ and, with equal group sizes, $MSE = \frac{6.1^2 + 5.8^2 + 6.4^2}{3} \approx 37.27$, giving $F = 105.3 / 37.27 \approx 2.83$ with a p-value of about 0.077 from the $F_{2,27}$ distribution.
5. Decision: Since $p \approx 0.077 > 0.05$, fail to reject $H_0$. At the 5% level, these data do not provide statistically significant evidence that the study methods lead to different mean exam scores.
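With equal group sizes, the summary table alone is enough to recover the F statistic and p-value; no raw scores are needed. A quick check, using scipy only for the F-distribution tail probability:

```python
# Recomputing F from the group means and standard deviations in the table
# (three groups of 10 students each), with scipy's F distribution for the p-value.
from scipy.stats import f as f_dist

n_i = 10
means = [78.2, 84.5, 80.0]
sds = [6.1, 5.8, 6.4]
k = len(means)
n = k * n_i

grand_mean = sum(means) / k                        # equal n, so a simple average
msg = n_i * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
mse = sum(s ** 2 for s in sds) / k                 # equal n: average of the variances
f_stat = msg / mse
p_value = f_dist.sf(f_stat, k - 1, n - k)          # upper tail of F(2, 27)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")      # → F = 2.83, p = 0.077
```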
Post-Hoc Testing (What Comes After)
When you reject $H_0$, you'll often want to know which groups differ. Common follow-up procedures include:
- Tukey's HSD (Honest Significant Difference): Compares every pair of group means while controlling for the increased chance of a false positive from doing multiple comparisons.
- Bonferroni correction: Adjusts the significance level by dividing by the number of pairwise comparisons. Simple but conservative.
These post-hoc tests are only appropriate after a significant ANOVA result.
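A sketch of the Tukey's HSD workflow, assuming scipy is available (`scipy.stats.tukey_hsd`, added in SciPy 1.8); the three sample groups below are hypothetical:

```python
# Post-hoc pairwise comparisons with Tukey's HSD.
# The scores below are hypothetical example data.
from scipy.stats import f_oneway, tukey_hsd

a = [78, 82, 75, 80, 77, 79]
b = [88, 91, 85, 89, 92, 86]
c = [80, 83, 78, 81, 79, 82]

# Only proceed to post-hoc testing after a significant overall F test.
_, p_overall = f_oneway(a, b, c)
if p_overall < 0.05:
    result = tukey_hsd(a, b, c)
    print(result)  # pairwise p-values, adjusted for multiple comparisons
```

Each pairwise p-value in the output is already adjusted for the family of comparisons, so each can be read against the original $\alpha$.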
Common Mistakes to Avoid
- Running multiple two-sample t-tests instead of ANOVA. Each t-test has its own chance of a Type I error. With many comparisons, those errors add up fast. ANOVA handles all groups in a single test.
- Claiming a specific group is different without post-hoc analysis. A significant F test only tells you something differs, not what.
- Ignoring the equal variance assumption. If one group's spread is much larger than the others, your F statistic and p-value may not be reliable.
- Confusing statistical significance with practical significance. A tiny difference in means can be "statistically significant" with a large enough sample. Always look at the actual size of the differences too.
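The first mistake above is worth quantifying. If the pairwise t-tests were independent, each at level $\alpha$, the chance of at least one false positive across $m$ comparisons would be $1 - (1 - \alpha)^m$, and it grows quickly with the number of groups:

```python
# Familywise Type I error rate for m independent pairwise tests at alpha = 0.05.
# (Real pairwise t-tests are not fully independent, but the inflation is similar.)
from math import comb

alpha = 0.05
for k_groups in [3, 4, 5]:
    m = comb(k_groups, 2)                 # number of pairwise comparisons
    familywise = 1 - (1 - alpha) ** m
    print(f"{k_groups} groups, {m} tests: P(at least one false positive) = {familywise:.3f}")
# → roughly 0.143, 0.265, and 0.401
```

This is exactly the inflation that a single ANOVA F test (and, afterwards, adjusted post-hoc comparisons) avoids.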