One-Way ANOVA
One-way ANOVA tests whether the means of three or more groups are all equal or if at least one group differs. It extends the logic of a two-sample t-test to situations with multiple groups, and it's the go-to method whenever you need to compare treatments, conditions, or categories without inflating your Type I error rate by running many separate t-tests.

Purpose of One-Way ANOVA
- Compares means across three or more groups or populations (e.g., different drug dosages, teaching methods, or age brackets)
- Determines whether at least one group mean is significantly different from the others
- Avoids the problem of running multiple t-tests, which would inflate the overall probability of a Type I error (falsely rejecting H0)
Why not just do several t-tests? If you have 4 groups, that's 6 pairwise comparisons. Each test at α = 0.05 compounds the chance of at least one false positive well beyond 5%. ANOVA handles all groups in a single test, keeping your error rate controlled.
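The inflation is easy to quantify. A quick Python sketch (the group count and α are illustrative, and the tests are treated as independent for simplicity):

```python
from math import comb

k = 4         # number of groups (illustrative)
alpha = 0.05  # per-test significance level
m = comb(k, 2)  # pairwise comparisons: C(4, 2) = 6

# Chance of at least one false positive across m tests,
# assuming independence for illustration
familywise_error = 1 - (1 - alpha) ** m
print(m, round(familywise_error, 3))  # 6 0.265
```

With 6 tests, the familywise error rate is already about 26%, more than five times the nominal 5%.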
Hypotheses for Multiple Group Comparisons
- Null hypothesis (H0): All population means are equal: μ1 = μ2 = ... = μk, where k is the number of groups. This assumes any observed differences are due to random variation alone.
- Alternative hypothesis (Ha): At least one pair of population means differs: μi ≠ μj for at least one pair (i, j) with i ≠ j. Note that Ha does not say all means differ. It only claims that at least one group stands apart. For example, if you're comparing four pain medications, rejecting H0 might mean just one medication outperforms the rest while the other three are similar.

The F-Statistic and How ANOVA Works
ANOVA works by comparing two sources of variability:
- Between-group variability (MSG): How much the group means differ from the overall grand mean. Large values suggest the groups aren't all the same.
- Within-group variability (MSE): How much individual observations vary around their own group mean. This reflects natural random spread.
The test statistic is:
F = MSG / MSE
- If the group means are all similar, MSG will be small relative to MSE, producing an F-value near 1.
- If at least one group mean is very different, MSG will be large, pushing F well above 1.
- Under H0, the F-statistic follows the F-distribution, which is right-skewed and defined by two degrees of freedom: df1 = k − 1 (between groups) and df2 = N − k (within groups), where N is the total sample size.
You reject H0 when the F-value is large enough that the p-value falls below your significance level α.
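Putting the pieces together, here is a from-scratch sketch of the F-statistic, using hypothetical data for three groups (the numbers are made up for illustration):

```python
# One-way ANOVA F-statistic by hand (hypothetical three-group data)
groups = [
    [4.1, 5.0, 4.8, 5.2],
    [5.9, 6.3, 6.1, 5.7],
    [4.9, 5.1, 5.3, 4.7],
]

N = sum(len(g) for g in groups)  # total sample size
k = len(groups)                  # number of groups
grand_mean = sum(x for g in groups for x in g) / N

# Between-group sum of squares and mean square (MSG)
ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
msg = ssg / (k - 1)

# Within-group sum of squares and mean square (MSE)
sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
mse = sse / (N - k)

F = msg / mse  # well above 1 here, suggesting at least one group differs
print(round(F, 2))
```

In practice you would hand the same lists to a library routine such as `scipy.stats.f_oneway`, which also returns the p-value from the F-distribution with df1 = k − 1 and df2 = N − k.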
Assumptions of One-Way ANOVA
Before trusting your ANOVA results, check these conditions:
- Independence: Observations are independent, both within and across groups.
- Normality: The data within each group should be approximately normally distributed. With larger samples, ANOVA is robust to moderate departures from normality.
- Equal variances (homoscedasticity): The population variances across groups should be roughly equal. A common rule of thumb is that the largest sample standard deviation should be no more than about twice the smallest.
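The equal-variance rule of thumb is easy to check directly. A small sketch with hypothetical group data (for a formal check you could use Levene's test, e.g. `scipy.stats.levene`):

```python
from statistics import stdev

# Hypothetical group data; check the "largest SD <= 2x smallest SD" rule of thumb
groups = {
    "control": [4.1, 5.0, 4.8, 5.2, 4.6],
    "dose_a":  [5.9, 6.3, 6.1, 5.7, 6.0],
    "dose_b":  [4.9, 5.1, 5.3, 4.7, 5.6],
}

sds = {name: stdev(vals) for name, vals in groups.items()}
ratio = max(sds.values()) / min(sds.values())
print(ratio <= 2)  # rule of thumb satisfied?
```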

Box Plots for ANOVA Visualization
Box plots give you a quick visual sense of whether group means might differ before you even run the test.
- Each box shows the median (center line), the interquartile range (the box, covering the middle 50%), and whiskers extending to the most extreme values within 1.5 × IQR. Points beyond the whiskers are flagged as potential outliers.
- Non-overlapping boxes suggest the groups may have meaningfully different centers, though this isn't a formal test.
- Overlapping boxes don't guarantee the means are equal. The ANOVA F-test accounts for sample sizes and variability in ways a visual comparison can't.
- Box plots also help you check assumptions: look for roughly similar spreads across groups (equal variance) and roughly symmetric distributions (normality).
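The box-plot components themselves are simple to compute. A sketch using Python's standard library (the data are hypothetical, with one planted outlier; note that quartile conventions vary slightly between tools):

```python
from statistics import quantiles

# Box-plot components for one group (hypothetical data, sorted for readability)
data = [4.1, 4.6, 4.8, 5.0, 5.2, 5.3, 5.5, 9.0]  # 9.0 is a planted outlier

q1, q2, q3 = quantiles(data, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1                      # box height: middle 50% of the data
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # whisker limits
outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # points a plot would flag beyond the whiskers
```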
Post-Hoc Analysis and Effect Size
A significant ANOVA result tells you something differs, but not what. That's where post-hoc analysis comes in.
- Post-hoc pairwise comparisons test each pair of group means to pinpoint where the differences are. Common methods include Tukey's HSD and Bonferroni correction, both of which adjust p-values to account for the multiple comparisons problem.
- Effect size measures how large the differences are in practical terms, not just whether they're statistically significant. A common effect size for ANOVA is eta-squared (η²):
η² = SSG / SST
This tells you the proportion of total variability in the data that's explained by group membership (the between-group sum of squares divided by the total sum of squares). For instance, η² = 0.14 means 14% of the variation in your outcome is accounted for by which group a subject belongs to. Cohen's rough benchmarks: 0.01 is small, 0.06 is medium, and 0.14 is large.
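The η² calculation drops straight out of the sums of squares. A short sketch with hypothetical three-group data (the very large η² here just reflects the toy numbers):

```python
# Eta-squared from sums of squares (hypothetical three-group data)
groups = [
    [4.1, 5.0, 4.8, 5.2],
    [5.9, 6.3, 6.1, 5.7],
    [4.9, 5.1, 5.3, 4.7],
]
all_x = [x for g in groups for x in g]
grand_mean = sum(all_x) / len(all_x)

# Between-group (SSG) and total (SST) sums of squares
ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
sst = sum((x - grand_mean) ** 2 for x in all_x)

eta_sq = ssg / sst  # proportion of variance explained by group membership
print(round(eta_sq, 2))
```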
Statistical significance (small p-value) and practical significance (meaningful effect size) don't always go hand in hand, so always report both.
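As a concrete illustration of the multiple-comparisons adjustment, here is a Bonferroni-style sketch (the group labels and p-values are hypothetical):

```python
from math import comb

k, alpha = 3, 0.05
m = comb(k, 2)              # 3 pairwise comparisons among 3 groups
alpha_per_test = alpha / m  # Bonferroni: each pair is tested at alpha / m

# Hypothetical raw p-values from pairwise t-tests
raw_p = {("A", "B"): 0.004, ("A", "C"): 0.030, ("B", "C"): 0.200}
significant = [pair for pair, p in raw_p.items() if p < alpha_per_test]
print(significant)  # only the A-B pair survives the correction
```

In practice Tukey's HSD (e.g. `statsmodels.stats.multicomp.pairwise_tukeyhsd`) is often preferred, since it controls the familywise error rate less conservatively than Bonferroni when all pairwise comparisons are of interest.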