One-way ANOVA purpose and application
Comparing means across multiple groups
One-way ANOVA (Analysis of Variance) is a statistical method for comparing the means of three or more groups defined by a single categorical factor, measured on a continuous dependent variable. Think of it as the natural extension of the independent samples t-test: where a t-test handles two groups, ANOVA handles any number of groups simultaneously.
- The goal is to determine whether there are statistically significant differences among the group means (e.g., comparing average test scores across five different schools, or comparing treatment effects across three drug dosages)
- Running multiple pairwise t-tests instead of a single ANOVA would inflate your Type I error rate. ANOVA controls this by testing all groups in one omnibus test.
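The inflation mentioned above is easy to quantify. A minimal sketch, assuming the pairwise tests are independent (an approximation; real pairwise t-tests share data and are correlated):

```python
# Family-wise Type I error rate when running m independent tests,
# each at alpha = 0.05. Independence is an illustrative assumption.
def familywise_error(m, alpha=0.05):
    """Probability of at least one false positive across m tests."""
    return 1 - (1 - alpha) ** m

# Five groups imply 5 * 4 / 2 = 10 pairwise comparisons:
print(round(familywise_error(10), 3))  # 0.401 -- far above 0.05
```

With only one test the rate stays at the nominal 0.05; with ten it quadruples, which is why the single omnibus F-test is preferred.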
Hypothesis testing in one-way ANOVA
- The null hypothesis states that all group population means are equal
- The alternative hypothesis states that at least one group mean differs from the others
- ANOVA is widely used across psychology, biology, social sciences, and medicine to analyze the effect of a single categorical variable on a continuous outcome (e.g., effect of different fertilizers on plant growth)
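The fertilizer example above can be sketched end-to-end with SciPy's `f_oneway`; the data here are simulated, hypothetical growth measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical plant growth (cm) for three fertilizers, 30 plants each
fert_a = rng.normal(20, 2, size=30)
fert_b = rng.normal(22, 2, size=30)
fert_c = rng.normal(25, 2, size=30)

# One-way ANOVA: one categorical factor (fertilizer), one continuous outcome
f_stat, p_value = stats.f_oneway(fert_a, fert_b, fert_c)
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
```

Because the simulated population means genuinely differ, the p-value comes out far below 0.05.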
Components of the one-way ANOVA model
The model equation
The one-way ANOVA model decomposes each observation into three pieces:

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$

where:
- $Y_{ij}$ is the $j$-th observation in the $i$-th group
- $\mu$ is the grand mean, the overall mean of the dependent variable across all observations regardless of group
- $\alpha_i$ is the group effect for group $i$, representing how far that group's mean deviates from the grand mean. A positive $\alpha_i$ means the group mean sits above the grand mean; negative means below. For example, if the grand mean test score is 75 and school A averages 80, then $\alpha_A = 80 - 75 = 5$.
- $\epsilon_{ij}$ is the error term, capturing the random deviation of each individual observation from its own group mean. These errors are assumed to be normally distributed with mean zero and constant variance across all groups.
A key constraint is that the group effects sum to zero: $\sum_{i=1}^{k} \alpha_i = 0$ (in the fixed-effects parameterization). This keeps the grand mean interpretable as the overall center.
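The model can be made concrete by simulating from it. A minimal sketch with hypothetical values (grand mean 75, three group effects that sum to zero, normal errors):

```python
import numpy as np

rng = np.random.default_rng(42)

mu = 75.0                            # grand mean
alpha = np.array([5.0, -3.0, -2.0])  # hypothetical group effects, sum to zero
sigma = 2.0                          # common error standard deviation

# Y_ij = mu + alpha_i + eps_ij, with eps_ij ~ N(0, sigma^2)
groups = [mu + a + rng.normal(0.0, sigma, size=50) for a in alpha]

# Sample group means land near the population means 80, 72, 73
print([round(g.mean(), 1) for g in groups])
```

Note how each group's population mean is just $\mu + \alpha_i$, so the sample means recover those values up to sampling noise.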

Decomposing variability
The model splits total variability into two sources:
- Between-group variability (SSB): How much the group means differ from the grand mean. If the groups truly have different population means, this component will be large.
- Within-group variability (SSW): How much individual observations vary around their own group mean. This reflects noise or natural variation unrelated to group membership.
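The split above can be verified numerically: SSB and SSW always add up to the total sum of squares. A short sketch with hypothetical test scores for three schools:

```python
import numpy as np

# Hypothetical test scores for three schools
groups = [np.array([78., 80., 82.]),
          np.array([70., 72., 74.]),
          np.array([88., 90., 92.])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # between
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within
sst = ((all_obs - grand_mean) ** 2).sum()                         # total

print(ssb, ssw, sst)  # SSB + SSW equals SST
```

Here the between-group component dwarfs the within-group component, reflecting how far apart the three school means sit relative to the spread inside each school.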
The F-statistic
The F-statistic compares these two sources of variability:

$$F = \frac{\text{MSB}}{\text{MSW}} = \frac{\text{SSB}/(k-1)}{\text{SSW}/(N-k)}$$

where $k$ is the number of groups and $N$ is the total number of observations.
- MSB (Mean Square Between) is the between-group sum of squares divided by its degrees of freedom: $\text{MSB} = \text{SSB}/(k-1)$
- MSW (Mean Square Within) is the within-group sum of squares divided by its degrees of freedom: $\text{MSW} = \text{SSW}/(N-k)$
A large F-value means the between-group differences are large relative to the noise within groups, which is evidence against the null hypothesis. An F-value large enough that its p-value falls below 0.05 would lead you to reject $H_0$ at the 0.05 significance level and conclude that at least one group mean differs.
Under $H_0$, the F-statistic follows an $F$-distribution with $(k-1,\, N-k)$ degrees of freedom.
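The F computation can be done by hand from the sums of squares and checked against SciPy's built-in routine. A sketch with the same hypothetical three-school scores:

```python
import numpy as np
from scipy import stats

groups = [np.array([78., 80., 82.]),
          np.array([70., 72., 74.]),
          np.array([88., 90., 92.])]
k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total observations N

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

msb = ssb / (k - 1)        # df_between = k - 1
msw = ssw / (n_total - k)  # df_within  = N - k
f_stat = msb / msw
p_value = stats.f.sf(f_stat, k - 1, n_total - k)  # upper-tail probability

f_ref, p_ref = stats.f_oneway(*groups)  # should agree with the manual result
print(f_stat, p_value)
```

The manual F matches `f_oneway` exactly, which is a useful sanity check when learning the decomposition.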
Null and alternative hypotheses
The hypotheses for one-way ANOVA are:
- $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (all group population means are equal)
- $H_1$: at least one $\mu_i$ differs from the others
The alternative is non-directional. It doesn't tell you which group differs or in what direction. If you reject $H_0$, you know something differs, but you'll need post-hoc tests (like Tukey's HSD) to identify the specific pairwise differences.
In some research contexts, a more targeted, directional alternative (e.g., $\mu_1 > \mu_2$ for two specific groups) might be of interest, but that's handled through planned contrasts rather than the standard omnibus ANOVA F-test.
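The post-hoc step mentioned above can be sketched with SciPy's `tukey_hsd` (available in SciPy 1.8+); the three groups here are hypothetical school scores:

```python
from scipy import stats  # stats.tukey_hsd requires SciPy >= 1.8

school_a = [78, 80, 82, 79, 81]
school_b = [70, 72, 74, 71, 73]
school_c = [79, 81, 83, 80, 82]

# Tukey's HSD: all pairwise comparisons with family-wise error control
res = stats.tukey_hsd(school_a, school_b, school_c)
print(res)  # table of pairwise mean differences with adjusted p-values
```

In this toy data, school B clearly differs from the other two, while A vs C is not significant, exactly the kind of pattern the omnibus F-test alone cannot reveal.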
Assumptions of one-way ANOVA
Valid inference from ANOVA depends on three assumptions. Violating them can bias your results or inflate error rates, so you should always check them.

Independence
Observations within and across groups must be independent of each other. The value of one observation should not influence or be related to any other observation. This is primarily ensured by study design (random sampling, random assignment) rather than by a statistical test.
For example, if students in a classroom copy answers from each other, the independence assumption breaks down, and ANOVA results become unreliable.
Normality
The dependent variable should be approximately normally distributed within each group. You can assess this with:
- Histograms or Q-Q plots for each group
- Formal tests like the Shapiro-Wilk test
Moderate departures from normality are usually not a serious problem, especially when sample sizes are reasonably large and roughly equal across groups. ANOVA is fairly robust to non-normality in those conditions.
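A quick per-group normality check with the Shapiro-Wilk test can be sketched as follows; the data are simulated and hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = {"A": rng.normal(50, 5, size=40),
          "B": rng.normal(55, 5, size=40),
          "C": rng.normal(60, 5, size=40)}

# Run Shapiro-Wilk within each group (not on the pooled data,
# since genuinely different group means would fake non-normality)
pvals = {name: stats.shapiro(values).pvalue for name, values in groups.items()}
print(pvals)  # small p-values would flag departures from normality
```

Testing each group separately matters: pooling groups with different means produces a multimodal sample that can look non-normal even when every group is perfectly Gaussian.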
Homogeneity of variances (homoscedasticity)
The population variance of the dependent variable should be approximately equal across all groups. You can check this with:
- Levene's test (robust to non-normality, generally preferred)
- Bartlett's test (more powerful but sensitive to non-normality)
A common rule of thumb: if the ratio of the largest group variance to the smallest is less than about 3:1, standard ANOVA usually performs adequately.
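A variance check with Levene's test can be sketched like this, using hypothetical groups with deliberately different spreads:

```python
from scipy import stats

low_spread  = [10, 11, 9, 10, 12, 10, 11, 9]
mid_spread  = [10, 13, 7, 12, 8, 11, 9, 14]
high_spread = [10, 18, 2, 16, 4, 15, 5, 20]

# Levene's test; center='median' (the default) is the robust
# Brown-Forsythe variant, suitable for non-normal data
stat, p = stats.levene(low_spread, mid_spread, high_spread)
print(round(p, 4))  # small p: evidence the variances are unequal
```

All three groups are centered near 10, so a significant result here is driven purely by the differing spreads, which is exactly what the test targets.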
When assumptions are violated
If assumptions don't hold, you have several options:
- Unequal variances: Use Welch's ANOVA, which doesn't assume equal variances, or apply a variance-stabilizing transformation (e.g., log transformation for right-skewed data)
- Severe non-normality: Consider the Kruskal-Wallis test, a non-parametric alternative that compares rank distributions rather than means
- Non-independence: This is the hardest to fix statistically. You may need a different model entirely (e.g., mixed-effects models for clustered data)
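The Kruskal-Wallis fallback mentioned above is a one-liner in SciPy; the reaction-time data here are hypothetical and deliberately skewed with outliers, the situation where a rank-based test helps:

```python
from scipy import stats

# Hypothetical right-skewed reaction times (ms), each group with an outlier
g1 = [210, 250, 230, 900, 240, 260]
g2 = [310, 340, 1200, 330, 320, 350]
g3 = [500, 520, 540, 2000, 510, 530]

# Kruskal-Wallis: rank-based omnibus test, no normality assumption
h_stat, p = stats.kruskal(g1, g2, g3)
print(f"H = {h_stat:.2f}, p = {p:.4f}")
```

Because the test operates on ranks, the extreme outliers cannot dominate the result the way they would inflate the within-group variance in a standard ANOVA.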