Variability Decomposition
One-Way ANOVA answers a simple question: are the differences we see between group means real, or could they just be noise? It does this by splitting the total variability in the data into two pieces and comparing them. This section covers how that partitioning works and how the F-test uses it to reach a conclusion.
Components of Variability
Total variability in the response variable breaks into two additive components:
- Between-group variability (SSB): Differences among the group means. This is the variation explained by the grouping variable (the treatment effect). If the groups truly differ, this component will be large.
- Within-group variability (SSW): Differences among observations inside the same group. This is the variation not explained by the grouping variable, often called error variance. It reflects random noise and individual differences.
These two pieces add up exactly to the total:

$$\text{SST} = \text{SSB} + \text{SSW}$$
Why Partition Variability?
Decomposing variability isn't just bookkeeping. It serves several purposes:
- It reveals how much of the total variation the explanatory variable accounts for versus how much is unexplained noise.
- It sets up the F-test, which directly compares these two sources of variation.
- It enables effect-size measures like eta-squared ($\eta^2 = \text{SSB}/\text{SST}$), which tells you the proportion of total variability explained by the grouping variable.
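Eta-squared is just a ratio of sums of squares. A minimal sketch, using hypothetical SSB and SST values for illustration:

```python
# Eta-squared: share of total variability explained by the grouping variable.
# The sums of squares here are hypothetical toy values.
ssb = 54.0   # between-group sum of squares
sst = 60.0   # total sum of squares

eta_squared = ssb / sst
print(eta_squared)   # 0.9 -> the groups explain 90% of the variability
```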
Sum of Squares Calculation
Each sum of squares captures a different source of variation. All three follow the same logic: square the deviations from a relevant mean, then add them up.
Total Sum of Squares (SST)
SST measures the total spread of every observation around the grand mean (the overall mean of all observations, ignoring groups):

$$\text{SST} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2$$

where $x_{ij}$ is observation $i$ in group $j$ and $\bar{x}$ is the grand mean.
This includes both the variation that the grouping variable explains and the variation it doesn't.
Between-Group Sum of Squares (SSB)
SSB measures how far each group mean falls from the grand mean. Each squared deviation is weighted by the group's sample size $n_j$:

$$\text{SSB} = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$$
When group means are spread far apart relative to the grand mean, SSB is large. That's the signal you're looking for.

Within-Group Sum of Squares (SSW)
SSW measures how much individual observations vary around their own group mean:

$$\text{SSW} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$
This is pure noise from the model's perspective. No matter how different the groups are, observations within each group will still scatter around their group mean.
Quick check: You can always verify your arithmetic with $\text{SST} = \text{SSB} + \text{SSW}$. If the numbers don't add up, something went wrong.
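The three formulas can be computed in a few lines. A minimal sketch on a toy dataset (the group values are hypothetical; any numbers work):

```python
# Sum-of-squares decomposition for a toy three-group dataset.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [1.0, 2.0, 3.0],
}

all_obs = [x for xs in groups.values() for x in xs]
grand_mean = sum(all_obs) / len(all_obs)

# SST: every observation's squared deviation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_obs)

# SSB: each group mean's squared deviation from the grand mean,
# weighted by that group's sample size n_j
ssb = sum(
    len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2
    for xs in groups.values()
)

# SSW: each observation's squared deviation from its own group mean
ssw = sum(
    (x - sum(xs) / len(xs)) ** 2
    for xs in groups.values()
    for x in xs
)

print(sst, ssb, ssw)   # 60.0 54.0 6.0
assert abs(sst - (ssb + ssw)) < 1e-9   # the quick check above
```

The final assertion is exactly the quick check: the decomposition must be additive, so a failed assertion means an arithmetic slip somewhere.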
Degrees of Freedom
Degrees of freedom (df) adjust each sum of squares for the number of independent pieces of information that went into it. They're essential for computing mean squares.
- Total: $df_T = N - 1$, where $N$ is the total number of observations.
- Between-group: $df_B = k - 1$, where $k$ is the number of groups. This is the numerator df for the F-statistic.
- Within-group: $df_W = N - k$. This is the denominator df for the F-statistic.
These also partition cleanly: $df_T = df_B + df_W$, since $N - 1 = (k - 1) + (N - k)$.
For example, with $k = 3$ groups and $N = 30$ total observations: $df_B = 2$, $df_W = 27$, and $df_T = 29$.
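The df bookkeeping is easy to check in code (the values of $k$ and $N$ here are hypothetical):

```python
# Degrees-of-freedom partition for a hypothetical design.
k, N = 3, 30            # number of groups, total observations

df_between = k - 1      # numerator df for the F-statistic
df_within = N - k       # denominator df for the F-statistic
df_total = N - 1

print(df_between, df_within, df_total)   # 2 27 29
assert df_between + df_within == df_total   # the partition holds
```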
Mean Squares Calculation
Sums of squares on their own aren't directly comparable because they're based on different numbers of free quantities. Dividing by the appropriate degrees of freedom gives you mean squares, which are comparable.
Between-Group Mean Square (MSB)
MSB estimates the average squared deviation of group means from the grand mean:

$$\text{MSB} = \frac{\text{SSB}}{k - 1}$$

If the null hypothesis is true (all population means are equal), MSB estimates the same underlying variance that MSW does. If the null is false, MSB will tend to be inflated by the real group differences.
Within-Group Mean Square (MSW)
MSW is the pooled estimate of variance within groups:

$$\text{MSW} = \frac{\text{SSW}}{N - k}$$

It estimates the common error variance regardless of whether the group means differ. This is why it serves as the baseline in the F-ratio.

Comparing the Two
The core logic: if group means are truly equal, MSB and MSW should be roughly the same size (both just estimating error variance). If MSB is substantially larger than MSW, the extra variation is likely due to real group differences.
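Turning sums of squares into mean squares is one division each. A short sketch using hypothetical toy values (SSB = 54, SSW = 6 from a three-group design with nine observations):

```python
# Mean squares and the F-ratio from toy sums of squares.
ssb, ssw = 54.0, 6.0   # hypothetical between/within sums of squares
k, N = 3, 9            # hypothetical: 3 groups, 9 total observations

msb = ssb / (k - 1)    # between-group mean square
msw = ssw / (N - k)    # within-group mean square (pooled error variance)

f_ratio = msb / msw
print(msb, msw, f_ratio)   # 27.0 1.0 27.0
```

Here the F-ratio of 27 is far above 1, which is the pattern you would expect if the groups genuinely differ.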
F-Test for Group Differences
Purpose
The F-test asks: Is the between-group variation large enough, relative to within-group variation, that we can't reasonably blame it on chance?
Formally, it tests:
- $H_0$: All group population means are equal ($\mu_1 = \mu_2 = \cdots = \mu_k$).
- $H_a$: At least one group mean differs from the others.
Calculating the F-Statistic
$$F = \frac{\text{MSB}}{\text{MSW}}$$

This statistic follows an F-distribution with $(k - 1, N - k)$ degrees of freedom under $H_0$.
- $F \approx 1$ suggests the group means vary about as much as you'd expect from random noise alone.
- $F \gg 1$ suggests the groups differ more than noise can explain.
- $F$ is always non-negative because both mean squares are non-negative.
Hypothesis Testing and the P-Value
Once you have the F-statistic, you compare it to the F-distribution to get a p-value: the probability of seeing an F-value this large (or larger) if $H_0$ were true.
- Choose a significance level $\alpha$ (commonly $\alpha = 0.05$).
- Calculate $F = \text{MSB}/\text{MSW}$.
- Find the p-value from the $F_{k-1,\,N-k}$ distribution.
- If $p \le \alpha$, reject $H_0$. Conclude that at least one group mean is significantly different.
- If $p > \alpha$, fail to reject $H_0$. The data don't provide enough evidence of group differences.
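The whole workflow can be run end to end, assuming SciPy is available: `scipy.stats.f_oneway` returns the F-statistic and p-value directly, and `scipy.stats.f.sf` reproduces the p-value from a hand-computed F. The toy data are hypothetical:

```python
# One-way ANOVA on hypothetical toy data, assuming SciPy is installed.
from scipy import stats

a = [4.0, 5.0, 6.0]
b = [7.0, 8.0, 9.0]
c = [1.0, 2.0, 3.0]

f_stat, p_value = stats.f_oneway(a, b, c)
print(f_stat, p_value)   # F-statistic and its p-value

# Same p-value from the F survival function with (k-1, N-k) df:
k, N = 3, 9
p_manual = stats.f.sf(f_stat, k - 1, N - k)
assert abs(p_value - p_manual) < 1e-12

alpha = 0.05
if p_value <= alpha:
    print("Reject H0: at least one group mean differs")
else:
    print("Fail to reject H0")
```

With these groups the between-group spread dwarfs the within-group noise, so the test rejects $H_0$ at $\alpha = 0.05$; remember that rejecting only says *some* group differs, not which one.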
Interpreting the Results
Rejecting $H_0$ tells you the explanatory variable has a statistically significant effect on the response, but it does not tell you which specific groups differ. For that, you need post-hoc tests (e.g., Tukey's HSD) to make pairwise comparisons while controlling the family-wise error rate.
Failing to reject $H_0$ means the observed differences between group means are consistent with random sampling variability. It doesn't prove the means are equal; it just means you lack sufficient evidence to say otherwise.