The F Distribution and the F Ratio

Calculation of the F Ratio
The F ratio is the core statistic in ANOVA. It compares how much the group means vary from each other (between-group variance) to how much individual observations vary within their own groups (within-group variance). Think of it as a signal-to-noise ratio: the "signal" is the difference between groups, and the "noise" is the natural spread within groups.
Here's how to calculate it step by step:
1. Compute the sum of squares between groups ($SS_{between}$): This measures how far each group mean is from the overall (grand) mean, weighted by group size.
2. Compute the sum of squares within groups ($SS_{within}$): This measures how far individual observations fall from their own group mean.
3. Calculate degrees of freedom:
   - $df_{between} = k - 1$, where $k$ is the number of groups
   - $df_{within} = n - k$, where $n$ is the total number of observations across all groups
4. Compute mean squares by dividing each sum of squares by its degrees of freedom: $MS_{between} = SS_{between} / df_{between}$ and $MS_{within} = SS_{within} / df_{within}$.
5. Divide $MS_{between}$ by $MS_{within}$ to get the F ratio: $F = MS_{between} / MS_{within}$.
A large F ratio suggests the group means differ more than you'd expect from random variation alone. An F ratio near 1 suggests the between-group differences are about the same size as the within-group noise.
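The steps above can be sketched in plain Python. The group data and variable names here are invented for illustration:

```python
# Hypothetical example: one-way ANOVA F ratio computed step by step.
groups = [
    [4.2, 5.1, 4.8, 5.5],   # group A
    [6.0, 6.3, 5.8, 6.9],   # group B
    [4.9, 5.2, 5.0, 4.7],   # group C
]

n = sum(len(g) for g in groups)   # total observations across all groups
k = len(groups)                   # number of groups
grand_mean = sum(x for g in groups for x in g) / n

# SS_between: group-size-weighted squared deviations of each
# group mean from the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# SS_within: squared deviations of each observation from its own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = k - 1
df_within = n - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within

F = ms_between / ms_within
print(f"F = {F:.2f}")
```

Because group B sits well above the other two groups, this toy data set produces an F ratio comfortably above 1.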
Interpretation of the F Statistic
The F distribution is the sampling distribution of the F ratio when the null hypothesis (all group means are equal) is true. A few properties to know:
- It is right-skewed and takes only non-negative values, since variances can't be negative.
- Its exact shape depends on two degrees of freedom: $df_{between}$ (numerator) and $df_{within}$ (denominator). Changing either one changes the shape of the curve.
- As both degrees of freedom increase, the distribution becomes less skewed and more symmetric.
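You can see the skewness shrink as the degrees of freedom grow. A minimal sketch, assuming SciPy is available (`scipy.stats.f` exposes the F distribution; skewness is defined only when the denominator df exceeds 6):

```python
from scipy.stats import f as f_dist

# Skewness of the F distribution for small vs. large degrees of freedom.
skew_small = float(f_dist.stats(5, 10, moments='s'))
skew_large = float(f_dist.stats(50, 100, moments='s'))

print(f"skew with df = (5, 10):   {skew_small:.2f}")
print(f"skew with df = (50, 100): {skew_large:.2f}")
# The larger-df distribution is noticeably less right-skewed.
```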
To determine whether group means differ significantly:
- Calculate the F ratio from your data.
- Choose a significance level $\alpha$ (typically $\alpha = 0.05$).
- Find the critical F value from an F distribution table using $df_{between}$, $df_{within}$, and your chosen $\alpha$. Alternatively, find the p-value directly.
- Make your decision:
- If $F > F_{critical}$ (or if $p < \alpha$), reject the null hypothesis. At least one group mean is significantly different from the others.
- If $F \leq F_{critical}$ (or if $p \geq \alpha$), fail to reject the null hypothesis. There isn't enough evidence to conclude the group means differ.
One common mistake: rejecting the null tells you at least one mean differs, but it doesn't tell you which means differ. You'd need a post-hoc test (like Tukey's HSD) for that.
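The decision rule can be sketched with SciPy, which provides the critical value (`ppf`) and the p-value (`sf`, the survival function). The observed F ratio and degrees of freedom below are illustrative assumptions:

```python
from scipy.stats import f as f_dist

F = 6.52                       # assumed observed F ratio (hypothetical)
df_between, df_within = 2, 9   # e.g., 3 groups, 12 total observations
alpha = 0.05

f_crit = f_dist.ppf(1 - alpha, df_between, df_within)  # critical F value
p_value = f_dist.sf(F, df_between, df_within)          # P(F' >= F) under H0

if F > f_crit:   # equivalently: p_value < alpha
    print(f"Reject H0: F = {F:.2f} > F_crit = {f_crit:.2f} (p = {p_value:.4f})")
else:
    print(f"Fail to reject H0 (p = {p_value:.4f})")
```

Note that the two criteria always agree: $F > F_{critical}$ exactly when $p < \alpha$, since both compare the same tail area.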
Construction of the ANOVA Table
The ANOVA table organizes every piece of the calculation into one place. Here's the standard layout:
| Source of Variation | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| Between Groups | $SS_{between}$ | $k - 1$ | $MS_{between}$ | $MS_{between} / MS_{within}$ | from F distribution |
| Within Groups | $SS_{within}$ | $n - k$ | $MS_{within}$ | | |
| Total | $SS_{total}$ | $n - 1$ | | | |

Steps to build it:
1. Calculate $SS_{between}$, $SS_{within}$, and $SS_{total}$. Note that $SS_{total} = SS_{between} + SS_{within}$, which serves as a useful check.
2. Fill in the degrees of freedom: $df_{between} = k - 1$, $df_{within} = n - k$, and $df_{total} = n - 1$.
3. Compute $MS_{between}$ and $MS_{within}$ by dividing each SS by its df.
4. Compute the F ratio: $F = MS_{between} / MS_{within}$.
5. Determine the p-value using the F distribution with $df_{between}$ and $df_{within}$.
The p-value represents the probability of observing an F ratio at least as large as the one you calculated, assuming the null hypothesis is true. A small p-value means your observed group differences are unlikely to have occurred by chance alone.
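Putting the whole table together programmatically ties the steps into one place. This is a hypothetical sketch: the data are invented, and SciPy is assumed available for the p-value:

```python
from scipy.stats import f as f_dist

# Invented example data: three groups of four observations each.
groups = [
    [4.2, 5.1, 4.8, 5.5],
    [6.0, 6.3, 5.8, 6.9],
    [4.9, 5.2, 5.0, 4.7],
]

n = sum(len(g) for g in groups)
k = len(groups)
grand_mean = sum(x for g in groups for x in g) / n

ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
ss_total = ss_between + ss_within   # check: equals direct computation from grand mean

df_between, df_within, df_total = k - 1, n - k, n - 1
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within
p_value = f_dist.sf(F, df_between, df_within)   # P(F' >= F) under H0

# Print the standard ANOVA table layout.
print(f"{'Source':<10}{'SS':>9}{'df':>4}{'MS':>9}{'F':>8}{'p':>9}")
print(f"{'Between':<10}{ss_between:>9.3f}{df_between:>4}{ms_between:>9.3f}{F:>8.2f}{p_value:>9.4f}")
print(f"{'Within':<10}{ss_within:>9.3f}{df_within:>4}{ms_within:>9.3f}")
print(f"{'Total':<10}{ss_total:>9.3f}{df_total:>4}")
```

For real analyses, `scipy.stats.f_oneway` computes the F ratio and p-value directly; the hand-built version above is useful for seeing where each table entry comes from.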
Why the F Distribution Matters
The F distribution, named after statistician Ronald Fisher, connects variance estimation to hypothesis testing. In a one-way ANOVA, you're really asking one question: Is the variability between group means larger than what random sampling variation would produce? The F ratio quantifies that comparison, and the F distribution tells you how likely your result is under the assumption of no real group differences.
This framework extends well beyond comparing means. F tests appear in regression analysis, testing whether multiple predictors jointly matter, and in comparing the fit of nested models. Mastering the logic here gives you a foundation for many advanced statistical methods.