The F Distribution and the F Ratio

Calculation of the F Ratio
The F ratio is the core statistic in ANOVA. It compares how much the group means vary from each other (between-group variance) to how much individual observations vary within their own groups (within-group variance). Think of it as a signal-to-noise ratio: the "signal" is the difference between groups, and the "noise" is the natural spread within groups.
Here's how to calculate it step by step:
1. Compute the sum of squares between groups ($SS_{between}$): This measures how far each group mean is from the overall (grand) mean, weighted by group size.
2. Compute the sum of squares within groups ($SS_{within}$): This measures how far individual observations fall from their own group mean.
3. Calculate degrees of freedom:
   - $df_{between} = k - 1$, where $k$ is the number of groups
   - $df_{within} = n - k$, where $n$ is the total number of observations across all groups
4. Compute mean squares by dividing each sum of squares by its degrees of freedom: $MS_{between} = SS_{between} / df_{between}$ and $MS_{within} = SS_{within} / df_{within}$.
5. Divide $MS_{between}$ by $MS_{within}$ to get the F ratio: $F = MS_{between} / MS_{within}$.
A large F ratio suggests the group means differ more than you'd expect from random variation alone. An F ratio near 1 suggests the between-group differences are about the same size as the within-group noise.
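The steps above can be sketched in plain Python. The group data and variable names here are invented for illustration:

```python
# Hypothetical example: one-way ANOVA F ratio computed step by step.
groups = [
    [4.2, 5.1, 4.8, 5.5],   # group A
    [6.0, 6.3, 5.8, 6.9],   # group B
    [4.9, 5.2, 5.0, 4.7],   # group C
]

n = sum(len(g) for g in groups)   # total observations across all groups
k = len(groups)                   # number of groups
grand_mean = sum(x for g in groups for x in g) / n

# SS_between: group-size-weighted squared deviations of each
# group mean from the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# SS_within: squared deviations of each observation from its own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = k - 1
df_within = n - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within

F = ms_between / ms_within
print(f"F = {F:.2f}")
```

Because group B sits well above the other two groups, this toy data set produces an F ratio comfortably above 1.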
Interpretation of the F Statistic
The F distribution is the sampling distribution of the F ratio when the null hypothesis (all group means are equal) is true. A few properties to know:
- It is right-skewed and takes only non-negative values, since variances can't be negative.
- Its exact shape depends on two degrees of freedom: $df_{between}$ (numerator) and $df_{within}$ (denominator). Changing either one changes the shape of the curve.
- As both degrees of freedom increase, the distribution becomes less skewed and more symmetric.
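You can see the skewness shrink as the degrees of freedom grow. A minimal sketch, assuming SciPy is available (`scipy.stats.f` exposes the F distribution; skewness is defined only when the denominator df exceeds 6):

```python
from scipy.stats import f as f_dist

# Skewness of the F distribution for small vs. large degrees of freedom.
skew_small = float(f_dist.stats(5, 10, moments='s'))
skew_large = float(f_dist.stats(50, 100, moments='s'))

print(f"skew with df = (5, 10):   {skew_small:.2f}")
print(f"skew with df = (50, 100): {skew_large:.2f}")
# The larger-df distribution is noticeably less right-skewed.
```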
To determine whether group means differ significantly:
- Calculate the F ratio from your data.
- Choose a significance level $\alpha$ (typically $\alpha = 0.05$).
- Find the critical F value from an F distribution table using $df_{between}$, $df_{within}$, and your chosen $\alpha$. Alternatively, find the p-value directly.
- Make your decision:
- If $F > F_{critical}$ (or if $p < \alpha$), reject the null hypothesis. At least one group mean is significantly different from the others.
- If $F \leq F_{critical}$ (or if $p \geq \alpha$), fail to reject the null hypothesis. There isn't enough evidence to conclude the group means differ.
One common mistake: rejecting the null tells you at least one mean differs, but it doesn't tell you which means differ. You'd need a post-hoc test (like Tukey's HSD) for that.
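The decision rule can be sketched with SciPy, which provides the critical value (`ppf`) and the p-value (`sf`, the survival function). The observed F ratio and degrees of freedom below are illustrative assumptions:

```python
from scipy.stats import f as f_dist

F = 6.52                       # assumed observed F ratio (hypothetical)
df_between, df_within = 2, 9   # e.g., 3 groups, 12 total observations
alpha = 0.05

f_crit = f_dist.ppf(1 - alpha, df_between, df_within)  # critical F value
p_value = f_dist.sf(F, df_between, df_within)          # P(F' >= F) under H0

if F > f_crit:   # equivalently: p_value < alpha
    print(f"Reject H0: F = {F:.2f} > F_crit = {f_crit:.2f} (p = {p_value:.4f})")
else:
    print(f"Fail to reject H0 (p = {p_value:.4f})")
```

Note that the two criteria always agree: $F > F_{critical}$ exactly when $p < \alpha$, since both compare the same tail area.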
Construction of the ANOVA Table
The ANOVA table organizes every piece of the calculation into one place. Here's the standard layout:
| Source of Variation | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| Between Groups | $SS_{between}$ | $k - 1$ | $MS_{between}$ | $MS_{between} / MS_{within}$ | from F distribution |
| Within Groups | $SS_{within}$ | $n - k$ | $MS_{within}$ | | |
| Total | $SS_{total}$ | $n - 1$ | | | |

Steps to build it:
1. Calculate $SS_{between}$, $SS_{within}$, and $SS_{total}$. Note that $SS_{total} = SS_{between} + SS_{within}$, which serves as a useful check.
2. Fill in the degrees of freedom: $df_{between} = k - 1$, $df_{within} = n - k$, and $df_{total} = n - 1$.
3. Compute $MS_{between}$ and $MS_{within}$ by dividing each SS by its df.
4. Compute the F ratio: $F = MS_{between} / MS_{within}$.
5. Determine the p-value using the F distribution with $df_{between}$ and $df_{within}$.
The p-value represents the probability of observing an F ratio at least as large as the one you calculated, assuming the null hypothesis is true. A small p-value means your observed group differences are unlikely to have occurred by chance alone.
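Putting the whole table together programmatically ties the steps into one place. This is a hypothetical sketch: the data are invented, and SciPy is assumed available for the p-value:

```python
from scipy.stats import f as f_dist

# Invented example data: three groups of four observations each.
groups = [
    [4.2, 5.1, 4.8, 5.5],
    [6.0, 6.3, 5.8, 6.9],
    [4.9, 5.2, 5.0, 4.7],
]

n = sum(len(g) for g in groups)
k = len(groups)
grand_mean = sum(x for g in groups for x in g) / n

ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
ss_total = ss_between + ss_within   # check: equals direct computation from grand mean

df_between, df_within, df_total = k - 1, n - k, n - 1
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within
p_value = f_dist.sf(F, df_between, df_within)   # P(F' >= F) under H0

# Print the standard ANOVA table layout.
print(f"{'Source':<10}{'SS':>9}{'df':>4}{'MS':>9}{'F':>8}{'p':>9}")
print(f"{'Between':<10}{ss_between:>9.3f}{df_between:>4}{ms_between:>9.3f}{F:>8.2f}{p_value:>9.4f}")
print(f"{'Within':<10}{ss_within:>9.3f}{df_within:>4}{ms_within:>9.3f}")
print(f"{'Total':<10}{ss_total:>9.3f}{df_total:>4}")
```

For real analyses, `scipy.stats.f_oneway` computes the F ratio and p-value directly; the hand-built version above is useful for seeing where each table entry comes from.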
Why the F Distribution Matters
The F distribution, named after statistician Ronald Fisher, connects variance estimation to hypothesis testing. In a one-way ANOVA, you're really asking one question: Is the variability between group means larger than what random sampling variation would produce? The F ratio quantifies that comparison, and the F distribution tells you how likely your result is under the assumption of no real group differences.
This framework extends well beyond comparing means. F tests appear in regression analysis, testing whether multiple predictors jointly matter, and in comparing the fit of nested models. Mastering the logic here gives you a foundation for many advanced statistical methods.