ANOVA Table Components
The Analysis of Variance (ANOVA) table breaks down the total variability in your response variable into two parts: what your regression model explains and what it doesn't. This decomposition is how you assess whether your model, taken as a whole, is doing meaningful work.
The F-statistic derived from the ANOVA table tests whether at least one predictor has a real relationship with the response. It's the primary tool for evaluating overall model significance in regression.
Key Components and Their Meanings
The standard ANOVA table for regression has these columns:
- Source of Variation: Two rows that matter: Model (also called Regression) and Error (also called Residual). A Total row sums them up.
- Degrees of Freedom (df): The number of independent pieces of information used in each estimate. The model df equals the number of predictors $k$. The error df equals $n - k - 1$, where $n$ is the sample size. Total df is $n - 1$.
- Sum of Squares (SS): Measures the variability attributed to each source.
- Mean Square (MS): Sum of squares divided by its corresponding degrees of freedom. This standardizes the variability so you can compare the model and error components fairly.
- F-value: The ratio $F = \mathrm{MSR}/\mathrm{MSE}$. This is the test statistic for overall model significance.
- P-value: The probability of observing an F-value this large (or larger) if none of the predictors actually mattered.
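To make the columns concrete, here's a minimal sketch that builds the whole table from scratch with NumPy and SciPy. The data are simulated and the variable names are my own choices, not anything from the text:

```python
# Sketch: constructing a regression ANOVA table (df, SS, MS, F, p).
# Simulated data; column layout mirrors the table described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 30, 2                      # sample size, number of predictors
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Fit by least squares (design matrix includes an intercept column)
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta

sst = np.sum((y - y.mean()) ** 2)         # total
ssr = np.sum((y_hat - y.mean()) ** 2)     # model (regression)
sse = np.sum((y - y_hat) ** 2)            # error (residual)

df_model, df_error = k, n - k - 1
msr, mse = ssr / df_model, sse / df_error
f_stat = msr / mse
p_value = stats.f.sf(f_stat, df_model, df_error)  # P(F >= f_stat) under H0

print(f"{'Source':<8}{'df':>4}{'SS':>10}{'MS':>10}")
print(f"{'Model':<8}{df_model:>4}{ssr:>10.3f}{msr:>10.3f}  F={f_stat:.2f}, p={p_value:.3g}")
print(f"{'Error':<8}{df_error:>4}{sse:>10.3f}{mse:>10.3f}")
print(f"{'Total':<8}{n - 1:>4}{sst:>10.3f}")
```

The Total row's df and SS are the sums of the Model and Error rows, which is the decomposition discussed next.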
Partitioning Variability
The core idea is that total variability splits cleanly into two pieces:
- SST (Total Sum of Squares): All the variability in the response variable around its mean.
- SSR (Regression Sum of Squares): The portion of variability your model explains. This is the Model row.
- SSE (Error Sum of Squares): The leftover variability your model doesn't capture. This is the Error row.
Every observation's deviation from the overall mean can be decomposed into a part the model predicts and a residual part it misses. The ANOVA table quantifies both.
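This decomposition can be checked numerically. The sketch below fits a simple linear regression to simulated data (the data and names are illustrative) and confirms that SSR and SSE add up to SST:

```python
# Sketch: verifying SST = SSR + SSE on simulated data.
# Each deviation from the mean splits into a model-explained part
# and a residual, and the sums of squares add up exactly.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=25)
y = 3.0 + 1.5 * x + rng.normal(scale=2.0, size=25)

# Simple linear regression via polyfit (returns slope, then intercept)
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

print(f"SST       = {sst:.3f}")
print(f"SSR + SSE = {ssr + sse:.3f}")   # matches SST up to rounding
```

The identity holds exactly for least-squares fits with an intercept, which is why the ANOVA table's Total row is simply the sum of the Model and Error rows.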
Explained vs. Unexplained Variation
Total Sum of Squares (SST)
SST measures how much the response variable varies overall. It's calculated as the sum of squared differences between each observed value $y_i$ and the overall mean $\bar{y}$:

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

This is the baseline variability your model is trying to account for. If SST is large, there's a lot of spread in the data to explain.

Regression Sum of Squares (SSR)
SSR captures how much of that total variability the model explains. It's the sum of squared differences between the predicted values $\hat{y}_i$ and the overall mean:

$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
A larger SSR relative to SST means the model is capturing more of the pattern in the data.
Error Sum of Squares (SSE)
SSE is the leftover variability, the part the model misses. It's the sum of squared residuals:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
A smaller SSE means the predicted values are closer to the observed values, which indicates a better fit. SSE is also the quantity that least squares estimation minimizes.
F-statistic Interpretation
Calculating the F-statistic
The F-statistic compares explained variability per degree of freedom to unexplained variability per degree of freedom:

1. Compute the mean square for regression: $MSR = \frac{SSR}{k}$, where $k$ is the number of predictors.
2. Compute the mean square for error: $MSE = \frac{SSE}{n - k - 1}$.
3. Take the ratio: $F = \frac{MSR}{MSE}$.
You divide by degrees of freedom because a model with more predictors will naturally explain more variability just by having more terms. The mean squares adjust for this.
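The three steps can be worked through as plain arithmetic. The sums of squares and sample size below are made-up numbers for illustration:

```python
# Worked arithmetic for the F-statistic, with hypothetical values.
ssr, sse = 120.0, 60.0   # hypothetical sums of squares
n, k = 25, 3             # hypothetical sample size and predictor count

msr = ssr / k            # step 1: MSR = SSR / k          -> 40.0
mse = sse / (n - k - 1)  # step 2: MSE = SSE / (n - k - 1) -> ~2.857
f_stat = msr / mse       # step 3: F = MSR / MSE           -> 14.0

print(msr, mse, f_stat)
```

Note that although SSR is only twice SSE here, the F-value is large because the model achieves its explained variability with far fewer degrees of freedom (3 versus 21).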

Interpreting the F-statistic
Under the null hypothesis that all regression coefficients equal zero ($\beta_1 = \beta_2 = \cdots = \beta_k = 0$), the F-statistic follows an $F$-distribution with $k$ and $n - k - 1$ degrees of freedom.
- A large F-value means the model explains much more variability per degree of freedom than random noise does. This is evidence that at least one predictor matters.
- An F-value near 1 suggests the model isn't doing much better than chance. The explained and unexplained variability per df are roughly equal.
- The p-value tells you the probability of seeing an F this large if none of the predictors had any real effect. If the p-value falls below your significance level (commonly 0.05), you reject $H_0$ and conclude the model is statistically significant.
Note that a significant F-test tells you at least one predictor is useful. It doesn't tell you which predictors are significant or how practically important the model is.
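Converting an F-value into a p-value is a one-liner with SciPy's F survival function. The F-value and degrees of freedom below are illustrative:

```python
# Sketch: p-value for an observed F-statistic via the F-distribution's
# survival function. Inputs are hypothetical, not from the text.
from scipy import stats

f_stat = 14.0
df_model, df_error = 3, 21       # k and n - k - 1

p_value = stats.f.sf(f_stat, df_model, df_error)   # P(F >= f_stat) under H0
print(f"p = {p_value:.6f}")

alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")
```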
Regression Model Significance
Assessing Overall Significance
The ANOVA F-test is a global test of your regression model. Here's the hypothesis framework:
- $H_0$: All slope coefficients are zero (the predictors collectively have no linear relationship with the response).
- $H_a$: At least one slope coefficient is nonzero.
Rejecting $H_0$ means the model as a whole has statistically significant predictive power. But keep two things in mind: significance depends heavily on sample size (with large $n$, even trivially small effects become significant), and a significant model isn't necessarily a useful model. You need to look at effect sizes too.
Coefficient of Determination ($R^2$)
The ANOVA table gives you everything you need to compute $R^2$:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

This tells you the proportion of total variability explained by the model. For example, $R^2 = 0.74$ means the model accounts for 74% of the variation in the response.
- Values close to 1 indicate the model captures most of the variability, suggesting a strong fit.
- Values close to 0 indicate the model explains very little, and most variability remains in the residuals.
$R^2$ always increases (or stays the same) when you add more predictors, even useless ones. That's why in multiple regression you'll often use adjusted $R^2$, which penalizes for adding predictors that don't improve the fit enough to justify the lost degree of freedom:

$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
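Both quantities fall straight out of the ANOVA sums of squares. The values below are hypothetical:

```python
# Sketch: R-squared and adjusted R-squared from ANOVA sums of squares.
# Hypothetical values for illustration.
ssr, sse = 120.0, 60.0
sst = ssr + sse
n, k = 25, 3

r2 = ssr / sst                                  # proportion explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalized for k predictors

print(round(r2, 4), round(adj_r2, 4))
```

Adjusted $R^2$ is always at most $R^2$, and the gap widens as you add predictors without a corresponding drop in SSE.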
Together, the F-test and $R^2$ give you complementary information: the F-test tells you whether the model is statistically significant, while $R^2$ tells you how much variability it actually explains.