Partial F-tests for Model Selection
Assessing Predictor Significance
Partial F-tests let you test whether a group of predictors meaningfully contributes to a regression model, after accounting for the other predictors already in the model. This is different from individual t-tests on coefficients, which only test one predictor at a time.
The setup involves two nested models:
- The full model contains all predictors of interest.
- The reduced model drops the subset of predictors you want to test.
The hypotheses are:
- Null hypothesis (H₀): The coefficients on all the dropped predictors equal zero. That subset adds no explanatory power beyond what the reduced model already captures.
- Alternative hypothesis (H₁): At least one of the dropped predictors has a nonzero coefficient.
For example, if your full model predicts salary from age, income, education, and experience, you might test whether education and experience together contribute significantly by comparing the full model to a reduced model that only includes age and income.
Calculating the Test Statistic
The partial F-statistic measures how much the residual sum of squares (RSS) increases when you drop the predictors in question, scaled by degrees of freedom:

F = [(RSS_reduced - RSS_full) / q] / [RSS_full / (n - p - 1)]

where:
- RSS_reduced = residual sum of squares for the reduced model
- RSS_full = residual sum of squares for the full model
- q = number of predictors dropped (the difference in number of parameters between the two models)
- n = sample size
- p = number of predictors in the full model
Under H₀, this statistic follows an F distribution with q and n - p - 1 degrees of freedom. A large F-value means the dropped predictors were absorbing a lot of variation, and removing them hurts the fit substantially.
You then compare the F-statistic to this F distribution (or just look at the p-value). A small p-value (typically below 0.05) leads you to reject H₀ and conclude that at least one of the dropped predictors matters.
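As a small worked example with hypothetical numbers: suppose the reduced model has RSS_reduced = 250 and the full model has RSS_full = 200, with n = 100 observations, p = 4 predictors in the full model, and q = 2 predictors dropped. A quick sketch using scipy:

```python
from scipy.stats import f

# Hypothetical values for illustration
rss_reduced = 250.0  # RSS of the reduced model
rss_full = 200.0     # RSS of the full model
n, p, q = 100, 4, 2  # sample size, predictors in full model, predictors dropped

df_full = n - p - 1  # residual degrees of freedom of the full model (95)
F = ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_value = f.sf(F, q, df_full)  # upper-tail probability under F with (q, n-p-1) df

print(F)        # 11.875
print(p_value)  # well below 0.05, so reject H0
```

Here the drop in fit from removing the two predictors is large relative to the residual noise, so the test rejects H₀.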
Comparing Nested Models

Defining Nested Models
Two models are nested when the reduced model is a special case of the full model. Specifically, you get the reduced model by constraining some coefficients in the full model to zero.
- Full model: y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε
- Reduced model: y = β₀ + β₁x₁ + β₂x₂ + ε

Here the reduced model is nested within the full model because it sets β₃ = 0 and β₄ = 0. The partial F-test asks: does forcing those coefficients to zero significantly worsen the fit?
This nesting requirement is strict. You cannot use a partial F-test to compare two models that contain different sets of predictors with neither being a subset of the other. For non-nested comparisons, you'd need other tools like AIC or cross-validation.
Test Statistic and Interpretation
The calculation is the same formula from above. Here's how to walk through the decision:
- Fit the full model and record RSS_full and its degrees of freedom.
- Fit the reduced model and record RSS_reduced and its degrees of freedom.
- Compute the F-statistic using the formula.
- Find the p-value from the F distribution with q and n - p - 1 degrees of freedom.
- If the p-value is below your significance level (e.g., 0.05), reject H₀ and conclude the additional predictors improve the model.
Rejecting H₀ tells you the group of predictors matters collectively. It does not tell you which specific predictor in the group is driving the effect. You might follow up with individual t-tests or additional partial F-tests to narrow things down.
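The steps above can be sketched end to end on simulated data (all names and numbers here are illustrative, not from the text). We generate four predictors, let only the first three affect the response, and test whether dropping x₃ and x₄ worsens the fit:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))  # four candidate predictors
# True model: only x1, x2, x3 matter; x4 is noise
y = 1.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(size=n)

def rss(design, y):
    """Residual sum of squares from an OLS fit via least squares."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

ones = np.ones((n, 1))
X_full = np.hstack([ones, X])            # intercept + x1..x4
X_reduced = np.hstack([ones, X[:, :2]])  # intercept + x1, x2 (drops x3, x4)

rss_full, rss_reduced = rss(X_full, y), rss(X_reduced, y)
q = 2                # predictors dropped
df_full = n - 4 - 1  # n - p - 1
F = ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_value = f.sf(F, q, df_full)

print(f"F = {F:.2f}, p = {p_value:.4g}")  # p is tiny: dropping x3 hurts the fit
```

Because x₃ has a real effect in the simulated data, the test rejects H₀ even though x₄ is pure noise, illustrating that a significant result only says the group matters collectively.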
Interpreting Partial F-test Results

Model Selection Implications
- Significant result (low p-value): The dropped predictors contribute meaningful explanatory power. Keep them in the model.
- Non-significant result (high p-value): Removing those predictors doesn't significantly hurt the fit. You can drop them to achieve a more parsimonious model (simpler, with fewer parameters, but similar explanatory power).
Partial F-tests fit naturally into stepwise model-building strategies:
- Backward elimination: Start with all candidate predictors, then iteratively test subsets for removal. Drop the least useful group, refit, and repeat.
- Forward selection: Start with a minimal model and test whether adding groups of predictors significantly improves fit.
Keep in mind that stepwise procedures can be sensitive to the order of testing, especially when predictors are correlated with each other.
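A minimal sketch of backward elimination driven by partial F-tests follows; dropping one predictor at a time makes each comparison a single-degree-of-freedom partial F-test. The function name, data, and threshold here are illustrative assumptions, not a standard library API:

```python
import numpy as np
from scipy.stats import f

def rss(design, y):
    """Residual sum of squares from an OLS fit via least squares."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ beta
    return float(r @ r)

def backward_eliminate(X, y, alpha=0.05):
    """Repeatedly drop the predictor with the largest partial-F p-value until
    every remaining predictor is significant at level alpha.
    X holds predictor columns only; an intercept is added internally."""
    n = len(y)
    keep = list(range(X.shape[1]))
    while keep:
        full = np.column_stack([np.ones(n), X[:, keep]])
        rss_full = rss(full, y)
        df_full = n - len(keep) - 1
        # Single-predictor partial F-test for each remaining column (q = 1)
        pvals = []
        for j in keep:
            reduced_cols = [k for k in keep if k != j]
            reduced = np.column_stack([np.ones(n), X[:, reduced_cols]])
            F = (rss(reduced, y) - rss_full) / (rss_full / df_full)
            pvals.append(f.sf(F, 1, df_full))
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break            # everything left is significant
        keep.pop(worst)      # drop the least useful predictor and refit
    return keep

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)  # only x1 and x4 matter
kept = backward_eliminate(X, y)
print(kept)  # the signal columns 0 and 3 survive elimination
```

Note that with correlated predictors the elimination order, and therefore the final model, can change, which is exactly the sensitivity the paragraph above warns about.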
Assessing Predictor Importance
The size of the change in RSS when you remove a predictor (or group) gives a rough sense of that predictor's importance. A predictor whose removal causes a large jump in RSS is absorbing a lot of variation in the response.
However, statistical significance alone isn't enough for good modeling decisions:
- A predictor can be statistically significant but practically unimportant (tiny effect size in a large sample).
- A predictor can be non-significant in a partial F-test but still theoretically important based on domain knowledge.
- Subject-matter reasoning should always inform which predictors belong in your model, not just p-values.
Advantages vs. Limitations of Partial F-tests
Benefits
- They test groups of predictors simultaneously, which individual t-tests cannot do properly when predictors are correlated.
- They provide a formal hypothesis-testing framework for comparing nested models.
- They directly quantify whether added model complexity is justified by improved fit.
- Results are easy to interpret: reject or fail to reject, with a clear p-value.
Drawbacks and Considerations
- Nesting requirement: Both models must be nested. For non-nested comparisons, use information criteria (AIC, BIC) or cross-validation instead.
- Sensitivity to multicollinearity: When predictors are highly correlated, the test can be unstable. The order in which you add or remove predictors may change your conclusions.
- No absolute fit information: The test tells you whether one model fits better than another, not whether either model fits well in an absolute sense. A significant partial F-test doesn't mean the full model is actually a good model.
- Assumption dependence: Like all F-tests in regression, partial F-tests assume linearity, independence of errors, constant variance (homoscedasticity), and approximately normal errors. Violations of these assumptions (outliers, heteroscedasticity, non-normality) can distort results.
- Complementary tools: Penalized regression methods like LASSO and Ridge regression handle variable selection differently by shrinking coefficients, which can be more robust when you have many correlated predictors. Cross-validation provides a direct estimate of predictive performance. These approaches work well alongside partial F-tests.