Hypothesis Testing for Individual Coefficients
Hypothesis tests for individual coefficients answer a focused question: does a specific predictor actually contribute to the model, or could its apparent effect be due to chance? In multiple regression, this matters because you're evaluating each predictor while accounting for all the others in the model. That "while controlling for everything else" part is what makes these tests different from simple linear regression t-tests.
This topic covers how to set up and carry out these tests, how to interpret the results (both statistically and practically), and how multicollinearity can undermine the whole process.
Hypothesis Testing for Coefficients
Formulating Hypotheses
For any individual coefficient in a multiple regression model, the hypotheses take this form:
- Null hypothesis: $H_0: \beta_j = 0$. This says the predictor has no linear relationship with the response variable after controlling for all other predictors in the model.
- Alternative hypothesis can be:
  - Two-sided: $H_a: \beta_j \neq 0$ (the coefficient is not zero)
  - One-sided: $H_a: \beta_j > 0$ or $H_a: \beta_j < 0$
The "controlling for other predictors" piece is critical. A predictor might be strongly correlated with the response on its own, but once you account for other variables, it may no longer add useful information. That's exactly what these tests evaluate.
Conducting the Test
The test statistic is:

$$t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}$$

where $\hat{\beta}_j$ is the estimated coefficient and $SE(\hat{\beta}_j)$ is its standard error. Under the null hypothesis, this statistic follows a t-distribution.
To carry out the test:

1. Compute the t-statistic using the formula above (most software reports this automatically in the regression output table).
2. Compare the t-statistic to a t-distribution with $n - p - 1$ degrees of freedom.
3. Obtain the p-value. For a two-sided test, this is the probability of observing a t-statistic at least as extreme in either direction.
4. Compare the p-value to your significance level $\alpha$ (typically $\alpha = 0.05$).
If the p-value is less than $\alpha$, you reject $H_0$ and conclude that $x_j$ has a statistically significant linear relationship with the response, controlling for the other predictors. If the p-value is not less than $\alpha$, you don't have sufficient evidence to conclude that $x_j$ contributes to the model.
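On simulated data, the whole procedure can be sketched in a few lines of NumPy and SciPy. This is a minimal illustration of the formulas above, not a substitute for a regression package's output table; the data, seed, and function name are made up for the example:

```python
import numpy as np
from scipy import stats

def coefficient_t_tests(X, y):
    """Two-sided t-tests for each coefficient in an OLS fit.

    X: (n, p) matrix of predictors (an intercept column is added here).
    Returns (estimates, standard errors, t-statistics, p-values)."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])            # add intercept column
    beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta_hat
    df = n - p - 1                                    # residual degrees of freedom
    sigma2 = resid @ resid / df                       # estimated error variance
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)           # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    t_stats = beta_hat / se                           # t = beta_hat / SE(beta_hat)
    p_values = 2 * stats.t.sf(np.abs(t_stats), df)    # two-sided p-values
    return beta_hat, se, t_stats, p_values

# Simulated example: the first predictor truly matters, the second does not
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=50)
beta, se, t, p = coefficient_t_tests(X, y)
```

Here `p[1]` (the p-value for the first predictor's coefficient) comes out tiny, while `p[2]` reflects a predictor with no real effect.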

Degrees of Freedom for t-tests
The degrees of freedom for these t-tests are:

$$df = n - p - 1$$

where:
- $n$ = number of observations
- $p$ = number of predictor variables

The "$-1$" accounts for the intercept. So the total number of parameters estimated is $p + 1$ (one coefficient per predictor, plus the intercept), and the degrees of freedom reflect how many independent pieces of information remain after estimating all those parameters.
For example, if you have 50 observations and 4 predictors, $df = 50 - 4 - 1 = 45$. As you add more predictors to a model, degrees of freedom decrease, which makes the t-distribution slightly wider and your tests slightly less powerful. This is one reason you don't want to throw every available predictor into a model without justification.
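To see the effect numerically, scipy.stats can print the two-sided critical value as predictors are added to a model with $n = 50$ (the model sizes here are illustrative):

```python
from scipy import stats

n = 50
# Two-sided critical value at alpha = 0.05 is the 0.975 quantile of t_df
for p in (4, 20, 40):
    df = n - p - 1
    print(f"p = {p:2d}  df = {df:2d}  t* = {stats.t.ppf(0.975, df):.3f}")
```

With $df = 45$ the cutoff is about 2.01; with $df = 9$ it grows to about 2.26, so the same t-statistic has a higher bar to clear.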
Interpreting Coefficient Tests
Statistical Significance
A small p-value tells you the predictor has a statistically significant linear relationship with the response after controlling for the other predictors. But the test result alone doesn't tell you the whole story. You should also look at:
- The sign of $\hat{\beta}_j$: A positive coefficient means the response tends to increase as $x_j$ increases (holding other predictors constant). A negative coefficient means the response tends to decrease.
- The magnitude of the t-statistic: Larger absolute values indicate stronger evidence against the null. A t-statistic of 5.2 is much more convincing than one of 2.1, even if both are "significant."

Practical Significance
Statistical significance doesn't guarantee that a predictor matters in any real-world sense. The estimated coefficient $\hat{\beta}_j$ represents the expected change in the response for a one-unit increase in $x_j$, holding all other predictors constant.
Whether that change is meaningful depends on context. A coefficient of 0.002 might be statistically significant with a large sample but practically irrelevant if the response variable is measured in thousands of dollars. Always consider the scale of the variables and what a "one-unit increase" actually represents in the problem.
Confidence intervals complement the hypothesis test by giving a range of plausible values for $\beta_j$:

$$\hat{\beta}_j \pm t^* \cdot SE(\hat{\beta}_j)$$

where $t^*$ is the critical value from the t-distribution at your chosen confidence level. A narrow interval suggests a precise estimate; a wide interval suggests more uncertainty. If the interval contains zero, that's consistent with failing to reject $H_0$.
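As a sketch with made-up numbers ($\hat{\beta}_j = 1.8$, $SE(\hat{\beta}_j) = 0.4$, $df = 45$, matching the earlier sample-size example):

```python
from scipy import stats

def coef_confidence_interval(beta_hat_j, se_j, df, level=0.95):
    """Interval: beta_hat_j +/- t* * SE(beta_hat_j)."""
    t_star = stats.t.ppf(0.5 + level / 2, df)  # two-sided critical value
    return beta_hat_j - t_star * se_j, beta_hat_j + t_star * se_j

lo, hi = coef_confidence_interval(1.8, 0.4, df=45)
# The interval excludes zero, consistent with rejecting H0 at the 5% level
```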
Multicollinearity's Impact on Tests
Understanding Multicollinearity
Multicollinearity occurs when predictor variables are highly correlated with each other. This doesn't violate the assumptions of regression or bias the coefficient estimates, but it does inflate their standard errors. Here's why that matters:
- Inflated standard errors lead to smaller t-statistics (since $t = \hat{\beta}_j / SE(\hat{\beta}_j)$), which lead to larger p-values. A predictor that genuinely matters can appear non-significant simply because its effect is hard to separate from a correlated predictor.
- Wider confidence intervals make the estimates less precise. You know the predictors together explain variation in the response, but you can't pin down how much each one contributes individually.
- Unstable estimates: Small changes in the data (adding or removing a few observations) can cause large swings in the estimated coefficients when multicollinearity is severe.
Think of it this way: if two predictors carry nearly the same information, the model struggles to assign credit between them. The overall fit may be fine, but the individual coefficient tests become unreliable.
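A small simulation makes the inflation visible. The sketch below (simulated data, arbitrary seed) fits the same two-predictor model twice, once with independent predictors and once with a pair that nearly duplicate each other, and compares the coefficient standard errors:

```python
import numpy as np

def coef_se(X, y):
    """Standard errors of the OLS coefficient estimates (intercept added)."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)              # error variance estimate
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                     # unrelated to x1
x2_corr = 0.99 * x1 + 0.05 * rng.normal(size=n)   # nearly duplicates x1
noise = rng.normal(size=n)

se_indep = coef_se(np.column_stack([x1, x2_indep]), 1 + x1 + x2_indep + noise)
se_corr = coef_se(np.column_stack([x1, x2_corr]), 1 + x1 + x2_corr + noise)
# se_corr[1:] come out many times larger than se_indep[1:], even though
# the true coefficients and the noise level are the same in both fits
```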
Assessing and Addressing Multicollinearity
Variance Inflation Factors (VIFs) are the standard diagnostic tool. The VIF for predictor $x_j$ measures how much the variance of $\hat{\beta}_j$ is inflated due to correlation with the other predictors.
- A VIF of 1 means no multicollinearity for that predictor.
- VIFs above 5 suggest moderate multicollinearity.
- VIFs above 10 are generally considered problematic.
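The VIF for predictor $j$ is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on all the other predictors. A plain-NumPy sketch, with simulated data in which the third predictor nearly duplicates the first:

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j
    of X on the remaining columns (plus an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        Xd = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Xd, target, rcond=None)
        resid = target - Xd @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        out[j] = ss_tot / ss_res   # equals 1 / (1 - R^2_j)
    return out

rng = np.random.default_rng(2)
a = rng.normal(size=100)
b = rng.normal(size=100)
c = a + 0.1 * rng.normal(size=100)   # nearly collinear with a
vifs = vif(np.column_stack([a, b, c]))
# vifs[1] stays near 1 (b is independent); vifs[0] and vifs[2] blow up
```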
When multicollinearity is a concern, common strategies include:
- Removing or combining redundant predictors: If two predictors measure nearly the same thing, you may only need one, or you could create a single combined variable.
- Collecting more data: Larger samples reduce standard errors, which can partially offset the inflation caused by multicollinearity.
- Regularization techniques: Methods like ridge regression or lasso regression add a penalty term that shrinks coefficient estimates and can handle correlated predictors more gracefully. (These go beyond standard OLS but are worth knowing about.)
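As a sketch of the last idea, ridge regression has a closed form. The example below (simulated data, penalty value chosen arbitrarily) shrinks the coefficients on two nearly collinear predictors; it is a bare-bones illustration, not a substitute for a library implementation such as scikit-learn's:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge: minimizes ||y - b0 - X b||^2 + lam * ||b||^2.
    Centering X and y leaves the intercept b0 unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)   # nearly collinear pair
y = 1 + x1 + x2 + rng.normal(size=100)
X = np.column_stack([x1, x2])

_, b_ols = ridge_fit(X, y, lam=1e-8)    # essentially unpenalized OLS
_, b_ridge = ridge_fit(X, y, lam=5.0)   # penalty shrinks the estimates
```

The norm of the ridge estimate never increases as the penalty grows, which is what stabilizes the fit under multicollinearity, at the cost of some bias.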
The key takeaway: always check for multicollinearity before interpreting individual coefficient tests. A non-significant p-value in the presence of high VIFs doesn't necessarily mean the predictor is unimportant. It may just mean the model can't separate its effect from another correlated predictor.