Variance Inflation Factor for Multicollinearity
Calculating and Interpreting VIF
VIF measures how much the variance of a single regression coefficient gets inflated because that predictor is correlated with the other predictors in the model. A coefficient with high variance is unreliable: its estimate bounces around a lot across samples, and its standard error balloons, making hypothesis tests lose power.
The formula is:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$

where $R_j^2$ is the $R^2$ you get from regressing predictor $X_j$ on all the other predictors. Think about what this means mechanically:
- Regress $X_j$ on every other predictor in the model.
- Record the $R_j^2$ from that auxiliary regression. A high $R_j^2$ means the other predictors can almost perfectly reconstruct $X_j$.
- Plug $R_j^2$ into the formula. As $R_j^2 \to 1$, the denominator shrinks toward zero and $\mathrm{VIF}_j$ explodes.
- $\mathrm{VIF}_j = 1$: predictor $X_j$ is completely uncorrelated with all other predictors (no multicollinearity).
- $\mathrm{VIF}_j = 5$: the variance of $\hat\beta_j$ is 5 times larger than it would be if $X_j$ were orthogonal to the other predictors.
- $\mathrm{VIF}_j = 10$: variance inflated by a factor of 10.
The interpretation is direct: a VIF of 4 literally means the standard error of that coefficient is $\sqrt{4} = 2$ times what it would be without collinearity.
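The auxiliary-regression recipe above can be sketched in a few lines of NumPy. This is an illustrative sketch, not a library routine: the `vif` helper and the synthetic data are assumptions for the example.

```python
import numpy as np

def vif(X):
    """VIF for each column of X, via the auxiliary-regression definition."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
        out.append(1.0 / (1.0 - r2))                # VIF_j = 1 / (1 - R_j^2)
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)   # nearly a copy of x1
X = np.column_stack([x1, x2, x3])
print(vif(X))   # x1 and x3 get large VIFs; x2 stays near 1
```

The same numbers can be obtained from `statsmodels.stats.outliers_influence.variance_inflation_factor` if that library is available; the manual version just makes the auxiliary regression explicit.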
Rule-of-Thumb Thresholds
There is no single universal cutoff. The most common guidelines:
| VIF Range | Interpretation |
|---|---|
| ≈ 1 | No multicollinearity |
| 1–5 | Low to moderate; usually acceptable |
| 5–10 | Moderate; worth investigating |
| > 10 | Severe; likely distorting coefficient estimates |
Some researchers use a stricter threshold of VIF > 4 or even VIF > 2.5, especially in fields where precise coefficient interpretation matters (e.g., causal inference in epidemiology). Others tolerate higher values when the goal is purely prediction and individual coefficients are less important.
The right threshold depends on your context:
- Sample size. Larger samples can tolerate somewhat higher VIF because standard errors are already smaller.
- Purpose of the model. If you need to interpret individual coefficients, be stricter. If you only care about overall prediction, moderate VIF may be acceptable.
- Domain knowledge. Sometimes a theoretically important predictor should stay in the model even with elevated VIF, and you address the collinearity through other means (like ridge regression).
The goal is to balance detecting real problems against throwing out variables that belong in the model.
Condition Number for Multicollinearity

Computing and Interpreting the Condition Number
While VIF diagnoses collinearity one predictor at a time, the condition number gives a single summary of how collinear the entire design matrix is. It captures how sensitive the least-squares solution is to small perturbations in the data.
To compute it:
- Center and scale the columns of the design matrix (so each predictor has mean 0 and unit variance). This removes artificial ill-conditioning caused by different measurement scales.
- Compute the eigenvalues $\lambda_1, \ldots, \lambda_p$ of $X^\top X$ (using the centered and scaled matrix).
- The condition number is:

$$\kappa = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}$$
A large ratio means at least one direction in predictor space has very little independent variation, which is exactly what multicollinearity does: it collapses the predictor space along one or more dimensions.
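The computation above can be sketched directly in NumPy. The helper name and the synthetic data are illustrative assumptions for this example:

```python
import numpy as np

def condition_number(X):
    # center and scale each column: mean 0, unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    lam = np.linalg.eigvalsh(Z.T @ Z)   # eigenvalues in ascending order
    return np.sqrt(lam[-1] / lam[0])    # sqrt(lambda_max / lambda_min)

rng = np.random.default_rng(0)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + 0.02 * rng.normal(size=n)     # nearly a copy of x1
kappa_bad = condition_number(np.column_stack([x1, x2, x3]))
kappa_ok = condition_number(np.column_stack([x1, x2, rng.normal(size=n)]))
print(kappa_bad, kappa_ok)              # collinear design far exceeds 30
```

Equivalently, `np.linalg.cond(Z)` on the standardized matrix returns the same value, since the 2-norm condition number is the ratio of the largest to smallest singular value of $Z$.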
Guidelines for Condition Number Values
| Condition Number | Interpretation |
|---|---|
| < 10 | Weak or no multicollinearity |
| 10–30 | Moderate multicollinearity |
| > 30 | Severe multicollinearity; regression estimates may be unstable |
These thresholds come from Belsley, Kuh, and Welsch (1980) and are widely used, though they are approximate. A condition number of 35 in one dataset might cause fewer practical problems than a condition number of 25 in another, depending on the sample size and signal strength.
VIF vs. Condition Number: VIF tells you which predictors are involved in collinearity. The condition number tells you how bad the overall problem is. Use both together. A high condition number flags a problem; VIF values pinpoint where it lives.
Diagnosing Multicollinearity in Practice

A Step-by-Step Diagnostic Workflow
Use multiple tools together rather than relying on any single measure:
- Examine the correlation matrix of the predictors. Pairwise correlations near $\pm 1$ are an obvious first signal. But note that multicollinearity can exist among three or more variables even when no single pair is highly correlated.
- Calculate VIF for each predictor. Flag any variable with VIF above your chosen threshold. This catches multivariate collinearity that the correlation matrix might miss.
- Compute the condition number of the (centered and scaled) design matrix. This confirms whether the overall problem is moderate or severe.
- Run sensitivity checks. Drop or add a predictor and see if the remaining coefficients shift dramatically in magnitude or sign. Unstable coefficients are a hallmark of collinearity.
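The sensitivity check in the last step can be sketched as follows; the data and the `ols_slopes` helper are illustrative assumptions, chosen so that two predictors are near-duplicates:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # near-duplicate of x1
y = x1 + x2 + rng.normal(size=n)      # true combined effect of the pair is 2

def ols_slopes(X, y):
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]                    # drop the intercept

full = ols_slopes(np.column_stack([x1, x2]), y)   # both predictors
reduced = ols_slopes(x1[:, None], y)              # x2 dropped
print(full, reduced)
```

The individual slopes in `full` are unstable (how the effect splits between x1 and x2 depends on noise), yet their sum stays near 2; dropping x2 yields a single stable slope near 2. That kind of shift under a small model change is the hallmark the step describes.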
Why It Matters: Practical Consequences
Multicollinearity doesn't bias your coefficient estimates (OLS is still unbiased), but it inflates their variance. The practical fallout:
- Inflated standard errors lead to wider confidence intervals and reduced power. A truly important predictor might appear nonsignificant.
- Unstable estimates. Small changes in the data (adding or removing a few observations) can flip the sign or drastically change the magnitude of coefficients.
- Difficulty isolating individual effects. If two predictors move together, the model can't tell which one is driving the response.
Prediction of the response at points within the observed predictor space may still be fine, because the collinear predictors compensate for each other. The problem shows up when you try to interpret coefficients or extrapolate.
Choosing a Remedial Strategy
Once you've confirmed problematic multicollinearity, your main options are:
- Remove or combine redundant predictors. If two variables measure nearly the same thing, drop one or average them into a composite. This is the simplest fix when substantive knowledge supports it.
- Ridge regression. Adds an $\ell_2$ penalty that shrinks coefficients and stabilizes the estimates. This is the focus of the rest of this unit and is the standard approach when you want to keep all predictors.
- Principal component regression (PCR). Replaces the original predictors with a smaller set of orthogonal principal components, eliminating collinearity by construction. The tradeoff is reduced interpretability.
The right choice depends on whether interpretability or prediction is the priority, and on how much domain knowledge you have about which variables matter.
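As a preview of the ridge option, here is a minimal sketch of the closed-form ridge estimator on standardized predictors. The `ridge_slopes` helper, the penalty value, and the data are illustrative assumptions, not the unit's official implementation:

```python
import numpy as np

def ridge_slopes(X, y, lam):
    # closed form: beta = (Z'Z + lam * I)^(-1) Z'y, on standardized predictors
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ yc)

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # collinear pair
y = x1 + x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])
print(ridge_slopes(X, y, 0.0))        # lam = 0 reproduces OLS
print(ridge_slopes(X, y, 10.0))       # penalized fit: shrunken, more stable
```

Increasing `lam` trades a little bias for a large reduction in variance, which is exactly the remedy the collinearity diagnostics call for.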