🥖Linear Modeling Theory Unit 2 Review

2.3 Measures of Model Fit: R-squared and Adjusted R-squared

Written by the Fiveable Content Team • Last updated August 2025

Coefficient of Determination (R-squared)

Definition and Interpretation

R-squared (the coefficient of determination) measures the proportion of variance in the dependent variable that your model's independent variables can predict. It ranges from 0 to 1:

  • An R-squared of 0 means the model explains none of the variability in the response around its mean.
  • An R-squared of 1 means the model explains all of it.

So if you get an R-squared of 0.74, you'd say: "74% of the variation in the dependent variable is explained by the independent variable(s) in this model."

The formula:

R^2 = 1 - \frac{SS_{Res}}{SS_{Tot}}

where SS_{Res} is the residual sum of squares (variation the model doesn't explain) and SS_{Tot} is the total sum of squares (the total variation in the dependent variable).

Importance and Usage

R-squared gives you a single number summarizing how well your regression fits the observed data. It's useful for:

  • Evaluating the strength of the linear relationship between your dependent and independent variables
  • Comparing candidate models to see which one captures more variability in the response
  • Communicating model performance across fields like economics, engineering, and the social sciences

That said, R-squared alone doesn't tell you whether the model is correctly specified or whether individual predictors are statistically significant. It's a measure of fit, not of validity.

Calculating R-squared

Required Components

You need two quantities from your regression:

  • SS_{Res} (Residual Sum of Squares): The sum of squared differences between each observed value y_i and its predicted value \hat{y}_i. This captures the variation your model fails to explain.

SS_{Res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2

  • SS_{Tot} (Total Sum of Squares): The sum of squared differences between each observed value y_i and the overall mean \bar{y}. This captures the total variation in the dependent variable.

SS_{Tot} = \sum_{i=1}^{n}(y_i - \bar{y})^2

Calculation Steps

  1. Fit your linear regression model and obtain predicted values \hat{y}_i for every observation.

  2. Compute SS_{Res} by squaring each residual (y_i - \hat{y}_i) and summing them.

  3. Compute SS_{Tot} by squaring each deviation from the mean (y_i - \bar{y}) and summing them.

  4. Plug into the formula: R^2 = 1 - \frac{SS_{Res}}{SS_{Tot}}

Most statistical software computes this directly:

  • R: summary(lm_model)$r.squared
  • Python (scikit-learn): from sklearn.metrics import r2_score; r2_score(y_true, y_pred)
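The four steps above can also be sketched by hand in a few lines of NumPy. The data points here are made up purely for illustration, and the line is fit with np.polyfit rather than a full regression package:

```python
import numpy as np

# Hypothetical toy data: y is roughly linear in x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Step 1: fit a simple linear model and get predicted values
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

# Steps 2-3: residual and total sums of squares
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

# Step 4: plug into the formula
r2 = 1 - ss_res / ss_tot  # close to 1, since the data are nearly linear
```

The same value should come back from summary(lm_model)$r.squared in R or r2_score in scikit-learn, since they all implement this formula.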

R-squared Limitations vs. Adjusted R-squared

Limitations of R-squared

R-squared has a structural problem: it never decreases when you add another predictor to the model, even if that predictor is irrelevant. A variable with no real relationship to the response can still reduce SS_{Res} by a tiny amount just by chance, which nudges R-squared upward.

This creates two issues:

  • Overfitting risk. You can inflate R-squared by throwing in more and more variables, producing a model that fits the training data well but generalizes poorly.
  • Misleading model comparisons. A model with 15 predictors will almost always have a higher R-squared than a model with 3, regardless of whether those extra 12 variables are meaningful.

R-squared also doesn't tell you whether any individual predictor is statistically significant, or whether a linear model is even the right functional form for your data.

Adjusted R-squared as an Alternative

Adjusted R-squared fixes the "more variables = higher R-squared" problem by introducing a penalty for each additional predictor. If a new variable doesn't improve the fit enough to offset the penalty, adjusted R-squared will actually decrease.

This makes it the better metric when you're comparing models with different numbers of predictors, because it rewards explanatory power while also rewarding parsimony (using fewer variables to achieve a similar fit).

Adjusted R-squared: Formula and Interpretation

Formula

R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}

where n is the number of observations and k is the number of independent variables.

Notice the denominator n - k - 1. As k grows (more predictors), that denominator shrinks, which inflates the fraction being subtracted from 1. So unless R^2 increases enough to compensate, R^2_{adj} drops. That's the penalty at work.
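A short helper makes the penalty concrete. The sample values below are hypothetical, chosen so that a bundle of weak predictors buys a tiny R^2 gain that the penalty more than wipes out:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With n = 30 observations: adding 8 extra (weak) predictors nudges
# R^2 from 0.75 to 0.76, yet the adjusted value falls, because the
# shrinking denominator outweighs the tiny gain in fit.
lean = adjusted_r2(0.75, n=30, k=2)      # about 0.731
bloated = adjusted_r2(0.76, n=30, k=10)  # about 0.634
```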

A few properties worth noting:

  • R^2_{adj} is always less than or equal to R^2.
  • R^2_{adj} can actually go negative if the model fits worse than a simple horizontal line at \bar{y}.
  • When k = 0 (intercept-only model), the formula reduces to R^2_{adj} = R^2, and since an intercept-only model explains no variance (R^2 = 0), R^2_{adj} equals zero by construction.

Interpretation and Model Comparison

You interpret adjusted R-squared the same way as R-squared: it represents the proportion of variance explained, but adjusted for the number of predictors. A higher value still means a better fit.

The real advantage shows up in model selection. Suppose you're deciding between two models:

  • Model A: 3 predictors, R^2 = 0.72, R^2_{adj} = 0.71
  • Model B: 8 predictors, R^2 = 0.75, R^2_{adj} = 0.69

R-squared alone would favor Model B. But adjusted R-squared tells you those five extra predictors aren't pulling their weight. Model A is the more parsimonious choice here, and R^2_{adj} reflects that.
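You can reproduce this kind of reversal directly from the formula. The exact adjusted values depend on the sample size, which the example leaves unspecified, so the sketch below assumes a hypothetical n = 25 (the resulting numbers differ from the illustrative figures above, but the ordering flips the same way):

```python
# Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1), with a
# hypothetical sample size of n = 25 chosen for illustration.
n = 25
model_a = 1 - (1 - 0.72) * (n - 1) / (n - 3 - 1)  # k = 3 -> 0.68
model_b = 1 - (1 - 0.75) * (n - 1) / (n - 8 - 1)  # k = 8 -> 0.625
# Model B wins on raw R^2 (0.75 vs 0.72), but Model A wins on the
# adjusted metric: the five extra predictors cost more than they add.
```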

When comparing models with different numbers of predictors, always use adjusted R-squared rather than R-squared. It balances explanatory power against model complexity, helping you avoid overfitting.