🥖Linear Modeling Theory Unit 2 Review

2.3 Measures of Model Fit: R-squared and Adjusted R-squared

Written by the Fiveable Content Team • Last updated August 2025

Coefficient of Determination (R-squared)

Definition and Interpretation

R-squared (the coefficient of determination) measures the proportion of variance in the dependent variable that your model's independent variables can predict. It ranges from 0 to 1:

  • An R-squared of 0 means the model explains none of the variability in the response around its mean.
  • An R-squared of 1 means the model explains all of it.

So if you get an R-squared of 0.74, you'd say: "74% of the variation in the dependent variable is explained by the independent variable(s) in this model."

The formula:

R^2 = 1 - \frac{SS_{Res}}{SS_{Tot}}

where SS_{Res} is the residual sum of squares (variation the model doesn't explain) and SS_{Tot} is the total sum of squares (the total variation in the dependent variable).

Importance and Usage

R-squared gives you a single number summarizing how well your regression fits the observed data. It's useful for:

  • Evaluating the strength of the linear relationship between your dependent and independent variables
  • Comparing candidate models to see which one captures more variability in the response
  • Communicating model performance across fields like economics, engineering, and the social sciences

That said, R-squared alone doesn't tell you whether the model is correctly specified or whether individual predictors are statistically significant. It's a measure of fit, not of validity.

Calculating R-squared

Required Components

You need two quantities from your regression:

  • SS_{Res} (Residual Sum of Squares): The sum of squared differences between each observed value y_i and its predicted value \hat{y}_i. This captures the variation your model fails to explain.

SS_{Res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2

  • SS_{Tot} (Total Sum of Squares): The sum of squared differences between each observed value y_i and the overall mean \bar{y}. This captures the total variation in the dependent variable.

SS_{Tot} = \sum_{i=1}^{n}(y_i - \bar{y})^2

Calculation Steps

  1. Fit your linear regression model and obtain predicted values \hat{y}_i for every observation.

  2. Compute SS_{Res} by squaring each residual (y_i - \hat{y}_i) and summing them.

  3. Compute SS_{Tot} by squaring each deviation from the mean (y_i - \bar{y}) and summing them.

  4. Plug into the formula: R^2 = 1 - \frac{SS_{Res}}{SS_{Tot}}

Most statistical software computes this directly:

  • R: summary(lm_model)$r.squared
  • Python (scikit-learn): from sklearn.metrics import r2_score; r2_score(y_true, y_pred)
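The four steps above can also be sketched by hand in a few lines of NumPy. The data points here are made up purely for illustration, and the line is fit with np.polyfit rather than a full regression package:

```python
import numpy as np

# Hypothetical toy data: y is roughly linear in x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Step 1: fit a simple linear model and get predicted values
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

# Steps 2-3: residual and total sums of squares
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

# Step 4: plug into the formula
r2 = 1 - ss_res / ss_tot  # close to 1, since the data are nearly linear
```

The same value should come back from summary(lm_model)$r.squared in R or r2_score in scikit-learn, since they all implement this formula.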

R-squared Limitations vs. Adjusted R-squared

Limitations of R-squared

R-squared has a structural problem: it never decreases when you add another predictor to the model, even if that predictor is irrelevant. A variable with no real relationship to the response can still reduce SS_{Res} by a tiny amount just by chance, which nudges R-squared upward.

This creates two issues:

  • Overfitting risk. You can inflate R-squared by throwing in more and more variables, producing a model that fits the training data well but generalizes poorly.
  • Misleading model comparisons. A model with 15 predictors will almost always have a higher R-squared than a model with 3, regardless of whether those extra 12 variables are meaningful.

R-squared also doesn't tell you whether any individual predictor is statistically significant, or whether a linear model is even the right functional form for your data.

Adjusted R-squared as an Alternative

Adjusted R-squared fixes the "more variables = higher R-squared" problem by introducing a penalty for each additional predictor. If a new variable doesn't improve the fit enough to offset the penalty, adjusted R-squared will actually decrease.

This makes it the better metric when you're comparing models with different numbers of predictors, because it rewards explanatory power while also rewarding parsimony (using fewer variables to achieve a similar fit).

Adjusted R-squared: Formula and Interpretation

Formula

R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}

where n is the number of observations and k is the number of independent variables.

Notice the denominator n - k - 1. As k grows (more predictors), that denominator shrinks, which inflates the fraction being subtracted from 1. So unless R^2 increases enough to compensate, R^2_{adj} drops. That's the penalty at work.
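A short helper makes the penalty concrete. The sample values below are hypothetical, chosen so that a bundle of weak predictors buys a tiny R^2 gain that the penalty more than wipes out:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With n = 30 observations: adding 8 extra (weak) predictors nudges
# R^2 from 0.75 to 0.76, yet the adjusted value falls, because the
# shrinking denominator outweighs the tiny gain in fit.
lean = adjusted_r2(0.75, n=30, k=2)      # about 0.731
bloated = adjusted_r2(0.76, n=30, k=10)  # about 0.634
```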

A few properties worth noting:

  • R^2_{adj} is always less than or equal to R^2.
  • R^2_{adj} can actually go negative if the model fits worse than a simple horizontal line at \bar{y}.
  • When k = 0 (intercept-only model), the formula reduces to R^2_{adj} = R^2, and since an intercept-only model explains no variance (R^2 = 0), R^2_{adj} equals zero by construction.

Interpretation and Model Comparison

You interpret adjusted R-squared the same way as R-squared: it represents the proportion of variance explained, but adjusted for the number of predictors. A higher value still means a better fit.

The real advantage shows up in model selection. Suppose you're deciding between two models:

  • Model A: 3 predictors, R^2 = 0.72, R^2_{adj} = 0.71
  • Model B: 8 predictors, R^2 = 0.75, R^2_{adj} = 0.69

R-squared alone would favor Model B. But adjusted R-squared tells you those five extra predictors aren't pulling their weight. Model A is the more parsimonious choice here, and R^2_{adj} reflects that.
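You can reproduce this kind of reversal directly from the formula. The exact adjusted values depend on the sample size, which the example leaves unspecified, so the sketch below assumes a hypothetical n = 25 (the resulting numbers differ from the illustrative figures above, but the ordering flips the same way):

```python
# Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1), with a
# hypothetical sample size of n = 25 chosen for illustration.
n = 25
model_a = 1 - (1 - 0.72) * (n - 1) / (n - 3 - 1)  # k = 3 -> 0.68
model_b = 1 - (1 - 0.75) * (n - 1) / (n - 8 - 1)  # k = 8 -> 0.625
# Model B wins on raw R^2 (0.75 vs 0.72), but Model A wins on the
# adjusted metric: the five extra predictors cost more than they add.
```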

When comparing models with different numbers of predictors, always use adjusted R-squared rather than R-squared. It balances explanatory power against model complexity, helping you avoid overfitting.