🥖Linear Modeling Theory Unit 9 Review


9.1 Residual Analysis in Multiple Regression


Written by the Fiveable Content Team • Last updated August 2025

Residual Plots for Model Assumptions

Residual analysis is how you check whether your multiple regression model actually meets the assumptions it depends on. If those assumptions are violated, your coefficient estimates, p-values, and confidence intervals can all become unreliable. Residual plots are the primary diagnostic tool for catching these problems.

Graphical Representation and Purpose

A residual is the difference between an observed value and the value your model predicted: $e_i = y_i - \hat{y}_i$. Residual plots graph these differences so you can visually inspect whether the model's assumptions hold.

The four main assumptions you're checking:

  • Linearity — the relationship between predictors and the response is linear
  • Homoscedasticity — the variance of residuals stays constant across all levels of the predicted values
  • Independence of errors — residuals aren't correlated with each other
  • Normality — residuals follow a normal distribution centered at zero

Creating and Interpreting Residual Plots

You can create residual plots by plotting residuals on the y-axis against several different quantities on the x-axis:

  • Predicted values ($\hat{y}$) — the most common choice; good for checking linearity and constant variance overall
  • Each independent variable — helps you spot problems tied to a specific predictor
  • Order of data collection — useful for detecting autocorrelation in time-ordered data

What you want to see is a random scatter of points centered around zero with no visible pattern. That's a healthy residual plot. What you don't want to see is any systematic shape: curves, fans, clusters, or trends. Those signal assumption violations, which the next sections cover in detail.
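The numbers behind such a plot are simple to compute. Here is a minimal sketch (using NumPy on simulated data; the dataset and variable names are illustrative) that fits a model by least squares and confirms the residuals behave like healthy ones:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for which the linear model is actually correct.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Fit by ordinary least squares and form the residuals e_i = y_i - yhat_i.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
residuals = y - y_hat

# A healthy residuals-vs-fitted plot is random scatter around zero:
# with an intercept in the model, OLS residuals average to zero and are
# uncorrelated with the fitted values (up to floating-point error).
print(residuals.mean())
print(np.corrcoef(y_hat, residuals)[0, 1])
```

Plotting `y_hat` against `residuals` (e.g., with matplotlib) gives the residuals-vs-fitted plot itself; the two printed quantities are the numeric counterpart of "centered at zero, no trend."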

Patterns in Residual Plots

[Image: example residuals vs. fitted plots, via Cross Validated]

Non-Random Patterns and Their Implications

When residuals form a recognizable pattern instead of random scatter, something in your model needs attention.

  • Curved pattern (U-shape, S-shape, or other nonlinear trend): The linearity assumption is violated. The relationship between your predictors and the response isn't purely linear. You may need to add a higher-order term (like $x^2$ for a quadratic relationship) or switch to a different functional form (logarithmic, exponential, etc.).
  • Funnel or cone shape (residuals spread out or narrow as predicted values increase): This is heteroscedasticity, meaning the variance of errors isn't constant. More on this below.
  • Cyclical or wave-like pattern: Often shows up when residuals are plotted against observation order. This suggests the errors aren't independent, which is common in time-series data.
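The curved-pattern case is easy to see in a small simulation (NumPy; the data and coefficients are invented for illustration): fitting a straight line to a quadratic relationship leaves a U-shaped residual trend, and adding the squared term removes it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.5 * x + x**2 + rng.normal(scale=0.3, size=n)  # true relation is quadratic

def ols_residuals(X, y):
    """Fit OLS and return residuals y - X @ beta_hat."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta_hat

# Misspecified linear fit: residuals trace a U-shape against x.
e_linear = ols_residuals(np.column_stack([np.ones(n), x]), y)
# Adding the x^2 term removes the pattern.
e_quad = ols_residuals(np.column_stack([np.ones(n), x, x**2]), y)

# The U-shape shows up as a substantial correlation between the
# residuals and x^2; once x^2 is in the model it is essentially zero.
print(abs(np.corrcoef(x**2, e_linear)[0, 1]))  # substantial
print(abs(np.corrcoef(x**2, e_quad)[0, 1]))    # essentially zero
```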

Outliers and Variable-Specific Patterns

Outliers are residuals that fall far from the rest of the data. A single extreme point can pull the regression line toward it, distorting your coefficient estimates. When you spot an outlier, investigate it: Is it a data entry error? A genuinely unusual observation? Depending on the answer, you might correct it, remove it, or flag it and report results with and without it.

Variable-specific patterns are also informative. If the residuals show a distinct trend when plotted against one particular predictor, that predictor's effect may not be captured well by the current model. Common fixes include:

  • Adding interaction terms (e.g., $x_1 \cdot x_2$) if the effect of one predictor depends on the level of another
  • Applying transformations to the predictor (log, square root) to linearize the relationship

For example, if residuals show a clear split when plotted against a categorical variable like treatment group, the model may need an interaction between that variable and another predictor to properly capture the effect.
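That categorical example can be sketched in code (NumPy, simulated data; the group labels and coefficients are made up): when the true slope on $x$ differs by group, a main-effects-only model leaves a group-specific trend in the residuals, and the interaction term removes it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
g = rng.integers(0, 2, size=n)  # treatment group indicator (0 or 1)
# The slope on x differs by group: the true model has an x*g interaction.
y = 1.0 + 1.0 * x + 0.5 * g + 2.0 * x * g + rng.normal(scale=0.5, size=n)

def ols_residuals(X, y):
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta_hat

# Main-effects-only model: residuals vs x show a trend within each group.
e_main = ols_residuals(np.column_stack([np.ones(n), x, g]), y)
# With the interaction term, the group-specific trends disappear.
e_int = ols_residuals(np.column_stack([np.ones(n), x, g, x * g]), y)

# Measure the leftover trend in group 1 as a correlation with x.
trend_main = np.corrcoef(x[g == 1], e_main[g == 1])[0, 1]
trend_int = np.corrcoef(x[g == 1], e_int[g == 1])[0, 1]
print(abs(trend_main), abs(trend_int))  # large vs. essentially zero
```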

Normality of Residuals

[Image: checking model assumptions using graphs, via Introduction to Statistics]

Assessing Normality Visually

The normality assumption states that residuals are normally distributed with a mean of zero. Two visual tools help you check this:

  • Histogram or density plot of residuals — You're looking for a roughly symmetric, bell-shaped distribution. Obvious skewness or heavy tails are red flags.
  • Normal probability plot (Q-Q plot) — This plots the quantiles of your residuals against the quantiles of a theoretical normal distribution. If residuals are normal, the points will fall close to a straight diagonal line. Systematic departures from that line (S-curves, bowing) indicate non-normality.

The Q-Q plot is generally more informative than the histogram, especially with smaller samples where histograms can look choppy.
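A Q-Q plot is easy to build by hand, which also clarifies what it measures. The sketch below (NumPy plus the standard library's `statistics.NormalDist`; the $(i - 0.5)/n$ plotting positions are one common convention among several) summarizes "how straight is the line" as the correlation between sample and theoretical quantiles:

```python
import numpy as np
from statistics import NormalDist

def qq_correlation(resid):
    """Correlation between sample quantiles and theoretical normal
    quantiles -- the closer to 1, the straighter the Q-Q plot."""
    n = len(resid)
    sample_q = np.sort(resid)
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions
    theory_q = np.array([NormalDist().inv_cdf(p) for p in probs])
    return np.corrcoef(theory_q, sample_q)[0, 1]

rng = np.random.default_rng(3)
r_normal = qq_correlation(rng.normal(size=500))       # close to 1
r_skewed = qq_correlation(rng.exponential(size=500))  # noticeably lower
print(round(r_normal, 3), round(r_skewed, 3))
```

Plotting `theory_q` against `sample_q` gives the Q-Q plot itself; the skewed (exponential) sample bows away from the diagonal, which is what drags its quantile correlation down.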

Formal Tests and Implications of Violations

Two common formal tests for normality:

  • Shapiro-Wilk test — generally preferred for small to moderate sample sizes; more powerful in most situations
  • Kolmogorov-Smirnov test — a more general test, but less powerful for detecting departures from normality

For both tests, a p-value above your significance level (typically 0.05) means you fail to reject the null hypothesis that residuals are normal.
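In practice both tests are one-liners with SciPy (simulated residuals here; the seed and sample sizes are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
normal_resid = rng.normal(size=100)       # plausibly normal residuals
skewed_resid = rng.exponential(size=100)  # clearly non-normal residuals

# Shapiro-Wilk: the null hypothesis is that the sample is normal.
_, p_sw_normal = stats.shapiro(normal_resid)
_, p_sw_skewed = stats.shapiro(skewed_resid)

# Kolmogorov-Smirnov against a standard normal after standardizing.
# (Estimating the mean/sd from the same data makes the classic KS
# p-value conservative; the Lilliefors correction addresses this.)
z = (skewed_resid - skewed_resid.mean()) / skewed_resid.std(ddof=1)
_, p_ks_skewed = stats.kstest(z, "norm")

# The skewed sample is decisively rejected; the normal one usually is not.
print(p_sw_normal, p_sw_skewed, p_ks_skewed)
```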

One important nuance: normality violations matter less as your sample size grows. The Central Limit Theorem ensures that the sampling distributions of your regression coefficients become approximately normal in large samples, even if the residuals themselves aren't perfectly normal. So with a large dataset, mild non-normality is usually not a serious concern.

If non-normality is severe, transforming the dependent variable (e.g., taking $\log(y)$ or $\sqrt{y}$) often helps pull the residuals closer to a normal distribution.

Homoscedasticity in Regression Models

Definition and Consequences of Heteroscedasticity

Homoscedasticity means the variance of the residuals is the same regardless of the predicted value or the level of any predictor. When this assumption fails, you have heteroscedasticity.

Heteroscedasticity doesn't bias your coefficient estimates themselves, but it does bias the standard errors of those estimates. That's a problem because standard errors feed directly into t-tests, p-values, and confidence intervals. With biased standard errors, you might conclude a predictor is significant when it isn't (or vice versa).

Detecting and Addressing Heteroscedasticity

Visual detection: Plot residuals against predicted values. A fan or cone shape where the spread of residuals widens (or narrows) as $\hat{y}$ increases is the classic sign.

Formal tests:

  • Breusch-Pagan test — regresses the squared residuals on the predictors and tests whether the predictors explain significant variation in the residual variance. A small p-value indicates heteroscedasticity.
  • White test — a more general version that also includes cross-products and squared terms of the predictors, so it can detect more complex forms of heteroscedasticity.
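The Breusch-Pagan recipe is short enough to write out directly. This sketch (NumPy/SciPy, simulated heteroscedastic data) uses the studentized form of the statistic, $LM = nR^2$ from the auxiliary regression, which is the version many software implementations report:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
# Heteroscedastic errors: the noise sd grows with x (a fan shape).
y = 2.0 + 1.0 * x + rng.normal(scale=0.5 * x, size=n)

def ols_fit(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

# Breusch-Pagan: regress squared residuals on the predictors and test
# whether they explain variation in the residual variance.
_, e = ols_fit(X, y)
u = e**2
_, e_aux = ols_fit(X, u)               # auxiliary regression of e^2 on X
r2_aux = 1 - e_aux.var() / u.var()     # R^2 of the auxiliary regression
lm = n * r2_aux                        # LM statistic, ~ chi2(k) under H0
p_value = stats.chi2.sf(lm, df=1)      # k = 1 non-intercept predictor

print(p_value < 0.05)  # small p-value: heteroscedasticity detected
```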

Remedies when heteroscedasticity is present:

  1. Weighted Least Squares (WLS) — gives less weight to observations with higher variance, producing more efficient estimates
  2. Robust (heteroscedasticity-consistent) standard errors — keeps OLS estimates but corrects the standard errors so that inference is valid; often the simplest fix
  3. Variable transformations — applying a $\log$ or square-root transformation to the dependent variable can stabilize the variance across the range of predicted values

The choice among these depends on the severity of the problem and your goals. Robust standard errors are a common default because they don't require you to specify the exact form of the heteroscedasticity.
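As an illustration of remedy 2, here is a sketch of HC0 robust standard errors next to the classical ones (NumPy, simulated data with variance growing in $x$; HC0 is the simplest of several "HC" variants, and real analyses often use the small-sample-adjusted HC1-HC3):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 1.0 * x + rng.normal(scale=0.5 * x, size=n)  # heteroscedastic noise

# Ordinary least squares fit and residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Classical OLS standard errors (assume constant error variance).
sigma2 = e @ e / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 robust standard errors: the "sandwich" estimator
# (X'X)^-1 X' diag(e_i^2) X (X'X)^-1 keeps the OLS coefficients but
# replaces the variance estimate so inference survives heteroscedasticity.
meat = X.T @ (X * e[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(se_classical.round(4), se_robust.round(4))
```

The coefficients themselves are the plain OLS estimates in both cases; only the standard errors (and hence the t-tests and confidence intervals built from them) change.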