Residual Analysis and Diagnostic Tests
Residual analysis is how you check whether your time series model is actually doing its job. After fitting a model, the residuals (the differences between what you observed and what the model predicted) should look like random noise. If they don't, your model is missing something. This section covers the main plots and statistical tests you'll use to diagnose problems and decide how to fix them.
Importance of Residual Analysis
A well-specified time series model should extract all the systematic patterns from the data, leaving behind only unpredictable noise. Residual analysis tests whether that's actually happened.
- Residuals are the differences between observed values and model-fitted values: $e_t = y_t - \hat{y}_t$, where $y_t$ is the observed value and $\hat{y}_t$ is the fitted value at time $t$
- If the model is good, residuals should behave like white noise: no patterns, no autocorrelation, roughly constant variance, and an approximately normal distribution
- Residual analysis also helps you spot outliers or influential observations that may be distorting your model's estimates
Think of it this way: if you can still find a pattern in the residuals, that pattern is information your model failed to capture, and you should go back and improve the model.
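As a minimal sketch of this idea (the `residuals` helper and the toy series are illustrative, not from any particular library):

```python
import numpy as np

def residuals(observed, fitted):
    """Residuals e_t = y_t - yhat_t: the part of the data the model missed."""
    return np.asarray(observed, dtype=float) - np.asarray(fitted, dtype=float)

# Toy illustration: a model that always under-predicts by 0.5 leaves
# residuals with a nonzero mean -- a systematic pattern, not white noise.
y = np.array([2.0, 4.0, 6.0, 8.0])
yhat = y - 0.5
e = residuals(y, yhat)
print(e, e.mean())   # every residual is 0.5, so the mean is 0.5, not 0
```

A constant offset like this is exactly the kind of leftover structure the diagnostics below are designed to detect.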

Interpretation of Residual Plots
Residual plots are your first line of defense. Before running any formal tests, visually inspect these graphs:
- Residuals vs. fitted values: checks for heteroscedasticity (non-constant variance) and non-linearity
  - Ideal: residuals scattered randomly around zero with no fan shape or curve
- Residuals vs. time: checks for trends, seasonal patterns, or shifts in variance over time
  - Ideal: residuals fluctuate randomly around zero with no systematic drift
- ACF and PACF of residuals: checks for leftover autocorrelation the model didn't capture
  - Ideal: nearly all spikes fall within the 95% confidence bands
When you spot a problem in these plots, it points toward a specific fix:
| Pattern You See | What It Suggests | Possible Fix |
|---|---|---|
| Fan or funnel shape in residuals vs. fitted | Non-constant variance (heteroscedasticity) | Variance-stabilizing transformation (e.g., log) or a GARCH-type model |
| Curved pattern in residuals vs. fitted | Non-linearity | Add non-linear terms or apply a transformation |
| Trend or drift in residuals vs. time | Non-stationarity not fully addressed | Additional differencing or trend terms |
| Significant spikes in ACF/PACF | Residual autocorrelation | Add AR or MA terms to the model |
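The ACF check in the table can be sketched numerically: compute the sample autocorrelations and compare them against the approximate 95% band of $\pm 1.96/\sqrt{n}$ (the `sample_acf` helper is illustrative; in practice you would use a library's ACF plot):

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations rho_hat_k of a series for lags k = 1..nlags."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, nlags + 1)])

# White-noise "residuals": almost every spike should fall inside the band.
rng = np.random.default_rng(0)
e = rng.normal(size=500)
acf = sample_acf(e, 10)
band = 1.96 / np.sqrt(len(e))       # approximate 95% confidence band
print(np.sum(np.abs(acf) > band))   # typically 0 or 1 spikes exceed it by chance
```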

Diagnostic Tests for Autocorrelation
Visual inspection of ACF/PACF plots is useful but subjective. Formal tests give you a more rigorous answer about whether residual autocorrelation is present.
Ljung-Box Test
This is the most commonly used test for residual autocorrelation in time series. It checks multiple lags simultaneously rather than testing one lag at a time.
- Null hypothesis ($H_0$): Residuals are independently distributed (no autocorrelation up to lag $h$)
- Alternative hypothesis ($H_1$): Residuals exhibit autocorrelation at one or more lags
The test statistic is:

$$Q = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n-k}$$

where $n$ is the sample size, $h$ is the number of lags tested, and $\hat{\rho}_k$ is the sample autocorrelation of the residuals at lag $k$.
How to use it:
- Choose the number of lags $h$ to test (a common rule of thumb is $h = 10$ for non-seasonal data, or $h = 2m$ where $m$ is the seasonal period)
- Compute the $Q$ statistic from the residual autocorrelations
- Compare $Q$ to the critical value from a chi-square distribution with $h - p - q$ degrees of freedom, where $p$ and $q$ are the number of AR and MA parameters in your model
- If $Q$ exceeds the critical value (or equivalently, the p-value is below your significance level), reject $H_0$ and conclude that significant autocorrelation remains
Note on degrees of freedom: Some textbooks use $h$ degrees of freedom directly, but when testing residuals from an ARMA(p,q) model, you should subtract the number of estimated parameters ($p + q$) to account for the fact that the residuals aren't truly independent observations.
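The steps above can be sketched as a direct transcription of the Ljung-Box formula (illustrative only; for real work use a tested implementation such as statsmodels' `acorr_ljungbox`):

```python
import numpy as np
from scipy import stats

def ljung_box(resid, h, p=0, q=0):
    """Ljung-Box Q statistic and p-value for residual autocorrelation.

    h is the number of lags tested; p and q are the AR and MA orders of
    the fitted model, subtracted from the degrees of freedom (df = h - p - q).
    """
    e = np.asarray(resid, dtype=float)
    n = len(e)
    ec = e - e.mean()
    denom = np.sum(ec ** 2)
    # Sample autocorrelations rho_hat_k for k = 1..h
    rho = np.array([np.sum(ec[k:] * ec[:-k]) / denom for k in range(1, h + 1)])
    Q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, h + 1)))
    return Q, stats.chi2.sf(Q, h - p - q)

# White-noise residuals: a small p-value here would signal leftover autocorrelation.
rng = np.random.default_rng(1)
Q, pval = ljung_box(rng.normal(size=300), h=10)
print(Q, pval)
```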
Other autocorrelation tests:
- Durbin-Watson test: tests specifically for first-order (lag-1) autocorrelation; limited because it only checks one lag
- Breusch-Godfrey test: more flexible than Durbin-Watson, can test for autocorrelation at multiple lags and works even when lagged dependent variables are present
If any of these tests detect significant autocorrelation, your model is misspecified. The typical remedy is adding AR or MA terms to capture the remaining dependence.
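The Durbin-Watson statistic is simple enough to compute by hand, which makes its lag-1 focus easy to see (a sketch, not a library routine):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: near 2 means no lag-1 autocorrelation;
    values toward 0 suggest positive, toward 4 negative, autocorrelation."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(2)
dw = durbin_watson(rng.normal(size=1000))
print(dw)   # close to 2 for white noise
```

Because it only looks at adjacent residuals, autocorrelation at lag 2 or beyond can slip past it entirely, which is why the Ljung-Box or Breusch-Godfrey tests are preferred for time series models.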
Assessment of Residual Normality
Many time series inference procedures (confidence intervals, prediction intervals, hypothesis tests on coefficients) assume the errors are normally distributed. Residual normality checks tell you whether those results are trustworthy.
Start with a visual check: a histogram of residuals and a Q-Q plot (quantile-quantile plot comparing residual quantiles to theoretical normal quantiles). If the Q-Q plot shows points roughly along a straight line, normality is reasonable.
Jarque-Bera Test
This test checks normality by looking at whether the residuals have the skewness and kurtosis you'd expect from a normal distribution (skewness = 0, kurtosis = 3).
- Null hypothesis ($H_0$): Residuals are normally distributed
- Alternative hypothesis ($H_1$): Residuals are not normally distributed
The test statistic is:

$$JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$$

where $n$ is the sample size, $S$ is the sample skewness, and $K$ is the sample kurtosis. Under $H_0$, this statistic follows a chi-square distribution with 2 degrees of freedom.
A large $JB$ value (small p-value) means the residuals deviate significantly from normality, either through asymmetry (skewness) or heavy/light tails (kurtosis).
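A minimal Jarque-Bera sketch using the moment-based skewness and kurtosis defined above (illustrative; `scipy.stats.jarque_bera` provides a tested implementation):

```python
import numpy as np
from scipy import stats

def jarque_bera(x):
    """JB = (n/6) * (S^2 + (K - 3)^2 / 4); chi-square(2) under normality."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std()   # standardize (ddof=0 matches the moment formulas)
    S = np.mean(z ** 3)            # sample skewness
    K = np.mean(z ** 4)            # sample kurtosis (3 for a normal distribution)
    JB = n / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)
    return JB, stats.chi2.sf(JB, 2)

rng = np.random.default_rng(3)
jb_n, p_n = jarque_bera(rng.normal(size=2000))            # normal residuals
jb_t, p_t = jarque_bera(rng.standard_t(df=3, size=2000))  # heavy-tailed residuals
print(p_n, p_t)   # heavy tails inflate JB, driving its p-value toward zero
```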
Other normality tests include the Shapiro-Wilk test (generally more powerful for small samples) and the Anderson-Darling test.
If normality is violated, you have a few options:
- Transform the data (e.g., log or Box-Cox transformation) before fitting the model, which often stabilizes variance and improves normality simultaneously
- Use robust estimation methods that are less sensitive to non-normal errors
- Use bootstrap or distribution-free methods for inference instead of relying on normal-theory confidence intervals
Keep in mind that mild departures from normality are usually not a serious problem, especially with larger samples, because many estimators are still consistent. Severe skewness or heavy tails are more concerning and worth addressing.