Fiveable

Intro to Time Series Unit 7 Review

7.2 Residual analysis and diagnostic tests

Written by the Fiveable Content Team • Last updated August 2025
Residual Analysis and Diagnostic Tests

Residual analysis is how you check whether your time series model is actually doing its job. After fitting a model, the residuals (the differences between what you observed and what the model predicted) should look like random noise. If they don't, your model is missing something. This section covers the main plots and statistical tests you'll use to diagnose problems and decide how to fix them.

Importance of Residual Analysis

A well-specified time series model should extract all the systematic patterns from the data, leaving behind only unpredictable noise. Residual analysis tests whether that's actually happened.

  • Residuals are the differences between observed values and model-fitted values: e_t = y_t - \hat{y}_t
  • If the model is good, residuals should behave like white noise: no patterns, no autocorrelation, roughly constant variance, and approximately normal distribution
  • Residual analysis also helps you spot outliers or influential observations that may be distorting your model's estimates

Think of it this way: if you can still find a pattern in the residuals, that pattern is information your model failed to capture, and you should go back and improve the model.
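As a minimal sketch of the residual definition and a first white-noise sanity check (using synthetic data in place of a real model's observed and fitted values, which are assumptions here):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy example: observed series and stand-in model fitted values.
y = np.cumsum(rng.normal(size=200))          # observed values y_t
y_hat = y - rng.normal(scale=0.5, size=200)  # hypothetical model predictions

# Residuals: e_t = y_t - y_hat_t
e = y - y_hat

# Quick white-noise checks: mean near zero, lag-1 autocorrelation small
print("residual mean:", e.mean())
r1 = np.corrcoef(e[:-1], e[1:])[0, 1]
print("lag-1 autocorrelation:", r1)
```

If the mean is far from zero or the lag-1 autocorrelation is clearly nonzero, that is exactly the kind of leftover structure the formal tests below are designed to detect.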


Interpretation of Residual Plots

Residual plots are your first line of defense. Before running any formal tests, visually inspect these graphs:

  • Residuals vs. fitted values: checks for heteroscedasticity (non-constant variance) and non-linearity
    • Ideal: residuals scattered randomly around zero with no fan shape or curve
  • Residuals vs. time: checks for trends, seasonal patterns, or shifts in variance over time
    • Ideal: residuals fluctuate randomly around zero with no systematic drift
  • ACF and PACF of residuals: checks for leftover autocorrelation the model didn't capture
    • Ideal: nearly all spikes fall within the 95% confidence bands
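The three plot checks above can be sketched with matplotlib (the series here are synthetic stand-ins for real fitted values and residuals, and the file name is arbitrary):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
t = np.arange(200)
fitted = 0.05 * t + rng.normal(size=200)  # stand-in fitted values
resid = rng.normal(size=200)              # stand-in residuals

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Residuals vs. fitted: look for fan shapes or curvature
axes[0].scatter(fitted, resid, s=10)
axes[0].axhline(0, color="red", lw=1)
axes[0].set_title("Residuals vs. fitted")

# Residuals vs. time: look for drift or variance shifts
axes[1].plot(t, resid)
axes[1].axhline(0, color="red", lw=1)
axes[1].set_title("Residuals vs. time")

# Residual ACF with approximate 95% white-noise bands (+/- 1.96/sqrt(n))
n = len(resid)
r = resid - resid.mean()
acf = [np.sum(r[:-k] * r[k:]) / np.sum(r**2) for k in range(1, 21)]
axes[2].stem(range(1, 21), acf)
axes[2].axhline(1.96 / np.sqrt(n), color="gray", ls="--")
axes[2].axhline(-1.96 / np.sqrt(n), color="gray", ls="--")
axes[2].set_title("Residual ACF")

fig.tight_layout()
fig.savefig("residual_diagnostics.png")
```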

When you spot a problem in these plots, it points toward a specific fix:

| Pattern You See | What It Suggests | Possible Fix |
| --- | --- | --- |
| Fan or funnel shape in residuals vs. fitted | Non-constant variance (heteroscedasticity) | Variance-stabilizing transformation (e.g., log) or a GARCH-type model |
| Curved pattern in residuals vs. fitted | Non-linearity | Add non-linear terms or apply a transformation |
| Trend or drift in residuals vs. time | Non-stationarity not fully addressed | Additional differencing or trend terms |
| Significant spikes in ACF/PACF | Residual autocorrelation | Add AR or MA terms to the model |

Diagnostic Tests for Autocorrelation

Visual inspection of ACF/PACF plots is useful but subjective. Formal tests give you a more rigorous answer about whether residual autocorrelation is present.

Ljung-Box Test

This is the most commonly used test for residual autocorrelation in time series. It checks multiple lags simultaneously rather than testing one lag at a time.

  • Null hypothesis (H_0): Residuals are independently distributed (no autocorrelation up to lag h)
  • Alternative hypothesis (H_a): Residuals exhibit autocorrelation at one or more lags

The test statistic is:

Q = n(n+2) \sum_{k=1}^{h} \frac{r_k^2}{n-k}

where n is the sample size, h is the number of lags tested, and r_k is the sample autocorrelation of residuals at lag k.

How to use it:

  1. Choose the number of lags h to test (a common rule of thumb is h = 10 for non-seasonal data, or h = 2m where m is the seasonal period)

  2. Compute the Q statistic from the residual autocorrelations

  3. Compare Q to the critical value from a chi-square distribution with h - p - q degrees of freedom, where p and q are the number of AR and MA parameters in your model

  4. If Q exceeds the critical value (or equivalently, the p-value is below your significance level), reject H_0 and conclude that significant autocorrelation remains

Note on degrees of freedom: Some textbooks use h degrees of freedom directly, but when testing residuals from an ARMA(p,q) model, you should subtract the number of estimated parameters (h - p - q) to account for the fact that the residuals aren't truly independent observations.

Other autocorrelation tests:

  • Durbin-Watson test: tests specifically for first-order (lag-1) autocorrelation; limited because it only checks one lag
  • Breusch-Godfrey test: more flexible than Durbin-Watson, can test for autocorrelation at multiple lags and works even when lagged dependent variables are present

If any of these tests detect significant autocorrelation, your model is misspecified. The typical remedy is adding AR or MA terms to capture the remaining dependence.

Assessment of Residual Normality

Many time series inference procedures (confidence intervals, prediction intervals, hypothesis tests on coefficients) assume the errors are normally distributed. Residual normality checks tell you whether those results are trustworthy.

Start with a visual check: a histogram of residuals and a Q-Q plot (quantile-quantile plot comparing residual quantiles to theoretical normal quantiles). If the Q-Q plot shows points roughly along a straight line, normality is reasonable.

Jarque-Bera Test

This test checks normality by looking at whether the residuals have the skewness and kurtosis you'd expect from a normal distribution (skewness = 0, kurtosis = 3).

  • Null hypothesis (H_0): Residuals are normally distributed
  • Alternative hypothesis (H_a): Residuals are not normally distributed

The test statistic is:

JB = \frac{n}{6} \left( S^2 + \frac{(K-3)^2}{4} \right)

where n is the sample size, S is the sample skewness, and K is the sample kurtosis. Under H_0, this follows a chi-square distribution with 2 degrees of freedom.

A large JB value (small p-value) means the residuals deviate significantly from normality, either through asymmetry (skewness) or heavy/light tails (kurtosis).
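A quick sketch with scipy, contrasting normal and clearly non-normal (exponential) synthetic "residuals" and checking the statistic against the formula by hand:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
normal_resid = rng.normal(size=500)
skewed_resid = rng.exponential(size=500)  # clearly non-normal

# scipy's jarque_bera returns the JB statistic and its p-value
jb_n, p_n = stats.jarque_bera(normal_resid)
jb_s, p_s = stats.jarque_bera(skewed_resid)
print(f"normal residuals: JB={jb_n:.2f}, p={p_n:.3f}")
print(f"skewed residuals: JB={jb_s:.2f}, p={p_s:.3g}")

# Manual check: JB = n/6 * (S^2 + (K-3)^2 / 4)
n = len(normal_resid)
S = stats.skew(normal_resid)
K = stats.kurtosis(normal_resid, fisher=False)  # Pearson kurtosis (normal = 3)
jb_manual = n / 6 * (S**2 + (K - 3) ** 2 / 4)
print("manual JB:", jb_manual)
```

Note the `fisher=False` flag: scipy's default `kurtosis` reports excess kurtosis (normal = 0), while the JB formula uses Pearson kurtosis (normal = 3).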

Other normality tests include the Shapiro-Wilk test (generally more powerful for small samples) and the Anderson-Darling test.

If normality is violated, you have a few options:

  1. Transform the data (e.g., log or Box-Cox transformation) before fitting the model, which often stabilizes variance and improves normality simultaneously
  2. Use robust estimation methods that are less sensitive to non-normal errors
  3. Use bootstrap or distribution-free methods for inference instead of relying on normal-theory confidence intervals

Keep in mind that mild departures from normality are usually not a serious problem, especially with larger samples, because many estimators are still consistent. Severe skewness or heavy tails are more concerning and worth addressing.