Intro to Time Series

Stationarity Tests


Why This Matters

Stationarity is the foundation of time series modeling: if you don't get this right, much of what follows falls apart. Many classical models (AR, MA, and ARMA directly, and ARIMA after differencing) assume the series being modeled has stable statistical properties over time, meaning a constant mean, a constant variance, and autocorrelations that depend only on the lag. When you're being tested on stationarity, you're really being tested on your ability to diagnose a time series before modeling it and to justify your preprocessing decisions.

These tests aren't just checkbox exercises: they reveal the underlying structure of your data. You'll need to understand the difference between testing for a unit root and testing for stationarity (they're not the same thing!), recognize when visual diagnostics complement formal tests, and know which test to reach for when your data has quirks like structural breaks or seasonality. Don't just memorize test names: know what null hypothesis each test uses and when to combine tests for a complete picture.


Unit Root Tests: Detecting Non-Stationarity

These tests ask: "Does this series have a unit root that makes it non-stationary?" The null hypothesis is that a unit root is present, so rejecting the null is your evidence of stationarity (or trend stationarity, if the test includes a trend term). A unit root means shocks to the series persist forever rather than dying out.

Augmented Dickey-Fuller (ADF) Test

  • Null hypothesis: unit root exists (non-stationary)—you reject the null when the test statistic is more negative than the critical value
  • Lagged difference terms are included to absorb autocorrelation in the residuals; the lag length matters for validity, since too few lags distort the test's size and too many reduce its power
  • Most commonly used unit root test in econometrics and finance—expect this to appear on any exam covering stationarity
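To make the ADF mechanics concrete, here is a minimal NumPy sketch (not a library API) of the constant-only ADF regression, Δy_t = α + γ·y_{t-1} + Σ β_i·Δy_{t-i} + e_t, returning the t-statistic on γ. The fixed lag length of 1 and the simulated series are illustrative assumptions; in practice `statsmodels.tsa.stattools.adfuller` selects the lag and reports MacKinnon p-values. The statistic is compared with the asymptotic 5% Dickey-Fuller critical value for the constant-only case, about -2.86.

```python
import numpy as np

def adf_tstat(y, lags=1):
    """t-statistic on gamma in the constant-only ADF regression
    dy_t = alpha + gamma * y_{t-1} + sum_{i=1..lags} beta_i * dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                          # dy[k] = y[k+1] - y[k]
    n = len(dy) - lags                       # observations left after lagging
    target = dy[lags:]                       # dependent variable dy_t
    cols = [np.ones(n), y[lags:-1]]          # constant and y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:-i])         # lagged difference dy_{t-i}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_gamma

# Simulated example: a random walk versus a stationary AR(1) with phi = 0.5.
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
random_walk = np.cumsum(e)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]

CRIT_5PCT = -2.86  # asymptotic 5% Dickey-Fuller critical value, constant only
for name, series in [("random walk", random_walk), ("AR(1)", ar1)]:
    stat = adf_tstat(series, lags=1)
    print(f"{name}: ADF t = {stat:.2f}, reject unit root: {stat < CRIT_5PCT}")
```

Rejection requires the statistic to be more negative than the critical value, matching the decision rule in the first bullet.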

Phillips-Perron (PP) Test

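  • Non-parametric correction for serial correlation and heteroskedasticity: no lagged difference terms are added to the regression, though you still choose a bandwidth (truncation lag) for the long-run variance estimate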
  • Same null hypothesis as ADF (unit root present), but uses a different approach to handle autocorrelation in errors
  • More robust than ADF when error terms have changing variance or complex dependence structures
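The non-parametric correction can be sketched in NumPy as follows. This is a hedged illustration of the standard textbook form of the Phillips-Perron Z_tau statistic for the constant-only case, not a reference implementation. It runs the plain Dickey-Fuller regression of y_t on a constant and y_{t-1}, then rescales the t-statistic using a Bartlett-kernel (Newey-West) long-run variance of the residuals instead of adding lagged differences. The fixed bandwidth of 4 is an assumption for illustration; the `arch` package offers a full version as `arch.unitroot.PhillipsPerron`.

```python
import numpy as np

def pp_ztau(y, bandwidth=4):
    """Phillips-Perron Z_tau, constant-only case: regress y_t on (1, y_{t-1}),
    then correct the t-statistic for rho = 1 using a Bartlett-kernel
    (Newey-West) long-run variance of the residuals."""
    y = np.asarray(y, dtype=float)
    target, lagged = y[1:], y[:-1]
    T = len(target)
    X = np.column_stack([np.ones(T), lagged])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    u = target - X @ beta
    s2 = u @ u / (T - 2)                              # OLS residual variance
    se_rho = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    t_rho = (beta[1] - 1.0) / se_rho                  # uncorrected DF t-stat
    gamma0 = u @ u / T                                # short-run variance
    lam2 = gamma0                                     # long-run variance
    for j in range(1, bandwidth + 1):
        weight = 1.0 - j / (bandwidth + 1)            # Bartlett weight
        lam2 += 2.0 * weight * (u[j:] @ u[:-j]) / T
    lam = np.sqrt(lam2)
    return (np.sqrt(gamma0 / lam2) * t_rho
            - 0.5 * (lam2 - gamma0) / lam * (T * se_rho / np.sqrt(s2)))

rng = np.random.default_rng(1)
e = rng.standard_normal(500)
random_walk = np.cumsum(e)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]
print("random walk Z_tau:", round(pp_ztau(random_walk), 2))
print("AR(1) Z_tau:", round(pp_ztau(ar1), 2))
```

Z_tau has the same asymptotic distribution as the ADF t-statistic, so the same critical values (about -2.86 at 5%, constant only) apply. When the residuals are close to white noise, the long-run and short-run variances coincide and the correction vanishes.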

Zivot-Andrews Test

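  • Accounts for structural breaks: standard ADF/PP tests lose power when the data has a sudden shift in mean or trend, and can mistake the break for a unit root
  • Null hypothesis: unit root with no break; alternative: stationarity around a level or trend with a single break, whose date is determined endogenously from the data rather than specified in advance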
  • Critical for economic data experiencing policy changes, crises, or regime shifts that would fool standard unit root tests
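The break-date search can be sketched as a loop: for every candidate break date, run an ADF-style regression that includes a level-shift dummy, and keep the most negative t-statistic on y_{t-1}. This minimal NumPy version makes several simplifying assumptions: it covers only the level-shift variant (often called Model A) with a constant and linear trend, uses a fixed lag length of 1, and trims 15% of the sample at each end. statsmodels provides a fuller implementation as `statsmodels.tsa.stattools.zivot_andrews`. The -4.80 threshold below is the 5% critical value usually quoted for this variant.

```python
import numpy as np

def za_level_break(y, lags=1, trim=0.15):
    """Minimum t-statistic on y_{t-1} over candidate break dates in
    dy_t = mu + theta*DU_t + beta*t + gamma*y_{t-1} + sum_j c_j*dy_{t-j} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n = len(dy) - lags
    target = dy[lags:]
    t_index = np.arange(lags + 1, len(y))             # time index of each dy_t
    base = [np.ones(n), t_index.astype(float), y[lags:-1]]
    base += [dy[lags - i:-i] for i in range(1, lags + 1)]
    best_stat, best_break = np.inf, None
    for tb in range(int(trim * len(y)), int((1 - trim) * len(y))):
        du = (t_index > tb).astype(float)             # level-shift dummy DU_t
        X = np.column_stack(base + [du])
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
        stat = beta[2] / se                           # t-stat on y_{t-1}
        if stat < best_stat:
            best_stat, best_break = stat, tb
    return best_stat, best_break

# Simulated example: stationary AR(1) noise with a level shift of 5 after t = 150.
rng = np.random.default_rng(2)
e = rng.standard_normal(300)
noise = np.zeros(300)
for t in range(1, 300):
    noise[t] = 0.5 * noise[t - 1] + e[t]
shifted = noise + 5.0 * (np.arange(300) > 150)
stat, break_at = za_level_break(shifted, lags=1)
print(f"ZA stat = {stat:.2f}, estimated break after t = {break_at}")
```

Because the break date is chosen to minimize the statistic, the critical values are more negative than the ordinary ADF ones.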

Compare: ADF vs. PP—both test the same null hypothesis (unit root), but PP handles autocorrelation non-parametrically while ADF adds lagged terms. Use PP when you suspect heteroskedasticity; use ADF when you want more control over lag specification.


Stationarity Tests: The Reverse Approach

Unlike unit root tests, these tests flip the null hypothesis—they assume stationarity and look for evidence against it. This reversal is critical for exam questions asking you to distinguish between test types.

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

  • Null hypothesis: series IS stationary—the opposite of ADF, making it a perfect complement for confirmatory testing
  • Tests stationarity around a deterministic trend or level, depending on the specification you choose
  • Use alongside ADF to triangulate: if ADF rejects and KPSS fails to reject, you have strong evidence of stationarity
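The KPSS statistic itself is short enough to compute by hand. This is a minimal NumPy sketch of the level (constant-only) version: demean the series, take partial sums S_t, and divide their scaled sum of squares by a Bartlett-kernel long-run variance. The fixed bandwidth of 4 is an assumption for illustration; `statsmodels.tsa.stattools.kpss` chooses it automatically and reports an interpolated p-value. Large values reject the null of stationarity, and 0.463 is the 5% critical value tabulated for the level version.

```python
import numpy as np

def kpss_level(y, bandwidth=4):
    """KPSS level-stationarity statistic: partial sums of the demeaned series,
    scaled by T^2 and a Bartlett-kernel long-run variance estimate."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    e = y - y.mean()                  # residuals from regressing y on a constant
    S = np.cumsum(e)                  # partial sums S_t
    lam2 = e @ e / T
    for j in range(1, bandwidth + 1):
        lam2 += 2.0 * (1.0 - j / (bandwidth + 1)) * (e[j:] @ e[:-j]) / T
    return (S @ S) / (T**2 * lam2)

CRIT_5PCT = 0.463   # 5% critical value, level (constant-only) version
rng = np.random.default_rng(7)
noise = rng.standard_normal(500)
random_walk = np.cumsum(noise)
for name, series in [("white noise", noise), ("random walk", random_walk)]:
    stat = kpss_level(series)
    print(f"{name}: KPSS = {stat:.3f}, reject stationarity: {stat > CRIT_5PCT}")
```

Note the reversed decision rule compared with ADF: here a large statistic is evidence of non-stationarity.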

Compare: ADF vs. KPSS—they test opposite null hypotheses. When both agree (ADF rejects unit root, KPSS fails to reject stationarity), you're confident the series is stationary. When they conflict, you may have a near-unit-root process or need more data.
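The four possible outcomes of running both tests can be written out as a small decision helper. This is a hypothetical function for study purposes, assuming a common significance level for both tests; the labels for the conflicting cases are deliberately non-committal, in line with the comparison above.

```python
def combine_adf_kpss(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Cross-check a unit-root test (ADF) against a stationarity test (KPSS)."""
    adf_rejects = adf_pvalue < alpha      # evidence against a unit root
    kpss_rejects = kpss_pvalue < alpha    # evidence against stationarity
    if adf_rejects and not kpss_rejects:
        return "stationary: both tests point the same way"
    if kpss_rejects and not adf_rejects:
        return "unit root: both tests point the same way"
    if adf_rejects and kpss_rejects:
        return "conflict: each test rejects its own null"
    return "conflict: neither test rejects (data may be uninformative)"

print(combine_adf_kpss(0.62, 0.01))
print(combine_adf_kpss(0.20, 0.40))
```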


Diagnostic Tools: Visual and Residual Analysis

Before running formal tests, visual diagnostics help you understand your data's structure. After fitting models, these tools verify your assumptions hold. ACF and PACF plots are your first line of defense in time series analysis.

Autocorrelation Function (ACF) Plot

  • Shows correlation at each lag—values decaying slowly toward zero suggest non-stationarity or strong persistence
  • Seasonal patterns appear as spikes at regular intervals (e.g., lag 12 for monthly data with annual seasonality)
  • Confidence bands help identify statistically significant correlations; spikes outside bands indicate structure to model
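Both ACF signatures described above can be reproduced numerically. This minimal NumPy sketch computes sample autocorrelations and the approximate ±1.96/√n white-noise band on two simulated series: a monthly series with a hypothetical December bump, which should spike at lags 12, 24, and 36, and a random walk, whose ACF should decay slowly. For plots, `statsmodels.graphics.tsaplots.plot_acf` draws the same quantities with bands.

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelations r_0..r_nlags, each lag-k autocovariance divided
    by the full-sample variance (the usual convention)."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    denom = d @ d
    return np.array([1.0] + [(d[k:] @ d[:-k]) / denom for k in range(1, nlags + 1)])

def white_noise_band(n):
    """Approximate 95% band +/- 1.96/sqrt(n) for the ACF of white noise."""
    return 1.96 / np.sqrt(n)

rng = np.random.default_rng(3)
months = np.arange(480)                                   # 40 years, monthly
seasonal = 3.0 * (months % 12 == 11) + rng.standard_normal(480)  # December bump
random_walk = np.cumsum(rng.standard_normal(500))

band = white_noise_band(len(seasonal))
r_seasonal = sample_acf(seasonal, 36)
r_walk = sample_acf(random_walk, 20)
print("seasonal series, lags outside the band:",
      np.flatnonzero(np.abs(r_seasonal[1:]) > band) + 1)
print("random walk ACF at lags 1, 10, 20:", r_walk[[1, 10, 20]].round(2))
```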

Partial Autocorrelation Function (PACF) Plot

  • Controls for intermediate lags: shows the direct relationship between $y_t$ and $y_{t-k}$ after removing the effects of lags 1 through $k-1$
  • Sharp cutoff after lag p suggests an AR(p) model; this is your primary tool for identifying autoregressive order
  • Complements ACF for model identification: ACF helps identify MA order, PACF helps identify AR order
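The AR(p) cutoff is easy to see on simulated data. The sketch below computes the sample PACF with the Durbin-Levinson recursion applied to the sample autocorrelations, which is one standard way to get it (statsmodels' `pacf` offers several estimation methods). The simulated AR(2) with coefficients 0.5 and 0.3 is an illustrative assumption; its PACF should be clearly nonzero at lags 1 and 2 and near zero afterwards.

```python
import numpy as np

def sample_pacf(y, nlags):
    """Sample PACF via the Durbin-Levinson recursion on the sample ACF."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    r = np.array([d[k:] @ d[:len(d) - k] for k in range(nlags + 1)]) / (d @ d)
    pacf = [1.0]
    phi = np.array([])                # AR(k-1) coefficients from the previous step
    for k in range(1, nlags + 1):
        # phi_kk = (r_k - sum_j phi_{k-1,j} r_{k-j}) / (1 - sum_j phi_{k-1,j} r_j)
        phi_kk = (r[k] - phi @ r[k - 1:0:-1]) / (1.0 - phi @ r[1:k])
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        pacf.append(phi_kk)
    return np.array(pacf)

rng = np.random.default_rng(4)
e = rng.standard_normal(2000)
ar2 = np.zeros(2000)
for t in range(2, 2000):
    ar2[t] = 0.5 * ar2[t - 1] + 0.3 * ar2[t - 2] + e[t]
p = sample_pacf(ar2, 6)
print("PACF lags 1-6:", p[1:].round(2))
print("approx. 95% band:", round(1.96 / np.sqrt(len(ar2)), 3))
```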

Ljung-Box Test

  • Tests for autocorrelation in residuals: the null hypothesis is that the autocorrelations at lags 1 through h (the specified maximum) are jointly zero
  • The Q-statistic, $Q = n(n+2)\sum_{k=1}^{h} \frac{\hat\rho_k^2}{n-k}$, is approximately chi-squared with h degrees of freedom, reduced by the number of estimated ARMA parameters (p + q) when applied to model residuals
  • Apply to model residuals to check if your fitted model adequately captured the time series structure
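Here is a minimal sketch of the Q-statistic and its degrees-of-freedom adjustment, using NumPy for the autocorrelations and SciPy's `chi2.sf` for the upper-tail p-value. The `fitted_params` argument is a hypothetical name for the number of estimated ARMA parameters; statsmodels' `acorr_ljungbox` is the library route. The example fits an AR(1) by least squares to simulated AR(1) data and checks whether its residuals still carry autocorrelation.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(resid, h, fitted_params=0):
    """Ljung-Box Q over lags 1..h and its chi-squared p-value with
    h - fitted_params degrees of freedom."""
    d = np.asarray(resid, dtype=float)
    d = d - d.mean()
    n = len(d)
    denom = d @ d
    r = np.array([(d[k:] @ d[:-k]) / denom for k in range(1, h + 1)])
    q = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, h + 1)))
    return q, chi2.sf(q, h - fitted_params)

# Fit an AR(1) by least squares to simulated AR(1) data, then check residuals.
rng = np.random.default_rng(8)
e = rng.standard_normal(500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.6 * y[t - 1] + e[t]
phi_hat = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])   # OLS slope, no intercept
resid = y[1:] - phi_hat * y[:-1]

q, p = ljung_box(resid, h=10, fitted_params=1)   # one estimated AR parameter
print(f"residuals: Q = {q:.2f}, p-value = {p:.3f}")
print(f"raw series: p-value = {ljung_box(y, h=10)[1]:.2e}")
```

A small p-value on the residuals would mean the fitted model left structure behind; on the raw AR(1) series the test should reject decisively.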

Compare: ACF vs. PACF—ACF shows total correlation at each lag (including indirect effects), while PACF isolates direct effects. For AR model identification, watch where PACF cuts off; for MA identification, watch where ACF cuts off.


Specialized Tests: Random Walks and Seasonality

Some time series have specific structures that require targeted testing approaches. These tests address situations where standard unit root tests may be insufficient or inappropriate.

Variance Ratio Test

  • Tests random walk hypothesis—if a series is a random walk, variance should scale linearly with time horizon
  • Compares variance at different intervals: $\text{VR}(k) = \frac{\text{Var}(y_t - y_{t-k})}{k \cdot \text{Var}(y_t - y_{t-1})}$, which should equal 1 under a random walk
  • Popular in finance for testing market efficiency—deviations from 1 suggest predictability in returns
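The ratio itself is a one-liner once the k-period differences are formed. This NumPy sketch computes VR(k) from overlapping differences but not the Lo-MacKinlay standard errors needed for a formal test (the `arch` package provides `arch.unitroot.VarianceRatio` for that). The two simulated price series are illustrative assumptions: a random walk, where VR(4) should be near 1, and a mean-reverting AR(1) in levels, where it should fall well below 1.

```python
import numpy as np

def variance_ratio(y, k):
    """VR(k) = Var(y_t - y_{t-k}) / (k * Var(y_t - y_{t-1})), using
    overlapping k-period differences."""
    y = np.asarray(y, dtype=float)
    one_period = y[1:] - y[:-1]
    k_period = y[k:] - y[:-k]
    return k_period.var(ddof=1) / (k * one_period.var(ddof=1))

rng = np.random.default_rng(5)
e = rng.standard_normal(5000)
random_walk = np.cumsum(e)
mean_reverting = np.zeros(5000)
for t in range(1, 5000):
    mean_reverting[t] = 0.5 * mean_reverting[t - 1] + e[t]

print("VR(4), random walk:", round(variance_ratio(random_walk, 4), 3))
print("VR(4), mean-reverting:", round(variance_ratio(mean_reverting, 4), 3))
```

A ratio below 1 indicates negatively autocorrelated changes (mean reversion); a ratio above 1 indicates positively autocorrelated changes (momentum).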

Seasonal Unit Root Tests

  • Detect unit roots at seasonal frequencies—standard ADF only tests the zero frequency (long-run) unit root
  • HEGY test examines unit roots at multiple seasonal frequencies simultaneously; Canova-Hansen tests seasonal stationarity
  • Determines need for seasonal differencing versus regular differencing—critical for monthly, quarterly, or weekly data
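The sketch below does not run HEGY or Canova-Hansen. It only illustrates the two differencing operations those tests help you choose between. The simulated series is a seasonal random walk, y_t = y_{t-12} + e_t (an assumption for illustration), which has a unit root at the seasonal frequencies. First differencing leaves the lag-12 dependence in place, while seasonal differencing recovers the white-noise shocks.

```python
import numpy as np

def seasonal_difference(y, s):
    """Seasonal difference (1 - B^s) y_t = y_t - y_{t-s}."""
    y = np.asarray(y, dtype=float)
    return y[s:] - y[:-s]

def lag_autocorr(x, k):
    """Sample autocorrelation at a single lag k."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return (d[k:] @ d[:-k]) / (d @ d)

rng = np.random.default_rng(6)
e = rng.standard_normal(600)
y = e.copy()
for t in range(12, 600):
    y[t] = y[t - 12] + e[t]          # seasonal random walk: unit root at lag 12

first_diff = np.diff(y)
seasonal_diff = seasonal_difference(y, 12)
print("lag-12 autocorrelation after first differencing:",
      round(lag_autocorr(first_diff, 12), 2))
print("lag-12 autocorrelation after seasonal differencing:",
      round(lag_autocorr(seasonal_diff, 12), 2))
```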

Compare: Standard unit root tests vs. Seasonal unit root tests—ADF/PP test for a unit root at the zero (long-run) frequency, while HEGY and other seasonal tests look for unit roots at the seasonal frequencies. Monthly retail sales might show no zero-frequency unit root yet still have a seasonal unit root requiring seasonal differencing.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Unit root detection (null: non-stationary) | ADF, PP, Zivot-Andrews |
| Stationarity testing (null: stationary) | KPSS |
| Structural break accommodation | Zivot-Andrews |
| Visual autocorrelation diagnosis | ACF plot, PACF plot |
| Model adequacy / residual checking | Ljung-Box test |
| Random walk hypothesis | Variance ratio test |
| Seasonal non-stationarity | HEGY test, Canova-Hansen test |
| AR order identification | PACF plot |

Self-Check Questions

  1. You run an ADF test and get a p-value of 0.03, then run a KPSS test and get a p-value of 0.15. What do these results together tell you about stationarity, and why is using both tests more informative than using just one?

  2. Which two tests share the same null hypothesis (unit root present) but handle autocorrelation differently? When would you prefer one over the other?

  3. Your ACF plot shows slow decay over many lags while your PACF shows a sharp cutoff after lag 2. What does this pattern suggest about (a) stationarity and (b) potential model structure?

  4. Compare and contrast how you would test for stationarity in a quarterly GDP series that experienced a major policy change mid-sample versus a series with no obvious structural breaks.

  5. A colleague claims their residuals are fine because the ACF plot looks clean. What formal test should they run to support this claim, and what null hypothesis would they be testing?