โณIntro to Time Series

Stationarity Tests


Why This Matters

Stationarity is the foundation of time series modeling. If you don't get this right, everything that follows falls apart. Most classical forecasting methods (ARIMA, exponential smoothing, and beyond) assume your data has stable statistical properties over time. When you're tested on stationarity, you're really being tested on your ability to diagnose a time series before modeling it and justify your preprocessing decisions.

These tests reveal the underlying structure of your data. You'll need to understand the difference between testing for a unit root versus testing for stationarity (they're not the same thing), recognize when visual diagnostics complement formal tests, and know which test to reach for when your data has quirks like structural breaks or seasonality. Don't just memorize test names. Know what null hypothesis each test uses and when to combine tests for a complete picture.


Unit Root Tests: Detecting Non-Stationarity

These tests ask: "Does this series have a unit root that makes it non-stationary?" The null hypothesis assumes non-stationarity, so you're looking for evidence against the null to conclude stationarity. A unit root means shocks to the series persist forever rather than dying out.

Augmented Dickey-Fuller (ADF) Test

The ADF test is the most commonly used unit root test in econometrics and finance. Expect it on any exam covering stationarity.

  • Null hypothesis: unit root exists (non-stationary). You reject the null when the test statistic is more negative than the critical value.
  • Lagged difference terms are included to absorb autocorrelation in the residuals. Choosing the right number of lags matters: too few and leftover autocorrelation biases the test, too many and you lose statistical power.
  • A common approach is to select lag length using an information criterion like AIC or BIC.

Phillips-Perron (PP) Test

The PP test shares the same null hypothesis as ADF (unit root present) but takes a different approach to handling autocorrelation in the errors.

  • Instead of adding lagged difference terms, PP applies a non-parametric correction to the test statistic to account for serial correlation and heteroskedasticity.
  • This means you don't need to specify a lag length, which removes one source of user error.
  • PP tends to be more robust than ADF when error terms have changing variance or complex dependence structures.

Zivot-Andrews Test

Standard ADF and PP tests can lose power badly when the data contains a sudden shift in mean or trend. The Zivot-Andrews test addresses this directly.

  • It accounts for a structural break by endogenously determining the break date rather than requiring you to specify it in advance.
  • The test searches over possible break points and selects the one that gives the strongest evidence against the unit root null.
  • This is critical for economic data affected by policy changes, financial crises, or regime shifts that would fool standard unit root tests.

Compare: ADF vs. PP: both test the same null hypothesis (unit root), but PP handles autocorrelation non-parametrically while ADF adds lagged terms. Use PP when you suspect heteroskedasticity; use ADF when you want more control over lag specification.


Stationarity Tests: The Reverse Approach

Unlike unit root tests, these tests flip the null hypothesis. They assume stationarity and look for evidence against it. This reversal is critical for exam questions asking you to distinguish between test types.

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

The KPSS test is the natural complement to the ADF test because it approaches the question from the opposite direction.

  • Null hypothesis: the series IS stationary. This is the opposite of ADF.
  • You choose whether to test stationarity around a deterministic trend or around a level (constant mean), depending on whether your series appears to have a trend component.
  • Use KPSS alongside ADF to triangulate your conclusion. If ADF rejects its null (unit root) and KPSS fails to reject its null (stationarity), you have strong evidence the series is stationary. When the two tests conflict, you may have a near-unit-root process or simply need more data.

Compare: ADF vs. KPSS: they test opposite null hypotheses. When both agree (ADF rejects unit root, KPSS fails to reject stationarity), you can be confident the series is stationary. When they conflict, the situation is ambiguous and warrants further investigation.


Diagnostic Tools: Visual and Residual Analysis

Before running formal tests, visual diagnostics help you understand your data's structure. After fitting models, these same tools verify that your assumptions hold. ACF and PACF plots are your first line of defense in time series analysis.

Autocorrelation Function (ACF) Plot

The ACF plot shows the correlation between a series and its own lagged values at each lag.

  • Values decaying slowly toward zero suggest non-stationarity or strong persistence in the data.
  • Seasonal patterns appear as spikes at regular intervals (e.g., spikes at lags 12, 24, 36 for monthly data with annual seasonality).
  • Confidence bands (typically at the 95% level) help identify statistically significant correlations. Spikes outside these bands indicate structure that your model should capture.

Partial Autocorrelation Function (PACF) Plot

The PACF isolates the direct relationship between y_t and y_{t-k} after removing the effects of all intermediate lags (1 through k-1).

  • A sharp cutoff after lag p suggests an AR(p) model. This is your primary tool for identifying autoregressive order.
  • The PACF complements the ACF for model identification: ACF helps identify MA order (where ACF cuts off), PACF helps identify AR order (where PACF cuts off).

Ljung-Box Test

After fitting a model, you need to check whether the residuals still contain autocorrelation. The Ljung-Box test does exactly this.

  • Null hypothesis: no autocorrelation at any lag up to the specified maximum.
  • The Q-statistic follows a chi-squared distribution with degrees of freedom equal to the number of lags tested minus the number of estimated model parameters.
  • If you reject the null, your model hasn't adequately captured the time series structure and needs refinement.

Compare: ACF vs. PACF: ACF shows total correlation at each lag (including indirect effects through intermediate lags), while PACF isolates direct effects only. For AR model identification, watch where PACF cuts off. For MA identification, watch where ACF cuts off.


Specialized Tests: Random Walks and Seasonality

Some time series have specific structures that require targeted testing. Standard unit root tests may be insufficient or inappropriate in these cases.

Variance Ratio Test

The variance ratio test directly targets the random walk hypothesis. If a series is a true random walk, its variance should scale linearly with the time horizon.

  • The test compares variance at different intervals: VR(k) = Var(y_t - y_{t-k}) / (k · Var(y_t - y_{t-1}))
  • Under a random walk, this ratio should equal 1. Values significantly different from 1 suggest the series is not a random walk.
  • This test is popular in finance for testing market efficiency. A ratio above 1 suggests positive autocorrelation (momentum), while a ratio below 1 suggests mean reversion.
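The ratio can be computed directly from its definition; `variance_ratio` below is a hypothetical helper written in plain NumPy, not a library function:

```python
# Sketch: a bare-bones variance ratio computed from the formula above.
import numpy as np

def variance_ratio(y, k):
    """VR(k) = Var(y_t - y_{t-k}) / (k * Var(y_t - y_{t-1}))."""
    d1 = np.diff(y)        # one-period differences
    dk = y[k:] - y[:-k]    # k-period differences
    return dk.var(ddof=1) / (k * d1.var(ddof=1))

rng = np.random.default_rng(5)
random_walk = np.cumsum(rng.normal(size=5000))  # VR(k) should be near 1
mean_reverting = np.zeros(5000)
e = rng.normal(size=5000)
for t in range(1, 5000):
    mean_reverting[t] = 0.5 * mean_reverting[t - 1] + e[t]  # reverts, so VR(k) < 1

print(round(variance_ratio(random_walk, 5), 2))
print(round(variance_ratio(mean_reverting, 5), 2))
```

A formal test would also compute a standard error for VR(k); this sketch only shows the point estimate.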

Seasonal Unit Root Tests

Standard ADF only tests the zero-frequency (long-run) unit root. But a series can also have unit roots at seasonal frequencies, which require separate detection.

  • The HEGY test examines unit roots at multiple seasonal frequencies simultaneously, telling you exactly which frequencies are non-stationary.
  • The Canova-Hansen test flips the null (like KPSS does for regular stationarity) and tests whether seasonal patterns are stable.
  • These tests determine whether you need seasonal differencing in addition to (or instead of) regular differencing. For example, monthly retail sales might be trend-stationary but still have a seasonal unit root requiring seasonal differencing.

Compare: Standard unit root tests vs. Seasonal unit root tests: ADF/PP detect non-stationarity in the trend, while HEGY and seasonal tests detect non-stationarity in periodic patterns. You may need both types of differencing for a single series.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Unit root detection (null: non-stationary) | ADF, PP, Zivot-Andrews |
| Stationarity testing (null: stationary) | KPSS |
| Structural break accommodation | Zivot-Andrews |
| Visual autocorrelation diagnosis | ACF plot, PACF plot |
| Model adequacy / residual checking | Ljung-Box test |
| Random walk hypothesis | Variance ratio test |
| Seasonal non-stationarity | HEGY test, Canova-Hansen test |
| AR order identification | PACF plot |

Self-Check Questions

  1. You run an ADF test and get a p-value of 0.03, then run a KPSS test and get a p-value of 0.15. What do these results together tell you about stationarity, and why is using both tests more informative than using just one?

  2. Which two tests share the same null hypothesis (unit root present) but handle autocorrelation differently? When would you prefer one over the other?

  3. Your ACF plot shows slow decay over many lags while your PACF shows a sharp cutoff after lag 2. What does this pattern suggest about (a) stationarity and (b) potential model structure?

  4. Compare and contrast how you would test for stationarity in a quarterly GDP series that experienced a major policy change mid-sample versus a series with no obvious structural breaks.

  5. A colleague claims their residuals are fine because the ACF plot looks clean. What formal test should they run to support this claim, and what null hypothesis would they be testing?