Fiveable

🔀Stochastic Processes Unit 3 Review


3.2 Stationarity and ergodicity


Written by the Fiveable Content Team • Last updated August 2025

Stationarity

Stationarity describes a stochastic process whose statistical properties remain stable over time. If you grab a chunk of data from the beginning of a stationary process and another chunk from much later, they should look statistically the same. This property underpins most of the models you'll encounter in time series analysis, signal processing, and econometrics.

Strict Stationarity

Strict (or strong) stationarity is the most demanding form. It requires that the entire joint probability distribution of any collection of time points is invariant to shifts in time. Formally, for any times $t_1, t_2, \ldots, t_n$ and any shift $h$, the joint distribution of $(X(t_1), X(t_2), \ldots, X(t_n))$ must equal that of $(X(t_1+h), X(t_2+h), \ldots, X(t_n+h))$.

This means every statistical property (mean, variance, skewness, kurtosis, all higher-order moments) stays constant over time. In practice, strict stationarity is very hard to verify because you'd need to check the full distribution, not just a few summary statistics.

Weak (Wide-Sense / Covariance) Stationarity

Weak stationarity relaxes the requirement to just the first two moments. A process $X(t)$ is weakly stationary if:

  1. The mean is constant: $\mathbb{E}[X(t)] = \mu$ for all $t$
  2. The variance is finite and constant: $\text{Var}(X(t)) = \sigma^2 < \infty$ for all $t$
  3. The autocovariance depends only on the lag: $\text{Cov}(X(t), X(t+\tau)) = \gamma(\tau)$, not on $t$ itself

You'll see this called wide-sense stationarity or covariance stationarity in different textbooks. They all mean the same thing. This is the version used most often in practice because you can check it with sample means and sample autocovariances.

A strictly stationary process with finite second moments is always weakly stationary. The reverse isn't true in general, though it is true for Gaussian processes (since a Gaussian distribution is fully determined by its mean and covariance).
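Because weak stationarity is defined through the first two moments, you can probe it empirically by comparing sample statistics across different stretches of a single series. A minimal Python sketch (the simulated white-noise series and the `sample_autocov` helper are illustrative choices, not from the text):

```python
import random
import statistics

random.seed(42)

# Simulate a weakly stationary process: i.i.d. Gaussian noise
# (constant mean 0, constant variance 1, zero autocovariance at nonzero lags).
x = [random.gauss(0.0, 1.0) for _ in range(20_000)]

def sample_autocov(series, lag):
    """Sample autocovariance gamma_hat(lag) about the sample mean."""
    n = len(series)
    m = statistics.fmean(series)
    return sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag)) / n

first, second = x[:10_000], x[10_000:]

# For a weakly stationary process these statistics should roughly agree
# across the two halves of the record.
print(statistics.fmean(first), statistics.fmean(second))    # both near 0
print(sample_autocov(first, 0), sample_autocov(second, 0))  # both near 1 (variance)
print(sample_autocov(first, 1), sample_autocov(second, 1))  # both near 0
```

If the two halves gave markedly different means or autocovariances, that would be evidence against weak stationarity.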

Stationarity vs. Non-Stationarity

Non-stationary processes have statistical properties that drift over time. Common sources of non-stationarity include:

  • Trends: a mean that increases or decreases (linearly or otherwise)
  • Seasonality: periodic fluctuations tied to calendar effects or cycles
  • Time-varying volatility (heteroscedasticity): variance that changes, as seen in financial return data

Correctly identifying non-stationarity matters because fitting a stationary model (like AR or MA) to non-stationary data produces unreliable estimates and can lead to spurious regression, where unrelated series appear correlated simply because they share a common trend.
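A deterministic trend is the easiest of these sources to see and to remove. A short Python sketch (the slope and noise level are arbitrary choices for illustration) shows how first differencing turns a trending series into one with a stable mean:

```python
import random
import statistics

random.seed(0)

# Non-stationary series: linear trend plus noise. Its mean drifts with t,
# so the sample mean of the first half differs markedly from the second half.
y = [0.05 * t + random.gauss(0.0, 1.0) for t in range(2_000)]

# First differencing: dy_t = y_t - y_{t-1}. For a linear trend, the
# differenced series has a constant mean equal to the slope.
dy = [y[t] - y[t - 1] for t in range(1, len(y))]

half = len(y) // 2
print(statistics.fmean(y[:half]), statistics.fmean(y[half:]))    # far apart (trend)
print(statistics.fmean(dy[:half]), statistics.fmean(dy[half:]))  # both near 0.05
```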

Ergodicity

Ergodicity connects what you can observe (a single long time series) to what you want to know (the true statistical properties of the process). An ergodic process is one where the time average computed from a single realization converges to the ensemble average (the expected value across all possible realizations) as the observation period grows.

This is practically important: you almost never have access to many independent realizations of the same process. Ergodicity is what justifies computing a sample mean from one long recording and treating it as an estimate of the true mean.

Ergodic Processes

For a process to be ergodic, the time average of any measurable function $f(X(t))$ must converge almost surely to its expected value:

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T f(X(t))\, dt = \mathbb{E}[f(X(t))]$$

Examples of ergodic processes include stationary Gaussian processes with autocovariance that decays to zero, irreducible and aperiodic stationary Markov chains, and many stationary point processes.

The Ergodic Theorem

The ergodic theorem provides the formal guarantee: for an ergodic process, time averages converge almost surely to ensemble averages as $T \to \infty$. This theorem is the theoretical backbone for using sample statistics (sample mean, sample autocovariance) as consistent estimators of true process parameters.

Relationship Between Stationarity and Ergodicity

Every ergodic process is stationary, but not every stationary process is ergodic. Here's a classic counterexample: suppose you flip a fair coin once, and if heads, $X(t) = +1$ for all $t$; if tails, $X(t) = -1$ for all $t$. This process is stationary (the distribution of $X(t)$ is the same for every $t$), but it's not ergodic. The time average of any single realization is either $+1$ or $-1$, never the ensemble mean of $0$.
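The coin-flip counterexample is easy to verify numerically. A quick Python sketch (realization counts and path lengths are arbitrary):

```python
import random

random.seed(1)

def coin_flip_process(T):
    """One realization of the stationary-but-non-ergodic process:
    a single fair coin flip fixes the entire path at +1 or -1."""
    value = 1 if random.random() < 0.5 else -1
    return [value] * T

# Ensemble average at a fixed time: average over many independent realizations.
n_realizations = 10_000
ensemble_avg = sum(coin_flip_process(1)[0] for _ in range(n_realizations)) / n_realizations

# Time average along one long realization: exactly +1 or -1, never near 0.
one_path = coin_flip_process(100_000)
time_avg = sum(one_path) / len(one_path)

print(ensemble_avg)  # near 0
print(time_avg)      # exactly +1.0 or -1.0
```

No matter how long the single path grows, its time average never approaches the ensemble mean, which is exactly the failure of ergodicity.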

For a stationary process to be ergodic, it needs a mixing condition: observations far apart in time must become approximately independent. Intuitively, the process must "forget" its past. If the autocovariance $\gamma(\tau) \to 0$ as $\tau \to \infty$, that's a good sign (though not always sufficient on its own).

Ergodicity in Parameter Estimation

Ergodicity is a key assumption behind standard estimation methods for time series:

  • The sample mean converges to the true mean $\mu$
  • The sample autocovariance converges to the true autocovariance $\gamma(\tau)$
  • Methods like maximum likelihood estimation and method of moments produce consistent estimates

Without ergodicity, a single realization might not be representative of the process, and these estimators could converge to the wrong values.

Time Averages vs. Ensemble Averages

These two types of averages capture different perspectives on a stochastic process, and understanding when they agree is central to applied work.


Time Average

The time average measures the average behavior along a single realization over time. For a process $X(t)$ and a function $f$:

$$\bar{f}_T = \frac{1}{T} \int_0^T f(X(t))\, dt$$

In discrete time, this becomes a simple sample mean: $\bar{f}_N = \frac{1}{N} \sum_{n=1}^N f(X_n)$.

Ensemble Average

The ensemble average measures the average across all possible realizations at a fixed time $t$:

$$\langle f(X(t)) \rangle = \mathbb{E}[f(X(t))]$$

If you could run the same experiment thousands of times and average the results at time $t$, that's the ensemble average.

Equality of Averages for Ergodic Processes

For ergodic processes, these two quantities converge:

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T f(X(t))\, dt = \mathbb{E}[f(X(t))]$$

This equality is what makes single-realization analysis possible. In fields like signal processing or finance, you typically have one long recording or one price history. Ergodicity is the property that makes it valid to extract statistical information from that single trajectory.
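You can watch this equality emerge for a concrete ergodic process. The sketch below simulates a stationary AR(1) (the coefficient phi = 0.8 and the burn-in length are illustrative assumptions) and checks that the time average of one long path lands near the ensemble mean of 0:

```python
import random

random.seed(7)

def ar1_path(phi, sigma, T, burn_in=500):
    """One realization of an AR(1): X_t = phi * X_{t-1} + eps_t.
    With |phi| < 1 the process is stationary and ergodic, with mean 0.
    The burn-in discards the transient from the arbitrary start at 0."""
    x, path = 0.0, []
    for t in range(T + burn_in):
        x = phi * x + random.gauss(0.0, sigma)
        if t >= burn_in:
            path.append(x)
    return path

path = ar1_path(phi=0.8, sigma=1.0, T=200_000)
time_avg = sum(path) / len(path)
print(time_avg)  # close to the ensemble mean, 0
```

Unlike the coin-flip process, here the autocovariance decays geometrically, so a single long trajectory explores the full distribution.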

Autocorrelation and Autocovariance

These functions quantify how a process relates to its own past. They're the primary tools for describing temporal dependence and are directly tied to the definition of weak stationarity.

Autocovariance Function

The autocovariance function (ACVF) measures the covariance between $X(t)$ and $X(t+\tau)$. For a stationary process with mean $\mu$:

$$\gamma(\tau) = \mathbb{E}[(X(t) - \mu)(X(t+\tau) - \mu)]$$

Note that $\gamma(0) = \sigma^2$ (the variance). The ACVF is symmetric: $\gamma(\tau) = \gamma(-\tau)$.

Autocorrelation Function

The autocorrelation function (ACF) is the normalized version of the ACVF:

$$\rho(\tau) = \frac{\gamma(\tau)}{\gamma(0)} = \frac{\gamma(\tau)}{\sigma^2}$$

The ACF always satisfies $\rho(0) = 1$ and $-1 \leq \rho(\tau) \leq 1$. It tells you the strength of linear dependence at lag $\tau$, stripped of scale. A slowly decaying ACF suggests strong persistence (and possibly non-stationarity), while a quickly decaying ACF suggests short memory.
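Estimating the ACF from data follows the definitions directly. A Python sketch (the AR(1) simulation and the `sample_acf` helper are illustrative, not a library API):

```python
import random

random.seed(3)

def sample_acf(x, max_lag):
    """Sample autocorrelation: gamma_hat(lag) / gamma_hat(0)."""
    n = len(x)
    m = sum(x) / n
    gammas = [sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / n
              for lag in range(max_lag + 1)]
    return [g / gammas[0] for g in gammas]

# AR(1) with phi = 0.6: the true ACF is rho(tau) = 0.6**tau, a quick
# geometric decay characteristic of short-memory processes.
phi, x = 0.6, [0.0]
for _ in range(50_000):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))

acf = sample_acf(x, max_lag=3)
print(acf)  # roughly [1.0, 0.6, 0.36, 0.216]
```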

Stationarity and Autocorrelation

For a weakly stationary process, both $\gamma(\tau)$ and $\rho(\tau)$ are functions of the lag $\tau$ alone. They don't depend on when you start measuring. This is actually part of the definition of weak stationarity, so if you find that the sample ACF looks different depending on which portion of the data you use, that's evidence of non-stationarity.

Ergodicity and Autocorrelation

For ergodic processes, the sample ACF and sample ACVF computed from a single realization converge to their true values as the realization length grows. This convergence is what allows you to estimate $\gamma(\tau)$ and $\rho(\tau)$ from data and use them to fit models like AR and MA processes.

Stationarity Tests

Before fitting a stationary model, you need to check whether your data actually is stationary. Several formal tests exist, and it's good practice to use more than one since they have complementary null hypotheses.


Visual Inspection

Plotting the time series is always a useful first step. Look for:

  • An upward or downward drift in the level (trend)
  • Repeating patterns at regular intervals (seasonality)
  • Periods where the spread of the data visibly changes (heteroscedasticity)

Visual inspection is quick and informative, but it's subjective. Always follow up with a formal test.

Augmented Dickey-Fuller (ADF) Test

The ADF test checks for a unit root, which is a specific form of non-stationarity. It estimates the regression:

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^p \delta_i \Delta y_{t-i} + \varepsilon_t$$

  • Null hypothesis: the series has a unit root (non-stationary)
  • Alternative hypothesis: the series is stationary

You reject the null (conclude stationarity) if the test statistic is sufficiently negative. The ADF test uses its own critical values, not the standard t-distribution. Be aware that results can be sensitive to the choice of lag length $p$ and whether you include the constant $\alpha$ and trend $\beta t$.
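In practice you would call a library routine (e.g., `adfuller` in statsmodels) rather than roll your own, but a stripped-down Dickey-Fuller regression with p = 0 lags and no intercept or trend shows the mechanics. This is a sketch for intuition only; it does not reproduce the ADF test's special critical values:

```python
import math
import random

random.seed(11)

def df_tstat(y):
    """t-statistic for gamma in the no-constant Dickey-Fuller regression
    dy_t = gamma * y_{t-1} + e_t (a simplified version of the ADF test)."""
    x = y[:-1]                                    # regressor: y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(v * v for v in x)
    gamma = sum(a * b for a, b in zip(x, dy)) / sxx
    resid = [d - gamma * v for d, v in zip(dy, x)]
    s2 = sum(r * r for r in resid) / (len(dy) - 1)  # residual variance
    return gamma / math.sqrt(s2 / sxx)              # gamma / std.err(gamma)

# Random walk (unit root): gamma is near 0, so the t-stat typically sits
# well above the Dickey-Fuller rejection region.
rw = [0.0]
for _ in range(2_000):
    rw.append(rw[-1] + random.gauss(0.0, 1.0))

# Stationary AR(1) with phi = 0.5: gamma = phi - 1 = -0.5, and the
# t-stat is strongly negative, rejecting the unit-root null.
ar = [0.0]
for _ in range(2_000):
    ar.append(0.5 * ar[-1] + random.gauss(0.0, 1.0))

print(df_tstat(rw))
print(df_tstat(ar))  # strongly negative
```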

KPSS Test

The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test flips the hypotheses:

  • Null hypothesis: the series is stationary
  • Alternative hypothesis: the series has a unit root

The test statistic is based on the cumulative sum of residuals from a regression on deterministic terms, normalized by a long-run variance estimate. You reject stationarity if the test statistic is large.

Using the ADF and KPSS tests together gives you a more reliable picture. If ADF rejects its null and KPSS fails to reject its null, you have strong evidence of stationarity. If they contradict each other, the situation is ambiguous and you may need further investigation (e.g., differencing or examining structural breaks).

Phillips-Perron (PP) Test

The Phillips-Perron test shares the same null and alternative hypotheses as the ADF test (unit root vs. stationarity) but handles serial correlation differently. Instead of adding lagged difference terms to the regression, it applies a non-parametric correction to the test statistic to account for serial correlation and heteroscedasticity in the errors.

The PP test is less sensitive to lag length selection than the ADF test, but it can have lower power (higher chance of missing true stationarity) in finite samples.

Applications of Stationarity and Ergodicity

Time Series Analysis

Stationarity is a core assumption in models like AR, MA, and ARIMA. If a series is non-stationary, you typically difference it (subtract consecutive values) or detrend it to achieve stationarity before fitting a model. Ergodicity ensures that the parameter estimates from your single observed series are consistent.

Signal Processing

Techniques like Fourier analysis and power spectral density estimation assume the signal is stationary. The power spectrum of a stationary process is the Fourier transform of its autocovariance function (the Wiener-Khinchin theorem). Ergodicity allows you to estimate spectral properties from a single recorded signal.

Markov Chains

A Markov chain is stationary if its transition probability matrix doesn't change over time and it has reached its stationary distribution $\pi$. An ergodic Markov chain (irreducible, aperiodic, and positive recurrent) converges to $\pi$ regardless of the starting state. This is what makes long-run simulation-based estimates (like those in MCMC methods) valid.
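A quick sketch of this convergence for a hypothetical two-state chain (the transition matrix is made up for illustration): repeatedly applying the transition matrix drives any starting distribution toward the stationary distribution pi.

```python
# A small ergodic (irreducible, aperiodic) Markov chain on 2 states.
# Row-stochastic transition matrix P: P[i][j] = Pr(next = j | current = i).
P = [[0.9, 0.1],
     [0.5, 0.5]]

def step(dist, P):
    """One step of the chain: new_j = sum_i dist_i * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Power iteration: starting from any distribution, the chain converges
# to the stationary distribution pi satisfying pi = pi * P.
dist = [1.0, 0.0]
for _ in range(200):
    dist = step(dist, P)

print(dist)  # approximately [5/6, 1/6]
```

Solving pi = pi * P by hand for this matrix gives pi = (5/6, 1/6), matching the iteration; starting from [0.0, 1.0] instead converges to the same limit.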

Queueing Theory

Stationary queueing models (e.g., M/M/1, M/M/c) assume that arrival and service rates are constant over time. Ergodicity in a queue means the system reaches a steady state where performance measures like average waiting time and server utilization stabilize. For an M/M/1 queue, ergodicity requires that the traffic intensity $\rho = \lambda / \mu < 1$ (arrival rate less than service rate); otherwise, the queue grows without bound and no steady state exists.
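The steady state of an M/M/1 queue can be checked by simulation using the Lindley recursion for successive waiting times. The rates below are illustrative; with traffic intensity rho = 0.5 the long-run average waiting time should settle near the theoretical M/M/1 value rho / (mu - lambda):

```python
import random

random.seed(5)

lam, mu = 0.5, 1.0          # arrival rate and service rate, so rho = 0.5 < 1
n_customers = 200_000

# Lindley recursion for the waiting time in queue (excluding service):
# W_{n+1} = max(0, W_n + S_n - A_{n+1}), with exponential interarrival
# times (rate lam) and exponential service times (rate mu).
w, total_wait = 0.0, 0.0
for _ in range(n_customers):
    total_wait += w
    s = random.expovariate(mu)    # service time of the current customer
    a = random.expovariate(lam)   # time until the next arrival
    w = max(0.0, w + s - a)

avg_wait = total_wait / n_customers
theory = lam / (mu * (mu - lam))  # M/M/1 mean wait: rho / (mu - lam) = 1.0
print(avg_wait, theory)
```

Because rho < 1 makes the queue ergodic, this single simulated trajectory is enough to estimate the steady-state mean; with rho >= 1 the running average would just keep growing.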