Stationarity
Stationarity describes a stochastic process whose statistical properties remain stable over time. If you grab a chunk of data from the beginning of a stationary process and another chunk from much later, they should look statistically the same. This property underpins most of the models you'll encounter in time series analysis, signal processing, and econometrics.
Strict Stationarity
Strict (or strong) stationarity is the most demanding form. It requires that the entire joint probability distribution of any collection of time points is invariant to shifts in time. Formally, for any times t_1, ..., t_n and any shift h, the joint distribution of (X_{t_1}, ..., X_{t_n}) must equal that of (X_{t_1+h}, ..., X_{t_n+h}).
This means every statistical property (mean, variance, skewness, kurtosis, all higher-order moments) stays constant over time. In practice, strict stationarity is very hard to verify because you'd need to check the full distribution, not just a few summary statistics.
Weak (Wide-Sense / Covariance) Stationarity
Weak stationarity relaxes the requirement to just the first two moments. A process is weakly stationary if:
- The mean is constant: E[X_t] = μ for all t
- The variance is finite and constant: Var(X_t) = σ² < ∞ for all t
- The autocovariance depends only on the lag: Cov(X_t, X_{t+k}) = γ(k) for every lag k, not on t itself
You'll see this called wide-sense stationarity or covariance stationarity in different textbooks. They all mean the same thing. This is the version used most often in practice because you can check it with sample means and sample autocovariances.
A strictly stationary process with finite second moments is always weakly stationary. The reverse isn't true in general, though it is true for Gaussian processes (since a Gaussian distribution is fully determined by its mean and covariance).
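The three weak-stationarity conditions above can be checked empirically. Below is a minimal sketch in Python: it simulates an AR(1) process (which is weakly stationary, with the coefficient 0.5 and sample size chosen purely for illustration) and compares sample moments across the two halves of the record.

```python
import numpy as np

# Sketch: empirically check the weak-stationarity conditions on a
# simulated AR(1) process X_t = 0.5 * X_{t-1} + e_t. The coefficient,
# seed, and sample size are illustrative choices.
rng = np.random.default_rng(0)
n = 100_000
x = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + eps[t]

# For a weakly stationary process, first-half and second-half sample
# moments should be close for a long enough series.
first, second = x[: n // 2], x[n // 2 :]
print(first.mean(), second.mean())   # both near 0
print(first.var(), second.var())     # both near 1 / (1 - 0.5**2) = 1.33...
```

If the two halves produced clearly different means or variances, that would be evidence against weak stationarity.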
Stationarity vs. Non-Stationarity
Non-stationary processes have statistical properties that drift over time. Common sources of non-stationarity include:
- Trends: a mean that increases or decreases (linearly or otherwise)
- Seasonality: periodic fluctuations tied to calendar effects or cycles
- Time-varying volatility (heteroscedasticity): variance that changes, as seen in financial return data
Correctly identifying non-stationarity matters because fitting a stationary model (like AR or MA) to non-stationary data produces unreliable estimates and can lead to spurious regression, where unrelated series appear correlated simply because they share a common trend.
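The spurious-regression phenomenon is easy to reproduce. This sketch (seed and length are arbitrary) correlates two independent random walks, then correlates their differenced, stationary increments.

```python
import numpy as np

# Sketch: two independent random walks (non-stationary) frequently show
# a sizable spurious sample correlation despite being unrelated.
rng = np.random.default_rng(1)
n = 5_000
walk_a = np.cumsum(rng.standard_normal(n))
walk_b = np.cumsum(rng.standard_normal(n))
corr_walks = np.corrcoef(walk_a, walk_b)[0, 1]

# The differenced series (the white-noise increments) are stationary
# and show essentially no correlation.
corr_diffs = np.corrcoef(np.diff(walk_a), np.diff(walk_b))[0, 1]
print(corr_walks, corr_diffs)
```

The increment correlation is reliably near zero; the level correlation varies from run to run and can be large in magnitude, which is exactly the spurious effect.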
Ergodicity
Ergodicity connects what you can observe (a single long time series) to what you want to know (the true statistical properties of the process). An ergodic process is one where the time average computed from a single realization converges to the ensemble average (the expected value across all possible realizations) as the observation period grows.
This is practically important: you almost never have access to many independent realizations of the same process. Ergodicity is what justifies computing a sample mean from one long recording and treating it as an estimate of the true mean.
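A minimal numerical illustration of this idea: for an ergodic process (here simply iid noise around a "true mean" of 3.0, a value chosen for illustration), the time average from one realization settles toward the ensemble mean as the record grows.

```python
import numpy as np

# Sketch: time average from a single realization of an ergodic process
# (iid noise around true_mean) converges to the ensemble mean.
rng = np.random.default_rng(3)
true_mean = 3.0                     # illustrative value
x = true_mean + rng.standard_normal(1_000_000)

for length in (100, 10_000, 1_000_000):
    print(length, x[:length].mean())   # settles toward 3.0
```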
Ergodic Processes
For a process to be ergodic, the time average of any measurable function f must converge almost surely to its expected value:

(1/T) ∫_0^T f(X_t) dt → E[f(X_0)]   as T → ∞
Examples of ergodic processes include stationary Gaussian processes with autocovariance that decays to zero, irreducible and aperiodic stationary Markov chains, and many stationary point processes.
The Ergodic Theorem
The ergodic theorem provides the formal guarantee: for an ergodic process, time averages converge almost surely to ensemble averages as T → ∞. This theorem is the theoretical backbone for using sample statistics (sample mean, sample autocovariance) as consistent estimators of true process parameters.
Relationship Between Stationarity and Ergodicity
Every ergodic process is stationary, but not every stationary process is ergodic. Here's a classic counterexample: suppose you flip a fair coin once, and if heads, X_t = 1 for all t; if tails, X_t = -1 for all t. This process is stationary (the distribution of X_t is the same for every t), but it's not ergodic. The time average of any single realization is either 1 or -1, never the ensemble mean of 0.
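The coin-flip counterexample can be simulated directly. In this sketch (realization count and horizon are arbitrary), each realization is a single ±1 coin flip frozen for all time, so every time average is ±1 while the ensemble average sits near 0.

```python
import numpy as np

# Sketch of the coin-flip counterexample: each realization is constant
# at +1 or -1, so its time average never approaches the ensemble mean 0.
rng = np.random.default_rng(2)
n_realizations, n_steps = 10_000, 100

# Each row is one realization: one coin flip, frozen for all time.
signs = rng.choice([-1.0, 1.0], size=n_realizations)
paths = np.repeat(signs[:, None], n_steps, axis=1)

time_averages = paths.mean(axis=1)      # each is exactly +1 or -1
ensemble_average = paths[:, 0].mean()   # near 0 across realizations
print(np.unique(time_averages), ensemble_average)
```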
For a stationary process to be ergodic, it needs a mixing condition: observations far apart in time must become approximately independent. Intuitively, the process must "forget" its past. If the autocovariance γ(k) → 0 as k → ∞, that's a good sign (though not always sufficient on its own).
Ergodicity in Parameter Estimation
Ergodicity is a key assumption behind standard estimation methods for time series:
- The sample mean (1/n) Σ_{t=1}^n X_t converges to the true mean μ
- The sample autocovariance γ̂(k) converges to the true autocovariance γ(k)
- Methods like maximum likelihood estimation and method of moments produce consistent estimates
Without ergodicity, a single realization might not be representative of the process, and these estimators could converge to the wrong values.
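As a concrete instance of single-realization estimation, this sketch recovers the coefficient of an AR(1) process from one simulated path via the lag-1 sample autocorrelation (the Yule-Walker estimate for AR(1)); the coefficient 0.6 and sample size are illustrative.

```python
import numpy as np

# Sketch: estimate an AR(1) coefficient from one realization using the
# lag-1 sample autocorrelation (the Yule-Walker estimate for AR(1)).
# Ergodicity is what makes this single-series estimate consistent.
rng = np.random.default_rng(7)
phi_true = 0.6          # illustrative true coefficient
n = 100_000
x = np.zeros(n)
e = rng.standard_normal(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + e[t]

xm = x - x.mean()
phi_hat = (xm[:-1] @ xm[1:]) / (xm @ xm)
print(phi_hat)   # close to 0.6
```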
Time Averages vs. Ensemble Averages
These two types of averages capture different perspectives on a stochastic process, and understanding when they agree is central to applied work.

Time Average
The time average measures the average behavior along a single realization over time. For a process X_t and a function f:

⟨f⟩_T = (1/T) ∫_0^T f(X_t) dt

In discrete time, this becomes a simple sample mean: x̄ = (1/n) Σ_{t=1}^n X_t.
Ensemble Average
The ensemble average measures the average across all possible realizations at a fixed time t: the expectation E[f(X_t)], taken over the distribution of X_t.
If you could run the same experiment thousands of times and average the results at time , that's the ensemble average.
Equality of Averages for Ergodic Processes
For ergodic processes, these two quantities converge:

lim_{T→∞} (1/T) ∫_0^T f(X_t) dt = E[f(X_t)]   (almost surely)
This equality is what makes single-realization analysis possible. In fields like signal processing or finance, you typically have one long recording or one price history. Ergodicity is the property that makes it valid to extract statistical information from that single trajectory.
Autocorrelation and Autocovariance
These functions quantify how a process relates to its own past. They're the primary tools for describing temporal dependence and are directly tied to the definition of weak stationarity.
Autocovariance Function
The autocovariance function (ACVF) measures the covariance between X_t and X_{t+k}. For a stationary process with mean μ:

γ(k) = E[(X_t − μ)(X_{t+k} − μ)]

Note that γ(0) = Var(X_t) (the variance). The ACVF is symmetric: γ(k) = γ(−k).
Autocorrelation Function
The autocorrelation function (ACF) is the normalized version of the ACVF:

ρ(k) = γ(k) / γ(0)

The ACF always satisfies ρ(0) = 1 and |ρ(k)| ≤ 1. It tells you the strength of linear dependence at lag k, stripped of scale. A slowly decaying ACF suggests strong persistence (and possibly non-stationarity), while a quickly decaying ACF suggests short memory.
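The sample versions of these functions are short to implement. This sketch uses the standard biased ACVF estimator (dividing by n rather than n − k) and checks it against an AR(1) process, whose theoretical ACF is ρ(k) = φ^k; the coefficient 0.7 and sample size are illustrative.

```python
import numpy as np

# Sketch: sample ACVF and ACF at lag k for a 1-D series, using the
# standard biased estimator (divide by n, not n - k).
def sample_acvf(x, k):
    x = np.asarray(x, dtype=float)
    n = len(x)
    xm = x - x.mean()
    return np.sum(xm[: n - k] * xm[k:]) / n

def sample_acf(x, k):
    return sample_acvf(x, k) / sample_acvf(x, 0)

# AR(1) with phi = 0.7 has theoretical ACF rho(k) = 0.7**k.
rng = np.random.default_rng(4)
n = 200_000
x = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + eps[t]

print([round(sample_acf(x, k), 3) for k in range(4)])
# near 1, 0.7, 0.49, 0.343
```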
Stationarity and Autocorrelation
For a weakly stationary process, both γ(k) and ρ(k) are functions of the lag k alone. They don't depend on when you start measuring. This is actually part of the definition of weak stationarity, so if you find that the sample ACF looks different depending on which portion of the data you use, that's evidence of non-stationarity.
Ergodicity and Autocorrelation
For ergodic processes, the sample ACF and sample ACVF computed from a single realization converge to their true values as the realization length grows. This convergence is what allows you to estimate γ(k) and ρ(k) from data and use them to fit models like AR and MA processes.
Stationarity Tests
Before fitting a stationary model, you need to check whether your data actually is stationary. Several formal tests exist, and it's good practice to use more than one since they have complementary null hypotheses.

Visual Inspection
Plotting the time series is always a useful first step. Look for:
- An upward or downward drift in the level (trend)
- Repeating patterns at regular intervals (seasonality)
- Periods where the spread of the data visibly changes (heteroscedasticity)
Visual inspection is quick and informative, but it's subjective. Always follow up with a formal test.
Augmented Dickey-Fuller (ADF) Test
The ADF test checks for a unit root, which is a specific form of non-stationarity. It estimates the regression:

Δy_t = α + βt + γ y_{t−1} + δ_1 Δy_{t−1} + … + δ_p Δy_{t−p} + ε_t
- Null hypothesis: the series has a unit root (non-stationary)
- Alternative hypothesis: the series is stationary
You reject the null (conclude stationarity) if the test statistic is sufficiently negative. The ADF test uses its own critical values, not the standard t-distribution. Be aware that results can be sensitive to the choice of lag length and to whether you include the constant and trend terms.
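To make the mechanics concrete, here is a simplified Dickey-Fuller-style sketch: it regresses Δy_t on a constant and y_{t−1} only, omitting the augmented lag terms, the trend, and the proper DF critical values. It is an illustration of the idea, not a substitute for a library implementation such as statsmodels' `adfuller`.

```python
import numpy as np

# Simplified Dickey-Fuller-style sketch: regress dy_t on y_{t-1} (plus
# a constant; no lags, no trend) and inspect the t-statistic on the
# y_{t-1} coefficient. Real analyses should use a full ADF routine.
def df_tstat(y):
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])   # t-stat on the y_{t-1} term

rng = np.random.default_rng(5)
n = 2_000
stationary = np.zeros(n)
e = rng.standard_normal(n)
for t in range(1, n):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]
random_walk = np.cumsum(rng.standard_normal(n))

print(df_tstat(stationary))   # strongly negative
print(df_tstat(random_walk))  # much closer to zero
```

The stationary AR(1) series produces a strongly negative statistic (reject the unit root); the random walk does not.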
KPSS Test
The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test flips the hypotheses:
- Null hypothesis: the series is stationary
- Alternative hypothesis: the series has a unit root
The test statistic is based on the cumulative sum of residuals from a regression on deterministic terms, normalized by a long-run variance estimate. You reject stationarity if the test statistic is large.
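The statistic described above can be sketched for the level-stationarity case: demean the series, take cumulative sums, and normalize by a Bartlett-kernel long-run variance estimate. The 5% critical value 0.463 is the published KPSS value for the level case; in practice, prefer a library implementation such as statsmodels' `kpss`.

```python
import numpy as np

# Simplified KPSS-style sketch (level-stationarity case). This is an
# illustration of the statistic's structure, not a full implementation.
def kpss_stat(y, lags=None):
    y = np.asarray(y, dtype=float)
    n = len(y)
    e = y - y.mean()            # residuals from a constant-only fit
    s = np.cumsum(e)            # partial sums S_t
    if lags is None:
        lags = int(np.floor(4 * (n / 100) ** 0.25))
    # Bartlett-kernel long-run variance estimate
    lrv = e @ e / n
    for k in range(1, lags + 1):
        w = 1 - k / (lags + 1)
        lrv += 2 * w * (e[:-k] @ e[k:]) / n
    return np.sum(s ** 2) / (n ** 2 * lrv)

rng = np.random.default_rng(6)
n = 2_000
stat_noise = kpss_stat(rng.standard_normal(n))           # white noise
stat_walk = kpss_stat(np.cumsum(rng.standard_normal(n))) # random walk
print(stat_noise, stat_walk)
```

White noise yields a small statistic (fail to reject stationarity), while the random walk yields a large one (reject).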
Using the ADF and KPSS tests together gives you a more reliable picture. If ADF rejects its null and KPSS fails to reject its null, you have strong evidence of stationarity. If they contradict each other, the situation is ambiguous and you may need further investigation (e.g., differencing or examining structural breaks).
Phillips-Perron (PP) Test
The Phillips-Perron test shares the same null and alternative hypotheses as the ADF test (unit root vs. stationarity) but handles serial correlation differently. Instead of adding lagged difference terms to the regression, it applies a non-parametric correction to the test statistic to account for serial correlation and heteroscedasticity in the errors.
The PP test is less sensitive to lag length selection than the ADF test, but it can have lower power (higher chance of missing true stationarity) in finite samples.
Applications of Stationarity and Ergodicity
Time Series Analysis
Stationarity is a core assumption in models like AR, MA, and ARIMA. If a series is non-stationary, you typically difference it (subtract consecutive values) or detrend it to achieve stationarity before fitting a model. Ergodicity ensures that the parameter estimates from your single observed series are consistent.
Signal Processing
Techniques like Fourier analysis and power spectral density estimation assume the signal is stationary. The power spectrum of a stationary process is the Fourier transform of its autocovariance function (the Wiener-Khinchin theorem). Ergodicity allows you to estimate spectral properties from a single recorded signal.
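The Wiener-Khinchin relation has an exact finite-sample analogue that is easy to verify: for a finite record, the periodogram |FFT(x)|² equals the FFT of the circular autocovariance of that record. This sketch checks the identity numerically (record length and seed are arbitrary).

```python
import numpy as np

# Sketch of the discrete, circular Wiener-Khinchin relation: the
# periodogram |FFT(x)|^2 equals the FFT of the circular autocovariance.
rng = np.random.default_rng(8)
n = 256
x = rng.standard_normal(n)

# Circular autocovariance r[k] = sum_t x[t] * x[(t + k) mod n]
r = np.array([x @ np.roll(x, -k) for k in range(n)])

periodogram = np.abs(np.fft.fft(x)) ** 2
spectrum_from_acvf = np.fft.fft(r)

print(np.allclose(spectrum_from_acvf.real, periodogram))   # True
print(np.allclose(spectrum_from_acvf.imag, 0, atol=1e-8))  # True
```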
Markov Chains
A Markov chain is stationary if its transition probability matrix doesn't change over time and it has reached its stationary distribution π. An ergodic Markov chain (irreducible, aperiodic, and positive recurrent) converges to π regardless of the starting state. This is what makes long-run simulation-based estimates (like those in MCMC methods) valid.
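This is straightforward to demonstrate for a small chain. The sketch below uses an illustrative two-state transition matrix, solves π P = π via the left eigenvector for eigenvalue 1, and compares π with the occupation frequencies of one long simulated path.

```python
import numpy as np

# Sketch: for an ergodic (irreducible, aperiodic) two-state Markov
# chain, the long-run fraction of time spent in each state along one
# trajectory matches the stationary distribution pi solving pi P = pi.
P = np.array([[0.9, 0.1],    # illustrative transition matrix
              [0.4, 0.6]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()           # equals [0.8, 0.2] for this P

# Occupation frequencies from a single simulated trajectory.
rng = np.random.default_rng(9)
n = 100_000
state, counts = 0, np.zeros(2)
for _ in range(n):
    counts[state] += 1
    state = rng.choice(2, p=P[state])
print(pi, counts / n)
```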
Queueing Theory
Stationary queueing models (e.g., M/M/1, M/M/c) assume that arrival and service rates are constant over time. Ergodicity in a queue means the system reaches a steady state where performance measures like average waiting time and server utilization stabilize. For an M/M/1 queue, ergodicity requires that the traffic intensity ρ = λ/μ < 1 (arrival rate less than service rate); otherwise, the queue grows without bound and no steady state exists.
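When the ergodicity condition holds, the M/M/1 steady-state quantities are given by closed-form formulas. This sketch evaluates them for illustrative arrival and service rates and checks Little's law L = λW.

```python
# Sketch: standard M/M/1 steady-state formulas, valid only when the
# traffic intensity rho = lam / mu < 1 (the ergodicity condition).
# The rates below are illustrative.
lam, mu = 3.0, 5.0       # arrival rate, service rate (per unit time)
rho = lam / mu           # utilization, must be < 1
L = rho / (1 - rho)      # mean number in system
W = 1 / (mu - lam)       # mean time in system
Wq = rho / (mu - lam)    # mean waiting time in queue
print(rho, L, W, Wq)     # Little's law holds: L = lam * W
```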