
3.3 Autocorrelation and autocovariance


Written by the Fiveable Content Team • Last updated August 2025

Autocorrelation and autocovariance measure how a stochastic process relates to itself over time. They quantify the dependence structure at different time lags, which is essential for identifying patterns, selecting models, and forecasting future behavior in fields like finance, signal processing, and econometrics.

Definition of autocorrelation

Autocorrelation measures the correlation between a time series and a lagged version of itself. If today's value tends to be similar to yesterday's, the lag-1 autocorrelation will be high. This makes autocorrelation the primary tool for characterizing how "memory" works within a single process.

Autocorrelation vs cross-correlation

Cross-correlation measures the correlation between two different time series, while autocorrelation is the special case where both series are the same process, just shifted by a time lag. You can think of autocorrelation as "self-correlation." Cross-correlation helps you find relationships between processes; autocorrelation reveals structure within one process.

Mathematical formulation

For a stationary process $X_t$ with mean $\mu$, the autocorrelation at lag $k$ is:

$$\rho(k) = \frac{\text{Cov}(X_t, X_{t+k})}{\sqrt{\text{Var}(X_t)}\,\sqrt{\text{Var}(X_{t+k})}}$$

Because stationarity guarantees that $\text{Var}(X_t) = \text{Var}(X_{t+k})$, this simplifies to:

$$\rho(k) = \frac{\text{Cov}(X_t, X_{t+k})}{\text{Var}(X_t)} = \frac{\gamma(k)}{\gamma(0)}$$

The numerator is the autocovariance at lag $k$, and the denominator is just the variance (the autocovariance at lag 0). The simplification only works because stationarity makes the variance constant over time.

Interpretation of autocorrelation values

Autocorrelation values are bounded between $-1$ and $1$:

  • $\rho(k) = 1$: Perfect positive linear relationship with the lagged version. The process at time $t$ moves in lockstep with the process at time $t+k$.
  • $\rho(k) = -1$: Perfect negative linear relationship. High values at time $t$ correspond to low values at time $t+k$, and vice versa.
  • $\rho(k) = 0$: No linear dependence between the process and its lag-$k$ version. (Note: nonlinear dependence could still exist.)

The sign tells you the direction, and the magnitude tells you the strength.

Autocorrelation function (ACF)

The ACF is the function $\rho(k)$ plotted against the lag $k$. It gives you a complete picture of the linear dependence structure across all lags at once. In practice, you'll see it displayed as a bar chart (correlogram) with one bar per lag.

ACF for stationary processes

For a stationary process, the ACF has three important characteristics:

  • It depends only on the lag $k$, not on the absolute time $t$
  • It is symmetric: $\rho(k) = \rho(-k)$
  • It typically decays toward zero as $k$ increases, reflecting the fact that most stationary processes have finite memory (distant observations become less correlated)

Sample ACF

In practice, you don't know the true ACF and must estimate it from data. For a time series $\{X_1, X_2, \ldots, X_n\}$ with sample mean $\bar{X}$, the sample autocorrelation at lag $k$ is:

$$\hat{\rho}(k) = \frac{\sum_{t=1}^{n-k}(X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n}(X_t - \bar{X})^2}$$

Notice the denominator sums over all $n$ observations (not just $n-k$). This ensures $\hat{\rho}(0) = 1$ and that the sample autocovariance matrix remains positive semi-definite.
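
A minimal sketch of this estimator in plain Python (the helper name `sample_acf` is ours, not a library function):

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelation rho_hat(k) for k = 0..max_lag.

    The denominator sums over all n observations, so rho_hat(0) == 1
    by construction.
    """
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

random.seed(0)
white_noise = [random.gauss(0, 1) for _ in range(500)]
acf = sample_acf(white_noise, 5)
# acf[0] is exactly 1; for white noise the other lags hover near zero
```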

Confidence intervals for ACF

To decide whether an observed autocorrelation is statistically significant, you compare it against a confidence band. Under the null hypothesis that the process is white noise, the sample autocorrelations are approximately normally distributed:

$$\hat{\rho}(k) \sim N\left(0, \frac{1}{n}\right)$$

So an approximate 95% confidence interval is:

$$\pm \frac{1.96}{\sqrt{n}}$$

Any sample autocorrelation falling outside this band is considered significant at the 5% level. Most statistical software draws these bands automatically on ACF plots.
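
Applying the band is mechanical, as this sketch on a simulated AR(1) series shows (the helper name, seed, and AR coefficient are our choices):

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelation rho_hat(k) for k = 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

random.seed(1)
# AR(1) with phi = 0.7, so the true rho(1) is 0.7
x = [random.gauss(0, 1)]
for _ in range(999):
    x.append(0.7 * x[-1] + random.gauss(0, 1))

n = len(x)
band = 1.96 / math.sqrt(n)  # white-noise 95% band
acf = sample_acf(x, 10)
significant = [k for k in range(1, 11) if abs(acf[k]) > band]
# lag 1 comfortably exceeds the band
```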

ACF for non-stationary processes

When a process is non-stationary (e.g., it has a trend or changing variance), the ACF can be misleading. A trending series will show high autocorrelation at many lags that decays very slowly, which reflects the trend rather than genuine short-range dependence. The standard approach is to difference the series or apply another transformation to achieve stationarity before interpreting the ACF.
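
To see the effect, compare the lag-1 sample autocorrelation of a trending series with that of its first differences (an illustrative sketch; the helper and the simulated series are ours):

```python
import random

def lag1_acf(x):
    """Lag-1 sample autocorrelation."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1)) / denom

random.seed(2)
# Linear trend plus noise: non-stationary in the mean
trend = [0.5 * t + random.gauss(0, 1) for t in range(200)]
diffed = [trend[t] - trend[t - 1] for t in range(1, len(trend))]

r_trend = lag1_acf(trend)   # near 1: reflects the trend, not short memory
r_diff = lag1_acf(diffed)   # differencing removes the trend
```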

Properties of autocorrelation

Symmetry of autocorrelation

$$\rho(k) = \rho(-k)$$

This follows directly from the symmetry of covariance: $\text{Cov}(X_t, X_{t+k}) = \text{Cov}(X_{t+k}, X_t)$. In practice, this means you only need to look at non-negative lags.

Bounds on autocorrelation

$$-1 \leq \rho(k) \leq 1$$

This is a consequence of the Cauchy-Schwarz inequality applied to the covariance. Additionally, $\rho(0) = 1$ always, since any variable is perfectly correlated with itself.

Relationship to spectral density

The autocorrelation function and the power spectral density are a Fourier transform pair. For a discrete-time stationary process:

$$f(\omega) = \sum_{k=-\infty}^{\infty} \rho(k)\, e^{-i\omega k}$$

This is the Wiener-Khinchin relation. It means that any dependence structure you see in the time domain (via the ACF) has an equivalent representation in the frequency domain (via the spectral density). Peaks in the spectral density correspond to periodic components, while the ACF captures the same information as correlations at specific lags.
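
As a numerical check of this relation, one can compare a truncated version of the sum against the known closed form of the AR(1) spectral density, where the ACF is geometric (the truncation length here is an arbitrary choice):

```python
import cmath
import math

# For an AR(1)-style ACF rho(k) = phi**|k|, the Wiener-Khinchin sum has the
# closed form (1 - phi^2) / (1 - 2*phi*cos(w) + phi^2).
phi, omega = 0.5, 1.0
truncated = sum((phi ** abs(k)) * cmath.exp(-1j * omega * k)
                for k in range(-200, 201)).real
closed_form = (1 - phi ** 2) / (1 - 2 * phi * math.cos(omega) + phi ** 2)
# the geometric tail beyond |k| = 200 is negligible, so the two agree closely
```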


Autocovariance

Autocovariance is the unnormalized version of autocorrelation. It measures the covariance between a process and its own lagged values, retaining the original units of the data (squared units of $X_t$).

Definition of autocovariance

For a stationary process $X_t$ with mean $\mu$:

$$\gamma(k) = \text{Cov}(X_t, X_{t+k}) = \mathbb{E}[(X_t - \mu)(X_{t+k} - \mu)]$$

Stationarity ensures that $\gamma(k)$ depends only on the lag $k$, not on $t$.

Autocovariance vs autocorrelation

The relationship between them is straightforward:

$$\rho(k) = \frac{\gamma(k)}{\gamma(0)}$$

  • Autocovariance $\gamma(k)$ has units (the same as the variance) and is not confined to $[-1, 1]$; its magnitude is bounded only by $|\gamma(k)| \leq \gamma(0)$. It tells you the scale of the dependence.
  • Autocorrelation $\rho(k)$ is dimensionless and bounded between $-1$ and $1$. It tells you the strength of the dependence, normalized for scale.

Use autocorrelation when comparing dependence across different processes or different scales. Use autocovariance when you need the actual magnitude (e.g., in model parameter estimation).

Autocovariance function (ACVF)

The ACVF plots $\gamma(k)$ against lag $k$. It carries the same structural information as the ACF but is not normalized, so the values at different lags reflect both the strength of dependence and the overall variability of the process.

Properties of autocovariance

  • Symmetry: $\gamma(k) = \gamma(-k)$
  • Variance at lag 0: $\gamma(0) = \text{Var}(X_t)$
  • Positive semi-definiteness: For any set of times and constants, the autocovariance matrix must be positive semi-definite. This is a necessary condition for $\gamma$ to be a valid autocovariance function.
  • For a stationary process, $\gamma(k)$ depends only on the lag, not on absolute time.

Estimating autocorrelation and autocovariance

In practice, the true $\rho(k)$ and $\gamma(k)$ are unknown. You estimate them from observed data.

Sample autocorrelation function

For a time series $\{X_1, X_2, \ldots, X_n\}$:

$$\hat{\rho}(k) = \frac{\sum_{t=1}^{n-k}(X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n}(X_t - \bar{X})^2}$$

This estimator is consistent: as $n \to \infty$, $\hat{\rho}(k) \to \rho(k)$.

Sample autocovariance function

$$\hat{\gamma}(k) = \frac{1}{n}\sum_{t=1}^{n-k}(X_t - \bar{X})(X_{t+k} - \bar{X})$$

Note the divisor is $n$ (not $n-k$). Dividing by $n$ introduces a small bias but guarantees that the resulting sample autocovariance matrix is positive semi-definite, which is important for downstream modeling.
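
A tiny worked example of the divide-by-n convention (pure Python; `sample_acvf` is our own helper name):

```python
def sample_acvf(x, max_lag):
    """Sample autocovariance gamma_hat(k), dividing by n (not n - k)."""
    n = len(x)
    m = sum(x) / n
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
            for k in range(max_lag + 1)]

x = [2.0, 4.0, 6.0, 8.0]
g = sample_acvf(x, 2)
# mean is 5, deviations are (-3, -1, 1, 3):
# g[0] = (9 + 1 + 1 + 9) / 4 = 5.0   (the divide-by-n sample variance)
# g[1] = (3 - 1 + 3) / 4     = 1.25
# g[2] = (-3 - 3) / 4        = -1.5
```

Dividing by n - k instead would enlarge the higher-lag values (g[1] would become 5/3) and can break positive semi-definiteness.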

Bias and variance of estimators

  • Both $\hat{\rho}(k)$ and $\hat{\gamma}(k)$ are biased, but the bias shrinks as $n$ grows.
  • Their variance also decreases with $n$, so larger samples give more reliable estimates.
  • For large lags relative to $n$, the estimates become unreliable because fewer pairs of observations contribute to the sum. A common rule of thumb is to trust the sample ACF only up to about lag $n/4$.

Bartlett's formula for variance

For a more refined assessment of sampling variability, Bartlett's formula gives the approximate variance of the sample ACF. Under a general stationary process:

$$\text{Var}(\hat{\rho}(k)) \approx \frac{1}{n}\left(1 + 2\sum_{i=1}^{k-1}\rho(i)^2\right)$$

Under white noise, $\rho(i) = 0$ for all $i \geq 1$, so this reduces to $1/n$, which is where the standard confidence bands come from. For non-white-noise processes, Bartlett's formula accounts for the fact that existing autocorrelation inflates the variance of the sample ACF at higher lags.
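
Bartlett's formula is easy to evaluate once the lower-lag autocorrelations are plugged in (a sketch; `bartlett_se` is our name, and the rho values below are assumed rather than estimated):

```python
import math

def bartlett_se(rho, n, k):
    """Approximate standard error of rho_hat(k) via Bartlett's formula.

    `rho` holds autocorrelations with rho[i] giving rho(i); only
    rho(1), ..., rho(k-1) enter the sum.
    """
    var = (1.0 + 2.0 * sum(rho[i] ** 2 for i in range(1, k))) / n
    return math.sqrt(var)

n = 400
# White noise: rho(i) = 0 for i >= 1, so SE = 1/sqrt(400) = 0.05
se_white = bartlett_se([1.0, 0.0, 0.0], n, 3)
# AR(1)-like ACF with rho(1) = 0.7, rho(2) = 0.49 inflates the SE at lag 3
se_ar = bartlett_se([1.0, 0.7, 0.49], n, 3)
```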

Applications of autocorrelation and autocovariance

Time series analysis

Autocorrelation and autocovariance are the primary diagnostic tools for time series modeling. The shape of the ACF guides model selection:

  • A slowly decaying ACF suggests an AR component or non-stationarity.
  • An ACF that cuts off sharply after lag $q$ suggests an MA($q$) process.
  • The ACVF provides the parameter values needed for fitting AR and MA models (via the Yule-Walker equations, for instance).

Signal processing

Autocorrelation measures how similar a signal is to a delayed copy of itself. This is directly useful for:

  • Pitch detection: Periodic signals produce peaks in the autocorrelation at the period length
  • Noise reduction: Separating signal (correlated) from noise (uncorrelated)
  • Echo cancellation: Identifying delayed copies of a transmitted signal
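
The pitch-detection idea can be sketched with a toy period finder (the function name and search range are ours, not from any signal-processing library):

```python
import math

def detect_period(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest sample
    autocorrelation: a toy pitch detector."""
    n = len(signal)
    m = sum(signal) / n
    denom = sum((v - m) ** 2 for v in signal)
    best_lag, best_rho = min_lag, float("-inf")
    for k in range(min_lag, max_lag + 1):
        rho = sum((signal[t] - m) * (signal[t + k] - m)
                  for t in range(n - k)) / denom
        if rho > best_rho:
            best_lag, best_rho = k, rho
    return best_lag

# A clean sine wave with a period of 25 samples
wave = [math.sin(2 * math.pi * t / 25) for t in range(500)]
period = detect_period(wave, 2, 60)
```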

Econometrics and finance

In financial markets, autocorrelation in asset returns is closely watched. The efficient market hypothesis predicts zero autocorrelation in returns (past prices shouldn't predict future prices). Significant autocorrelation in returns can indicate market inefficiency, momentum effects, or mean-reversion. Autocorrelation in squared returns is used to detect volatility clustering, which is a key feature modeled by GARCH-type processes.

Quality control and process monitoring

In manufacturing and industrial settings, autocorrelation helps detect when a process drifts from its target. Standard control charts (like Shewhart charts) assume independent observations; when autocorrelation is present, specialized charts like CUSUM (cumulative sum) and EWMA (exponentially weighted moving average) are used instead to avoid false alarms.

Models with autocorrelation

Autoregressive (AR) models

An AR($p$) model expresses the current value as a linear combination of the $p$ most recent values plus a white noise term:

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t$$

The ACF of an AR process decays gradually (exponentially or with damped oscillations), while the partial autocorrelation function (PACF) cuts off after lag $p$. This is how you identify the order of an AR model in practice.

Moving average (MA) models

An MA($q$) model expresses the current value as a linear combination of the current and $q$ most recent white noise terms:

$$X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$

The ACF of an MA($q$) process cuts off sharply after lag $q$ (i.e., $\rho(k) = 0$ for $k > q$), while the PACF decays gradually. This is the mirror image of the AR case and is the key diagnostic for identifying MA order.
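
This cutoff is easy to verify by simulation (a sketch in plain Python; the helper, seed, and coefficient are our choices):

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelation rho_hat(k) for k = 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

random.seed(3)
theta = 0.8
eps = [random.gauss(0, 1) for _ in range(2001)]
x = [eps[t] + theta * eps[t - 1] for t in range(1, 2001)]  # MA(1)

acf = sample_acf(x, 4)
# Theory: rho(1) = theta / (1 + theta^2) ~ 0.488, and rho(k) = 0 for k > 1
```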

Autoregressive moving average (ARMA) models

ARMA($p, q$) models combine both components:

$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

Both the ACF and PACF of an ARMA process decay gradually, which makes order selection harder. In practice, you often use information criteria (AIC, BIC) to choose $p$ and $q$.

Autoregressive integrated moving average (ARIMA) models

ARIMA($p, d, q$) models handle non-stationary series by differencing $d$ times before fitting an ARMA($p, q$) model. For example, if a series has a linear trend, first-order differencing ($d = 1$) typically removes it, producing a stationary series that can be modeled with ARMA. ARIMA models are among the most widely used tools for time series forecasting.

Testing for autocorrelation

Ljung-Box test

The Ljung-Box test checks whether the first $m$ autocorrelations are jointly zero. The test statistic is:

$$Q = n(n+2)\sum_{k=1}^{m}\frac{\hat{\rho}(k)^2}{n-k}$$

Under the null hypothesis of no autocorrelation, $Q$ follows a $\chi^2$ distribution with $m$ degrees of freedom (adjusted if model parameters were estimated). A large $Q$ (small p-value) means you reject the null and conclude that significant autocorrelation exists. The choice of $m$ matters: too small and you miss higher-order dependence, too large and you lose power.
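
Computing Q by hand shows how autocorrelation inflates it (a sketch; the function name and simulated series are ours, and no p-value lookup is included):

```python
import random

def ljung_box_q(x, m):
    """Ljung-Box Q over the first m sample autocorrelations."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    q = 0.0
    for k in range(1, m + 1):
        rho_k = sum((x[t] - mean) * (x[t + k] - mean)
                    for t in range(n - k)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

random.seed(4)
noise = [random.gauss(0, 1) for _ in range(300)]
ar = [noise[0]]
for _ in range(299):
    ar.append(0.8 * ar[-1] + random.gauss(0, 1))

q_noise = ljung_box_q(noise, 10)  # should look like a typical chi^2(10) draw
q_ar = ljung_box_q(ar, 10)        # far out in the chi^2(10) tail
```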

Durbin-Watson test

The Durbin-Watson test specifically targets first-order autocorrelation in regression residuals:

$$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n}e_t^2}$$

The statistic $d$ ranges from 0 to 4:

  • $d \approx 2$: No first-order autocorrelation
  • $d \ll 2$: Positive autocorrelation (residuals tend to stay on the same side of zero)
  • $d \gg 2$: Negative autocorrelation (residuals tend to alternate signs)

A limitation is that it only detects lag-1 autocorrelation and can give misleading results when lagged dependent variables appear as regressors.
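The statistic itself is straightforward to compute on two extreme residual patterns (the helper name is ours):

```python
def durbin_watson(e):
    """Durbin-Watson d for a residual series; d near 2 means no
    first-order autocorrelation."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v ** 2 for v in e)

# Perfectly alternating residuals: strong negative lag-1 autocorrelation
d_alt = durbin_watson([1.0, -1.0] * 50)          # = 396 / 100 = 3.96, near 4
# Long one-sided runs: strong positive lag-1 autocorrelation
d_run = durbin_watson([1.0] * 50 + [-1.0] * 50)  # = 4 / 100 = 0.04, near 0
```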

Breusch-Godfrey test

The Breusch-Godfrey test is more general than Durbin-Watson. It can detect autocorrelation of any specified order in regression residuals. The procedure is:

  1. Fit the original regression and obtain residuals $e_t$
  2. Regress $e_t$ on the original regressors and the lagged residuals $e_{t-1}, e_{t-2}, \ldots, e_{t-p}$
  3. Compute $nR^2$ from this auxiliary regression, which follows a $\chi^2(p)$ distribution under the null of no autocorrelation

This test works even when lagged dependent variables are included as regressors, making it more robust than the Durbin-Watson test.

Portmanteau tests

Portmanteau tests are a general class that includes both the Box-Pierce and Ljung-Box tests. They aggregate information across multiple lags into a single test statistic based on the sum of squared sample autocorrelations. These tests are good for detecting any autocorrelation up to lag mm, but they don't tell you which specific lags are significant. For that, you'd look at the individual ACF bars and their confidence bands.