Autocorrelation Function (ACF)
The Autocorrelation Function (ACF) measures how correlated a time series is with its own past values at different time lags. It's one of the first diagnostic tools you'll reach for when analyzing time series data, because the shape of the ACF plot tells you a lot about what kind of model might fit your data well.
Definition and Purpose of ACF
Autocorrelation is the correlation between a time series and a lagged copy of itself. The ACF computes this correlation at every lag (1, 2, 3, ...) and gives you a complete picture of how the series relates to its own history.
The ACF at lag $k$ is defined as:

$$
\rho_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}
$$

where $y_t$ is the observation at time $t$, $\bar{y}$ is the series mean, and $n$ is the total number of observations. The result is always between $-1$ and $+1$, just like a regular correlation coefficient.
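Translated directly into code, the formula looks like this — a minimal pure-Python sketch for illustration (the `acf` function name is just illustrative; in practice you'd reach for a library routine such as statsmodels' `acf`):

```python
def acf(series, lag):
    """Sample autocorrelation of `series` at the given lag,
    using the common-denominator formula above."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: sum of squared deviations over the whole series
    denom = sum((y - mean) ** 2 for y in series)
    # Numerator: co-movement between the series and its lagged copy
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom

# A repeating pattern correlates almost perfectly with itself 4 steps back,
# and anti-correlates with itself 2 steps back
wave = [0, 1, 0, -1] * 25
print(acf(wave, 4))   # close to +1
print(acf(wave, 2))   # close to -1
```

Note that the denominator uses the full-series variance at every lag, which is what keeps the result in $[-1, 1]$ and matches how most statistical packages compute the sample ACF.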
The ACF serves several purposes:
- Pattern detection: A slowly decaying ACF suggests a trend. Spikes at regular intervals point to seasonality.
- Model identification: The shape of the ACF helps you decide what type of time series model (AR, MA, ARMA) is appropriate.
- Residual diagnostics: After fitting a model, you check the ACF of the residuals. If significant autocorrelation remains, the model hasn't captured all the structure in the data.

Interpretation of ACF Plots
An ACF plot (also called a correlogram) shows the correlation coefficient on the y-axis and the lag number on the x-axis. Two horizontal dashed lines mark the 95% confidence bounds, typically at $\pm 1.96/\sqrt{n}$ where $n$ is the number of observations. Any spike that extends beyond these bounds is statistically significant.
Here's how to read common ACF patterns:
Positive autocorrelation shows up as ACF values that stay positive and decay slowly toward zero. This means high values tend to follow high values, and low values tend to follow low values. You'll see this in data with trends or strong momentum.
Negative autocorrelation produces ACF values that alternate in sign, flipping between positive and negative at successive lags. This indicates an oscillating pattern where a high value tends to be followed by a low value and vice versa.
No autocorrelation (white noise) looks like ACF values hovering near zero at all lags, with at most one or two spikes barely crossing the confidence bounds by chance. This is what you want to see in model residuals.
Trend causes the ACF to decay very slowly, staying significant for many lags. The series has long memory, meaning distant past values still correlate with the present.
Seasonality produces noticeable spikes at regular intervals. For monthly data with an annual cycle, you'd see spikes at lags 12, 24, 36, etc. The ACF at lag 12 captures the correlation between, say, this January and last January.
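The white-noise and trend signatures are easy to check on synthetic data. A self-contained sketch (the `acf` helper implements the standard sample autocorrelation; the seed and series length are arbitrary):

```python
import math
import random

def acf(series, lag):
    """Standard sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    return sum((series[t] - mean) * (series[t - lag] - mean)
               for t in range(lag, n)) / denom

random.seed(42)               # arbitrary seed, for reproducibility
n = 300
bound = 1.96 / math.sqrt(n)   # approximate 95% confidence bound

# White noise: ACF should hover near zero at every lag
noise = [random.gauss(0, 1) for _ in range(n)]
spikes = sum(abs(acf(noise, k)) > bound for k in range(1, 11))
print(f"white noise: {spikes} of 10 lags cross the bound")

# Trend: ACF stays large and positive and decays very slowly
trend = list(range(n))
print(f"trend: lag 1 = {acf(trend, 1):.3f}, lag 10 = {acf(trend, 10):.3f}")
```

For the noise series only a stray lag or two should cross the bound; for the trending series every early lag stays close to 1.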

Significance of Lag in ACF
The lag is simply the number of time steps separating two observations being compared.
- Lag 1: correlation between each observation and the one immediately before it
- Lag 2: correlation between each observation and the one two steps back
- Lag $k$: correlation between observations separated by $k$ time steps
Why lag matters for understanding your data:
- Slowly decaying ACF across many lags means the series has long-term dependence. Current values are influenced by the distant past, and the data likely needs differencing before modeling.
- Quickly decaying ACF (drops to insignificance within a few lags) means short-term dependence. Only recent past values matter for predicting the present.
- The lag at which spikes appear tells you the period of any seasonal pattern. Significant ACF at lag 7 in daily data, for instance, suggests a weekly cycle.
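The weekly-cycle case can be demonstrated with a noiseless sine wave of period 7, standing in for daily data (a deliberately clean sketch; real daily data would add noise on top):

```python
import math

def acf(series, lag):
    """Standard sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    return sum((series[t] - mean) * (series[t - lag] - mean)
               for t in range(lag, n)) / denom

# 140 days of a pure weekly cycle (period 7)
y = [math.sin(2 * math.pi * t / 7) for t in range(140)]

# Lag 7 compares each day with the same weekday one week earlier: near +1.
# Lag 14 repeats the spike, as seasonal spikes recur at multiples of the period.
# Lag 3 lands on a different part of the cycle, so it is strongly negative.
print(f"lag 7:  {acf(y, 7):.3f}")
print(f"lag 14: {acf(y, 14):.3f}")
print(f"lag 3:  {acf(y, 3):.3f}")
```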
Application of ACF for Model Selection
The ACF is especially useful for identifying Moving Average (MA) models. An MA model of order $q$ uses past forecast errors to predict the current value:

$$
y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}
$$

A pure MA($q$) process has a distinctive ACF signature: significant spikes at lags 1 through $q$, then a sharp cutoff to zero. So if the ACF drops to insignificance after lag 2, that suggests an MA(2) model.
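The sharp-cutoff signature shows up clearly on simulated data. A sketch, assuming arbitrary illustration coefficients of 0.6 and 0.3 for a second-order moving average:

```python
import random

def acf(series, lag):
    """Standard sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    return sum((series[t] - mean) * (series[t - lag] - mean)
               for t in range(lag, n)) / denom

random.seed(7)
n = 5000
# MA(2): y_t = e_t + 0.6*e_{t-1} + 0.3*e_{t-2}
# (0.6 and 0.3 are arbitrary illustration values)
e = [random.gauss(0, 1) for _ in range(n + 2)]
y = [e[t + 2] + 0.6 * e[t + 1] + 0.3 * e[t] for t in range(n)]

# Expect clearly nonzero ACF at lags 1 and 2, then a cutoff to ~0
for k in range(1, 5):
    print(f"lag {k}: {acf(y, k):+.3f}")
```

The estimates land near the theoretical values (about 0.54 at lag 1 and 0.21 at lag 2), while lags 3 and 4 stay inside the confidence bounds.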
For Autoregressive (AR) models, the ACF alone isn't enough. An AR($p$) model uses past values of the series itself:

$$
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t
$$

The ACF of an AR process decays gradually (exponentially or in a damped oscillation) rather than cutting off sharply. To pin down the order $p$, you need the Partial Autocorrelation Function (PACF), which removes the influence of intermediate lags. The PACF of an AR($p$) process cuts off sharply after lag $p$.
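One standard way to compute the PACF from the ACF is the Durbin-Levinson recursion. The sketch below applies it to a simulated AR(1) process (the 0.7 coefficient is an arbitrary illustration value); the ACF tails off while the PACF cuts off after lag 1:

```python
import random

def acf(series, lag):
    """Standard sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    return sum((series[t] - mean) * (series[t - lag] - mean)
               for t in range(lag, n)) / denom

def pacf(series, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    rho = [1.0] + [acf(series, k) for k in range(1, max_lag + 1)]
    phi = [[0.0] * (max_lag + 1) for _ in range(max_lag + 1)]
    phi[1][1] = rho[1]
    for k in range(2, max_lag + 1):
        num = rho[k] - sum(phi[k - 1][j] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[k - 1][j] * rho[j] for j in range(1, k))
        phi[k][k] = num / den
        for j in range(1, k):
            phi[k][j] = phi[k - 1][j] - phi[k][k] * phi[k - 1][k - j]
    return [phi[k][k] for k in range(1, max_lag + 1)]

random.seed(1)
# AR(1): y_t = 0.7*y_{t-1} + e_t
y = [0.0]
for _ in range(4999):
    y.append(0.7 * y[-1] + random.gauss(0, 1))

p = pacf(y, 4)
# ACF decays geometrically (~0.7, 0.49, 0.34, ...);
# PACF shows one spike near 0.7, then drops to ~0
print("ACF: ", [round(acf(y, k), 2) for k in range(1, 5)])
print("PACF:", [round(v, 2) for v in p])
```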
Here's a quick reference for using ACF and PACF together:
| Pattern | ACF | PACF | Suggested Model |
|---|---|---|---|
| Gradual decay | Tails off slowly | Cuts off after lag $p$ | AR($p$) |
| Sharp cutoff | Cuts off after lag $q$ | Tails off slowly | MA($q$) |
| Both tail off | Tails off | Tails off | ARMA($p$, $q$) |
Steps for using the ACF in model selection:
- Plot the ACF and check for trend or seasonality. If the ACF decays very slowly, difference the series first and recompute.
- Look at the shape of the ACF. A sharp cutoff suggests an MA model; gradual decay suggests an AR or ARMA model.
- Plot the PACF alongside the ACF. Use both together to narrow down the model type and order.
- Fit the candidate model, then check the ACF of the residuals. If no significant spikes remain, the model is adequate.
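The first step above — spot a slowly decaying ACF, difference, recompute — can be sketched on a simulated trending series (a random walk with drift; seed and drift are arbitrary illustration values):

```python
import math
import random

def acf(series, lag):
    """Standard sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    return sum((series[t] - mean) * (series[t - lag] - mean)
               for t in range(lag, n)) / denom

random.seed(3)
# A random walk with drift: its ACF decays very slowly,
# which signals that the series should be differenced first
y = [0.0]
for _ in range(499):
    y.append(y[-1] + 0.1 + random.gauss(0, 1))
print(f"raw series, lag 1:  {acf(y, 1):.3f}")

# After first differencing, what remains is (drift + white noise),
# so the ACF should fall inside the confidence bounds
diff = [b - a for a, b in zip(y, y[1:])]
bound = 1.96 / math.sqrt(len(diff))
print(f"differenced, lag 1: {acf(diff, 1):+.3f} (bound ±{bound:.3f})")
```

Once the differenced series shows no remaining structure, the same ACF/PACF reading from the table above applies to it directly.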