Introduction to Mixed ARMA Models
ARMA models combine autoregressive (AR) and moving average (MA) components into a single framework for modeling stationary time series. Where a pure AR model only uses past values and a pure MA model only uses past errors, an ARMA model uses both, making it more flexible and often more parsimonious (fewer parameters needed to capture the same patterns). Getting comfortable with ARMA is essential before moving on to ARIMA, which adds differencing for non-stationary data.
Characteristics of ARMA Models
The AR component models how the current value depends on its own past values. The MA component models how the current value depends on past forecast errors (white noise terms). Together, they can capture both the momentum of a series and the short-lived effects of random shocks.
A core assumption is stationarity: the mean, variance, and autocovariance of the series don't change over time. If your data has a trend or changing variance, you need to transform it before fitting an ARMA model.
The general ARMA(p, q) model is written as:

$$X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

Where:
- $X_t$ is the observed value at time $t$
- $c$ is a constant (related to the mean of the series)
- $\phi_1, \dots, \phi_p$ are the autoregressive coefficients
- $\theta_1, \dots, \theta_q$ are the moving average coefficients
- $\varepsilon_t$ are white noise error terms (mean zero, constant variance, uncorrelated)
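One way to build intuition for this equation is to simulate it directly from the recursion. The sketch below uses plain NumPy; the coefficient values (0.6 and 0.3) are illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_arma(phi, theta, c=0.0, n=500, burn=100):
    """Simulate an ARMA(p, q) series by direct recursion.

    phi   : AR coefficients [phi_1, ..., phi_p]
    theta : MA coefficients [theta_1, ..., theta_q]
    """
    p, q = len(phi), len(theta)
    eps = rng.standard_normal(n + burn)  # white noise shocks
    x = np.zeros(n + burn)
    for t in range(max(p, q), n + burn):
        ar_part = sum(phi[i] * x[t - 1 - i] for i in range(p))
        ma_part = sum(theta[j] * eps[t - 1 - j] for j in range(q))
        x[t] = c + ar_part + ma_part + eps[t]
    return x[burn:]  # discard burn-in so initial zeros don't matter

# An ARMA(1, 1) series with phi_1 = 0.6, theta_1 = 0.3
series = simulate_arma(phi=[0.6], theta=[0.3], n=500)
```

The burn-in period lets the recursion forget its arbitrary zero starting values before we keep any observations.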
Orders in ARMA Models
The notation ARMA(p, q) tells you exactly how many terms of each type the model includes:
- p = number of lagged observations (AR order)
- q = number of lagged error terms (MA order)
For example, an ARMA(1, 1) model looks like:

$$X_t = c + \phi_1 X_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$

This is the simplest mixed model: one past value and one past error. An ARMA(2, 1) would add a second lagged observation ($\phi_2 X_{t-2}$) while keeping just one lagged error.
Notice that ARMA(p, 0) is just a pure AR(p) model, and ARMA(0, q) is a pure MA(q) model. So ARMA is really a generalization of both.
Choosing p and q relies on examining two diagnostic plots:
- ACF (autocorrelation function): measures correlation between observations at different lags. For a pure AR process, the ACF decays gradually; for a pure MA process, it cuts off sharply after lag q.
- PACF (partial autocorrelation function): measures correlation at each lag after removing the effects of shorter lags. For a pure AR process, the PACF cuts off after lag p; for a pure MA process, it decays gradually.
When both the ACF and PACF show gradual decay rather than a clean cutoff, that's a strong signal you need a mixed ARMA model rather than a pure AR or MA.
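These cutoff patterns are easy to verify numerically. The sketch below computes a sample ACF with plain NumPy and shows the sharp cutoff of an MA(1) process after lag 1; the coefficient 0.8 is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_acf(x, nlags):
    """Sample autocorrelations r_0, ..., r_nlags of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, nlags + 1)])

# MA(1): x_t = eps_t + 0.8 * eps_{t-1}. Theory gives
# rho_1 = theta / (1 + theta^2) ~ 0.49, and rho_k ~ 0 for k >= 2.
eps = rng.standard_normal(5001)
ma1 = eps[1:] + 0.8 * eps[:-1]

acf = sample_acf(ma1, nlags=5)
```

Lag 1 comes out close to the theoretical 0.49 while lags 2 through 5 hover near zero, which is exactly the cutoff pattern described above.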

Estimation and Assessment of ARMA Models
Parameter Estimation
Once you've chosen p and q, you need to estimate the coefficients. Two main approaches:
- Least squares estimation finds the values of $c$, $\phi_i$, and $\theta_j$ that minimize the sum of squared residuals (the differences between observed and predicted values). This is conceptually straightforward but can be computationally tricky for the MA part, since past errors aren't directly observed.
- Maximum likelihood estimation (MLE) finds the parameter values that make the observed data most probable under the assumed model. MLE is generally preferred because it has better statistical properties (efficiency, consistency) and handles the MA component more naturally.
In practice, most software uses MLE or a conditional least squares approach by default.

Stationarity and Invertibility
These two conditions determine whether your fitted model is valid and well-behaved.
Stationarity (applies to the AR part): The model is stationary if the roots of the AR characteristic polynomial lie outside the unit circle:

$$1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p = 0$$

If all roots of this equation satisfy $|z| > 1$, the model is stationary. For the simple AR(1) case, this reduces to $|\phi_1| < 1$, which is easier to check by hand.
Invertibility (applies to the MA part): The model is invertible if the roots of the MA characteristic polynomial lie outside the unit circle:

$$1 + \theta_1 z + \theta_2 z^2 + \dots + \theta_q z^q = 0$$

Invertibility guarantees a unique MA representation and ensures that recent observations carry more weight than distant ones. For an MA(1), this simplifies to $|\theta_1| < 1$.
A model that fails either condition can produce unreliable forecasts and misleading coefficient estimates. If you encounter this, reconsider your choice of p and q or check whether the data truly is stationary.
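Both conditions can be checked numerically by finding the roots of the two polynomials. A sketch with hypothetical coefficient values:

```python
import numpy as np

phi = [0.5, 0.3]   # hypothetical AR coefficients phi_1, phi_2
theta = [0.4]      # hypothetical MA coefficient theta_1

# AR polynomial 1 - phi_1 z - phi_2 z^2 = 0.
# np.roots expects coefficients in order of decreasing degree.
ar_roots = np.roots([-phi[1], -phi[0], 1.0])

# MA polynomial 1 + theta_1 z = 0
ma_roots = np.roots([theta[0], 1.0])

stationary = bool(np.all(np.abs(ar_roots) > 1))  # all AR roots outside unit circle
invertible = bool(np.all(np.abs(ma_roots) > 1))  # all MA roots outside unit circle
```

Here the single MA root is $z = -1/\theta_1 = -2.5$, comfortably outside the unit circle, and both AR roots have modulus greater than 1, so this hypothetical model passes both checks.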
Applying ARMA Models in Practice
Fitting an ARMA model follows a structured workflow:
1. Select orders (p, q) by examining the ACF and PACF plots of your stationary series.
2. Estimate parameters using MLE or least squares.
3. Check model fit with diagnostic tools:
   - Residual analysis: residuals should look like white noise (no remaining autocorrelation, roughly normal distribution).
   - Information criteria: AIC and BIC penalize model complexity, so lower values indicate a better balance of fit and simplicity. If ARMA(1,1) and ARMA(2,1) fit similarly, the information criteria help you pick the simpler one.
4. Interpret the coefficients:
   - AR coefficients ($\phi_i$) reflect how strongly past values persist into the present. A $\phi_1$ of 0.8 means the series has strong momentum.
   - MA coefficients ($\theta_j$) reflect how past shocks ripple forward. A $\theta_1$ of -0.5 means a positive shock at time $t-1$ pulls the current value down by half that shock's size.
5. Generate forecasts from the fitted model and evaluate accuracy against held-out data.
6. Acknowledge limitations: ARMA models assume linearity and stationarity. Real-world data often violates these assumptions, which is why ARIMA (adding differencing) and other extensions exist.