Introduction to Mixed ARMA Models
ARMA models combine autoregressive (AR) and moving average (MA) components into a single framework for modeling stationary time series. Where a pure AR model only uses past values and a pure MA model only uses past errors, an ARMA model uses both, making it more flexible and often more parsimonious (fewer parameters needed to capture the same patterns). Getting comfortable with ARMA is essential before moving on to ARIMA, which adds differencing for non-stationary data.
Characteristics of ARMA Models
The AR component models how the current value depends on its own past values. The MA component models how the current value depends on past forecast errors (white noise terms). Together, they can capture both the momentum of a series and the short-lived effects of random shocks.
A core assumption is stationarity: the mean, variance, and autocovariance of the series don't change over time. If your data has a trend or changing variance, you need to transform it before fitting an ARMA model.
The general ARMA(p, q) model is written as:

$$X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

Where:
- $X_t$ is the observed value at time $t$
- $c$ is a constant (related to the mean of the series)
- $\phi_1, \dots, \phi_p$ are the autoregressive coefficients
- $\theta_1, \dots, \theta_q$ are the moving average coefficients
- $\varepsilon_t$ are white noise error terms (mean zero, constant variance, uncorrelated)
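One way to build intuition for this equation is to simulate it directly from the recursion. The sketch below uses plain NumPy; the coefficient values (0.6 and 0.3) are illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_arma(phi, theta, c=0.0, n=500, burn=100):
    """Simulate an ARMA(p, q) series by direct recursion.

    phi   : AR coefficients [phi_1, ..., phi_p]
    theta : MA coefficients [theta_1, ..., theta_q]
    """
    p, q = len(phi), len(theta)
    eps = rng.standard_normal(n + burn)  # white noise shocks
    x = np.zeros(n + burn)
    for t in range(max(p, q), n + burn):
        ar_part = sum(phi[i] * x[t - 1 - i] for i in range(p))
        ma_part = sum(theta[j] * eps[t - 1 - j] for j in range(q))
        x[t] = c + ar_part + ma_part + eps[t]
    return x[burn:]  # discard burn-in so initial zeros don't matter

# An ARMA(1, 1) series with phi_1 = 0.6, theta_1 = 0.3
series = simulate_arma(phi=[0.6], theta=[0.3], n=500)
```

The burn-in period lets the recursion forget its arbitrary zero starting values before we keep any observations.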
Orders in ARMA Models
The notation ARMA(p, q) tells you exactly how many terms of each type the model includes:
- p = number of lagged observations (AR order)
- q = number of lagged error terms (MA order)
For example, an ARMA(1, 1) model looks like:

$$X_t = c + \phi_1 X_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$

This is the simplest mixed model: one past value and one past error. An ARMA(2, 1) would add a second lagged observation ($\phi_2 X_{t-2}$) while keeping just one lagged error.
Notice that ARMA(p, 0) is just a pure AR(p) model, and ARMA(0, q) is a pure MA(q) model. So ARMA is really a generalization of both.
Choosing p and q relies on examining two diagnostic plots:
- ACF (autocorrelation function): measures correlation between observations at different lags. For a pure AR process, the ACF decays gradually; for a pure MA process, it cuts off sharply after lag q.
- PACF (partial autocorrelation function): measures correlation at each lag after removing the effects of shorter lags. For a pure AR process, the PACF cuts off after lag p; for a pure MA process, it decays gradually.
When both the ACF and PACF show gradual decay rather than a clean cutoff, that's a strong signal you need a mixed ARMA model rather than a pure AR or MA.
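These cutoff patterns are easy to verify numerically. The sketch below computes a sample ACF with plain NumPy and shows the sharp cutoff of an MA(1) process after lag 1; the coefficient 0.8 is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_acf(x, nlags):
    """Sample autocorrelations r_0, ..., r_nlags of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, nlags + 1)])

# MA(1): x_t = eps_t + 0.8 * eps_{t-1}. Theory gives
# rho_1 = theta / (1 + theta^2) ~ 0.49, and rho_k ~ 0 for k >= 2.
eps = rng.standard_normal(5001)
ma1 = eps[1:] + 0.8 * eps[:-1]

acf = sample_acf(ma1, nlags=5)
```

Lag 1 comes out close to the theoretical 0.49 while lags 2 through 5 hover near zero, which is exactly the cutoff pattern described above.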

Estimation and Assessment of ARMA Models
Parameter Estimation
Once you've chosen p and q, you need to estimate the coefficients. Two main approaches:
- Least squares estimation finds the values of $c$, $\phi_i$, and $\theta_j$ that minimize the sum of squared residuals (the differences between observed and predicted values). This is conceptually straightforward but can be computationally tricky for the MA part, since past errors aren't directly observed.
- Maximum likelihood estimation (MLE) finds the parameter values that make the observed data most probable under the assumed model. MLE is generally preferred because it has better statistical properties (efficiency, consistency) and handles the MA component more naturally.
In practice, most software uses MLE or a conditional least squares approach by default.

Stationarity and Invertibility
These two conditions determine whether your fitted model is valid and well-behaved.
Stationarity (applies to the AR part): The model is stationary if the roots of the AR characteristic polynomial lie outside the unit circle:

$$1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p = 0$$

If all roots of this equation satisfy $|z| > 1$, the model is stationary. For the simple AR(1) case, this reduces to $|\phi_1| < 1$, which is easier to check by hand.
Invertibility (applies to the MA part): The model is invertible if the roots of the MA characteristic polynomial lie outside the unit circle:

$$1 + \theta_1 z + \theta_2 z^2 + \dots + \theta_q z^q = 0$$

Invertibility guarantees a unique MA representation and ensures that recent observations carry more weight than distant ones. For an MA(1), this simplifies to $|\theta_1| < 1$.
A model that fails either condition can produce unreliable forecasts and misleading coefficient estimates. If you encounter this, reconsider your choice of p and q or check whether the data truly is stationary.
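Both conditions can be checked numerically by finding the roots of the two polynomials. A sketch with hypothetical coefficient values:

```python
import numpy as np

phi = [0.5, 0.3]   # hypothetical AR coefficients phi_1, phi_2
theta = [0.4]      # hypothetical MA coefficient theta_1

# AR polynomial 1 - phi_1 z - phi_2 z^2 = 0.
# np.roots expects coefficients in order of decreasing degree.
ar_roots = np.roots([-phi[1], -phi[0], 1.0])

# MA polynomial 1 + theta_1 z = 0
ma_roots = np.roots([theta[0], 1.0])

stationary = bool(np.all(np.abs(ar_roots) > 1))  # all AR roots outside unit circle
invertible = bool(np.all(np.abs(ma_roots) > 1))  # all MA roots outside unit circle
```

Here the single MA root is $z = -1/\theta_1 = -2.5$, comfortably outside the unit circle, and both AR roots have modulus greater than 1, so this hypothetical model passes both checks.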
Applying ARMA Models in Practice
Fitting an ARMA model follows a structured workflow:
1. Select orders (p, q) by examining the ACF and PACF plots of your stationary series.
2. Estimate parameters using MLE or least squares.
3. Check model fit with diagnostic tools:
   - Residual analysis: residuals should look like white noise (no remaining autocorrelation, roughly normal distribution).
   - Information criteria: AIC and BIC penalize model complexity, so lower values indicate a better balance of fit and simplicity. If ARMA(1,1) and ARMA(2,1) fit similarly, the information criteria help you pick the simpler one.
4. Interpret the coefficients:
   - AR coefficients ($\phi_i$) reflect how strongly past values persist into the present. A $\phi_1$ of 0.8 means the series has strong momentum.
   - MA coefficients ($\theta_j$) reflect how past shocks ripple forward. A $\theta_1$ of -0.5 means a positive shock at time $t-1$ pulls the current value down by half that shock's size.
5. Generate forecasts from the fitted model and evaluate accuracy against held-out data.
6. Acknowledge limitations: ARMA models assume linearity and stationarity. Real-world data often violates these assumptions, which is why ARIMA (adding differencing) and other extensions exist.