SARIMA models handle time series that have both trend and seasonal patterns. They combine regular ARIMA components with seasonal counterparts, written $(p, d, q) \times (P, D, Q)_m$, where $m$ is the seasonal period (e.g., 12 for monthly data with yearly seasonality). This section covers how to estimate those parameters, check whether your model fits well, and generate forecasts.

Parameter Estimation for SARIMA Models

Maximum likelihood estimation (MLE) is the standard method for fitting SARIMA models. MLE finds the parameter values that make your observed data most probable under the model. The process works like this:

Specify the model order. Choose $(p, d, q)$ for the non-seasonal part and $(P, D, Q)_m$ for the seasonal part, typically guided by ACF/PACF plots after differencing.
Initialize parameter values. The software sets starting guesses for each AR and MA coefficient (both regular and seasonal).
Optimize iteratively. An algorithm adjusts the parameters to maximize the likelihood function, repeating until the estimates stabilize (converge).
Check convergence. If the algorithm doesn't converge, you may need different starting values or a simpler model specification.

Other estimation methods exist, including conditional least squares (CLS) and unconditional least squares (ULS). In practice, most software defaults to MLE or a close variant, and for an intro course that's the one to know.

Diagnostic Tools for Model Fit

Once you've estimated a SARIMA model, you need to verify it actually captures the patterns in your data. Diagnostics focus on the residuals, which are the differences between observed values and what the model predicted.

What good residuals look like:

Uncorrelated with each other (no leftover pattern)
Approximately normally distributed
Mean of zero and roughly constant variance over time

Key diagnostic plots:

Residual ACF and PACF plots — If significant spikes remain, the model is missing some structure. This is often the most informative check.
Residuals vs. fitted values — Look for patterns or fanning (non-constant variance).
Q-Q plot — Compares residual distribution to a normal distribution. Points should fall near the diagonal line.

The Ljung-Box test provides a formal check for residual autocorrelation. Its null hypothesis is that the residuals are independently distributed (no autocorrelation). If you reject the null (low p-value), your model likely hasn't captured all the time series dynamics, and you should revisit your model specification.
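
To make the test statistic concrete, here is a small pure-Python sketch of the Ljung-Box $Q$ statistic, $Q = n(n+2)\sum_{k=1}^{h} \hat{\rho}_k^2 / (n-k)$. The two residual series are simulated stand-ins (an assumption for illustration), and the helper names are hypothetical. In practice a statistics library would also return the p-value.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(resid, max_lag):
    """Ljung-Box statistic: Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(resid)
    total = sum(autocorr(resid, k) ** 2 / (n - k) for k in range(1, max_lag + 1))
    return n * (n + 2) * total

random.seed(1)
# Stand-in for well-behaved residuals: independent noise.
white = [random.gauss(0, 1) for _ in range(200)]
# Stand-in for residuals with leftover structure: each value leans on the previous one.
sticky = [white[0]]
for e in white[1:]:
    sticky.append(0.7 * sticky[-1] + e)

q_white = ljung_box_q(white, max_lag=10)
q_sticky = ljung_box_q(sticky, max_lag=10)
print("Q for independent residuals:", round(q_white, 2))
print("Q for autocorrelated residuals:", round(q_sticky, 2))
```

Under the null, $Q$ is compared with a chi-square distribution. For a raw series tested at 10 lags the 5% critical value is about 18.31. For residuals from a fitted model, the degrees of freedom are usually reduced by the number of estimated AR and MA coefficients.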

Information criteria help you compare competing models:

AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) both penalize model complexity while rewarding goodness-of-fit. Lower values are better.
BIC penalizes extra parameters more heavily than AIC, so it tends to favor simpler models. When AIC and BIC disagree, BIC's pick is more parsimonious.
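
A short worked example shows how the two criteria can disagree. It uses the standard definitions $\text{AIC} = 2k - 2\ln L$ and $\text{BIC} = k\ln n - 2\ln L$, where $k$ is the number of estimated parameters and $n$ the number of observations. The log-likelihoods and parameter counts below are made-up numbers for illustration, not results from any fitted model.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

n = 120  # hypothetical number of observations
candidates = {
    "small": {"log_lik": -250.0, "k": 3},  # hypothetical simpler model
    "large": {"log_lik": -245.5, "k": 6},  # hypothetical richer model, slightly better fit
}

scores = {
    name: {"aic": aic(c["log_lik"], c["k"]), "bic": bic(c["log_lik"], c["k"], n)}
    for name, c in candidates.items()
}
best_aic = min(scores, key=lambda name: scores[name]["aic"])
best_bic = min(scores, key=lambda name: scores[name]["bic"])

for name, s in scores.items():
    print(name, "AIC:", round(s["aic"], 2), "BIC:", round(s["bic"], 2))
print("AIC prefers:", best_aic)
print("BIC prefers:", best_bic)
```

Here the richer model's improvement in fit is enough to offset AIC's penalty of 2 per parameter, but not BIC's penalty of $\ln 120 \approx 4.79$ per parameter.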


Forecasting and Model Selection

Forecasting with SARIMA Models

Forecasting uses your estimated model to predict future values. SARIMA models produce two types of forecasts:

Point forecasts give a single predicted value for each future time point.
Prediction intervals give a range of plausible values around each point forecast. A 95% prediction interval is built so that, if the model is correct, the realized value falls inside it about 95% of the time. These intervals widen as the forecast horizon increases, reflecting growing uncertainty further into the future.

The forecast horizon is how many periods ahead you're predicting. Short-horizon forecasts (1–3 steps ahead) are generally more accurate than long-horizon ones.

When interpreting SARIMA forecasts, keep in mind:

The forecasts will reflect the trend and seasonal patterns the model learned from historical data.
Prediction intervals communicate uncertainty. Narrow intervals suggest more confidence; wide intervals suggest less.
SARIMA assumes past patterns continue into the future. Structural breaks (e.g., a pandemic, a policy change) can invalidate forecasts quickly.

Model Selection and Comparison

Choosing the best SARIMA model means balancing fit and simplicity. You want a model that explains the data well without overfitting.

In-sample measures (how well the model fits the data it was trained on):

AIC and BIC (lower is better)
Residual diagnostics (clean residuals as described above)

Out-of-sample measures (how well the model predicts data it hasn't seen):

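RMSE (Root Mean Squared Error): squares each error before averaging, so large errors are penalized more heavily
MAE (Mean Absolute Error): weights each error in proportion to its size, so it is less sensitive to occasional large misses than RMSE
MAPE (Mean Absolute Percentage Error): expresses error as a percentage of the actual value, useful for comparing across different scales, but undefined when an actual value is zero and unstable when values are close to zero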
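
The three metrics are simple to compute by hand. The following sketch uses a tiny made-up set of actual and predicted values (not output from any real model) to show how a single large miss pulls RMSE above MAE.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined if any actual value is zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Made-up holdout values: three small errors and one large miss on the last point.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [98.0, 112.0, 120.0, 120.0]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
mape_value = mape(actual, predicted)

print("RMSE:", round(rmse_value, 3))
print("MAE: ", round(mae_value, 3))
print("MAPE:", round(mape_value, 3), "%")
```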

Cross-validation techniques like rolling origin evaluation strengthen out-of-sample assessment. In rolling origin, you repeatedly fit the model on expanding windows of data and forecast the next observation, simulating real forecasting conditions.
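
A rolling origin loop can be sketched in a few lines. To keep the example fast and dependency-free, it uses a seasonal naive forecaster (repeat the value from one season ago) as a stand-in. With a real SARIMA model, the `forecast_fn` argument would refit the model on each training window and return its one-step forecast. The function names and the series are hypothetical.

```python
import math

def seasonal_naive(history, m=12):
    """Stand-in forecaster: predict the value observed one season (m steps) ago."""
    return history[-m]

def rolling_origin_errors(series, forecast_fn, min_train):
    """One-step-ahead errors from expanding training windows."""
    errors = []
    for origin in range(min_train, len(series)):
        train = series[:origin]           # everything observed before the origin
        prediction = forecast_fn(train)   # forecast the next observation
        errors.append(series[origin] - prediction)
    return errors

# Made-up monthly series: linear trend plus a yearly cycle.
series = [10 + 0.1 * t + 3 * math.sin(2 * math.pi * t / 12) for t in range(60)]

errors = rolling_origin_errors(series, seasonal_naive, min_train=36)
rolling_mae = sum(abs(e) for e in errors) / len(errors)

print("number of forecast origins:", len(errors))
print("rolling-origin MAE:", round(rolling_mae, 3))
```

Because the seasonal naive method ignores the trend, every error here equals one year of trend growth, which is the kind of systematic miss that rolling origin evaluation is meant to expose.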

A good model should perform well on both in-sample and out-of-sample metrics. If a model fits the training data perfectly but forecasts poorly, it's likely overfit.

Real-World Applications of SARIMA

SARIMA models are a natural fit for data with clear seasonal cycles. Common examples include:

Retail sales — Monthly or quarterly sales data often show holiday spikes and seasonal buying patterns.
Energy demand — Electricity consumption follows daily and seasonal cycles tied to weather and human activity.
Economic indicators — Quarterly GDP growth or monthly inflation rates often exhibit seasonal regularities.

Steps for applying a SARIMA model in practice:

Plot the series and identify its characteristics (trend, seasonality, any obvious outliers).
Apply differencing (regular $d$ and seasonal $D$) to achieve stationarity. Verify with ACF plots or unit root tests.
Use ACF and PACF plots of the differenced series to guide your choice of $p, q, P, Q$.
Estimate model parameters using MLE.
Run diagnostics: check residual plots, Ljung-Box test, and information criteria.
If diagnostics reveal problems, revise the model and re-estimate.
Generate forecasts and report prediction intervals alongside point forecasts.
Monitor performance over time and re-estimate as new data arrives.
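
The differencing step in the workflow above is worth seeing on a toy series. This sketch applies one seasonal difference ($D = 1$, lag $m = 12$) and then one regular difference ($d = 1$) to a made-up series with a linear trend and a fixed December spike. Real data will not difference down to constants like this, so a unit root test or ACF plot is still needed to confirm stationarity.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

m = 12  # seasonal period for monthly data

# Made-up series: linear trend plus a spike every twelfth month.
series = [5 + 2 * t + (10 if t % m == 11 else 0) for t in range(36)]

seasonal_diff = difference(series, lag=m)        # D = 1: removes the repeating spike
fully_diff = difference(seasonal_diff, lag=1)    # d = 1: removes what is left of the trend

print("after seasonal differencing:", seasonal_diff[:5], "...")
print("after seasonal + regular differencing:", fully_diff[:5], "...")
print("observations lost:", len(series) - len(fully_diff))
```

Each difference shortens the series: a seasonal difference costs $m$ observations and a regular difference costs one more, which matters for short series.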
