Intro to Time Series Unit 7 Review

7.1 Information criteria (AIC, BIC) for model selection

Written by the Fiveable Content Team • Last updated August 2025

Model Selection in Time Series Analysis

Time series model selection is about finding the sweet spot between fit and complexity. A model that's too simple misses real patterns in your data (underfitting), while one that's too complex starts fitting random noise (overfitting). The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are the two main tools for navigating this tradeoff.

Importance of Model Selection

Time series data often contain layered patterns: autocorrelation, seasonality, and trend all at once. Model selection is the process of choosing which candidate model best captures those patterns without becoming unnecessarily complex.

  • Overfitting happens when a model has too many parameters and starts fitting noise rather than signal. It looks great on your training data but forecasts poorly.
  • Underfitting happens when a model is too simple to capture the real structure in the data.
  • The goal is a model that generalizes well, meaning it performs accurately on new, unseen data, not just the data it was trained on.
Role of the Akaike Information Criterion (AIC)

AIC was developed by Hirotugu Akaike in 1974. It evaluates a model by weighing how well it fits the data against how many parameters it uses.

The formula is:

AIC = 2k - 2 ln(L)

where k is the number of estimated parameters and L is the maximized likelihood of the model (a measure of how probable the observed data are under that model).

A few things to note about AIC:

  • Lower AIC = better model. The -2 ln(L) term rewards good fit, while the 2k term adds a penalty for each additional parameter.
  • The penalty per parameter is fixed at 2, regardless of sample size. This means AIC can sometimes favor slightly more complex models, especially with large datasets.
  • AIC allows you to compare non-nested models, meaning models that aren't simply restricted versions of each other (e.g., comparing an ARIMA model to an exponential smoothing model).
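The formula is simple enough to compute directly. Here is a minimal sketch: the model names, parameter counts, and log-likelihood values below are hypothetical, standing in for whatever your software reports after fitting each candidate.

```python
def aic(k: int, log_likelihood: float) -> float:
    """AIC = 2k - 2 ln(L), where log_likelihood is the maximized ln(L)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical fitted models: (number of parameters, log-likelihood)
candidates = {
    "AR(1)": (2, -152.3),
    "AR(2)": (3, -150.1),
    "AR(3)": (4, -149.8),
}

for name, (k, ll) in candidates.items():
    print(f"{name}: AIC = {aic(k, ll):.1f}")
```

With these made-up numbers, AR(2) wins: AR(3) improves the likelihood only slightly, and the extra parameter's penalty outweighs that gain. This is exactly the fit-versus-complexity tradeoff the criterion encodes.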

Bayesian Information Criterion (BIC) vs. AIC

BIC was developed by Gideon Schwarz in 1978. It takes a similar approach but incorporates the sample size into its complexity penalty.

The formula is:

BIC = k ln(n) - 2 ln(L)

where k is the number of parameters, n is the sample size, and L is the maximized likelihood.

The key difference is in the penalty term. AIC penalizes each parameter by 2, while BIC penalizes each parameter by ln(n). Since ln(n) > 2 whenever n > 7 (which is almost always the case in time series work), BIC penalizes complexity more heavily than AIC for any reasonably sized dataset.

AIC tends to select slightly more complex models. It's often preferred when the goal is short-term forecast accuracy.

BIC tends to select simpler, more parsimonious models. It has a theoretical property called consistency: as sample size grows, BIC will select the true model (if it's among the candidates) with probability approaching 1. AIC does not have this property.

In practice, AIC and BIC often agree. When they disagree, it's usually because AIC picks a model with one or two extra parameters. Neither is universally "better"; the right choice depends on your analysis goals.
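To make the penalty difference concrete, the short sketch below compares the per-parameter cost under each criterion for a few sample sizes (the sizes are arbitrary, chosen only for illustration):

```python
import math

def aic(k: int, log_likelihood: float) -> float:
    """AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(k: int, n: int, log_likelihood: float) -> float:
    """BIC = k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood

# AIC charges a flat 2 per parameter; BIC charges ln(n), which grows
# with sample size, so BIC increasingly favors parsimony as n rises.
for n in (50, 500, 5000):
    print(f"n={n}: AIC penalty/param = 2, BIC penalty/param = {math.log(n):.2f}")
```

At n = 50 the BIC penalty is already about 3.9 per parameter, and at n = 5000 it is about 8.5, which is why BIC's disagreements with AIC almost always take the form of dropping one or two parameters.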

Applying AIC and BIC

Here's the typical workflow:

  1. Fit candidate models to your time series data (e.g., several ARIMA orders, SARIMA variants, or exponential smoothing models).
  2. Calculate the log-likelihood and count the parameters for each fitted model. Most software does this automatically.
  3. Compute AIC and BIC for each model using the formulas above.
  4. Select the model with the lowest AIC or BIC value as your top candidate.
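The workflow above can be sketched end to end. This is a simplified illustration, not production code: it simulates an AR(1) series, fits AR(p) candidates by least squares (rather than full maximum likelihood, which your software would use), computes the Gaussian log-likelihood, and ranks the candidates by both criteria.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 0 (setup): simulate an AR(1) series, x_t = 0.6 x_{t-1} + noise,
# so we know the "true" model is among the candidates.
n = 300
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.normal()

def fit_ar(x, p):
    """Steps 1-2: fit AR(p) by least squares; return (k, log-likelihood)."""
    y = x[p:]
    X = np.column_stack([x[p - j : len(x) - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / len(y)  # ML estimate of the noise variance
    ll = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)  # Gaussian log-likelihood
    k = p + 1  # AR coefficients plus the variance parameter
    return k, ll

# Step 3: compute AIC and BIC for each candidate order.
results = {}
for p in (1, 2, 3, 4):
    k, ll = fit_ar(x, p)
    m = len(x) - p  # effective sample size used in the fit
    results[p] = (2 * k - 2 * ll, k * np.log(m) - 2 * ll)
    print(f"AR({p}): AIC = {results[p][0]:.1f}  BIC = {results[p][1]:.1f}")

# Step 4: select the lowest-scoring candidate under each criterion.
best_aic = min(results, key=lambda p: results[p][0])
best_bic = min(results, key=lambda p: results[p][1])
print(f"AIC picks AR({best_aic}); BIC picks AR({best_bic})")
```

In practice you would let your statistics package (e.g. statsmodels' ARIMA results, which expose `aic` and `bic` attributes) handle steps 1-3; the ranking logic in step 4 stays the same.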

A couple of important caveats:

  • AIC and BIC give you a relative ranking of your candidate models. A model with the lowest AIC in your set could still be a poor model overall if none of your candidates are appropriate. Always check residual diagnostics (covered later in this unit) to confirm the selected model is adequate.
  • Think about the purpose of your analysis. If you need the best short-term forecasts, AIC's slight preference for complexity may help. If you want to identify the simplest model that explains the data's structure, BIC is often the better guide.