Why This Matters
Time series analysis sits at the heart of predictive modeling in data science—you're being tested on your ability to choose the right model for the right data structure and interpret results numerically. Whether you're forecasting stock prices, analyzing sensor data, or modeling climate patterns, these methods transform raw sequential observations into actionable predictions. The concepts here connect directly to stationarity, autocorrelation, spectral decomposition, and numerical optimization—all core themes in numerical analysis.
Don't just memorize model names and formulas. Know why each model exists: What data problem does it solve? What assumptions does it require? When you understand that AR models capture memory in the signal while MA models capture memory in the noise, you'll be able to justify model selection on exams and in practice. Focus on the numerical machinery underneath—parameter estimation, differencing operations, and frequency transforms are where the real computational challenges live.
Models Based on Past Values (Autoregressive Structure)
These models assume that future values depend linearly on past observations. The core idea: the system has memory, and that memory decays at a rate captured by model coefficients.
Autoregressive (AR) Models
- Order parameter p determines how many lagged observations influence the current value—higher p captures longer-term dependencies
- Linear combination structure: $X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t$, where the $\phi_i$ coefficients are estimated via least squares or maximum likelihood (see the fitting sketch after this list)
- Stationarity requirement—the characteristic polynomial's roots must lie outside the unit circle for stable forecasting
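As a quick illustration, here is a minimal sketch of simulating and fitting an AR process, assuming the `numpy` and `statsmodels` packages; the coefficient 0.7 and the fitted lag order are arbitrary demo choices:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(0)

# Simulate an AR(1) process: X_t = 0.7 * X_{t-1} + eps_t
n = 500
x = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + eps[t]

# Fit an AR(2); the first lag coefficient should land near 0.7,
# the second near 0, hinting that p = 1 is the right order
res = AutoReg(x, lags=2).fit()
print(res.params)  # [const, phi_1, phi_2]
```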
Vector Autoregression (VAR) Models
- Multivariate extension of AR—each variable is regressed on its own lags and the lags of all other variables in the system
- Captures interdependencies between time series, making it essential for Granger causality testing and impulse response analysis
- Parameter explosion problem—with $k$ variables and $p$ lags, you estimate $k^2 p$ coefficients, requiring careful regularization or dimensionality constraints (a worked sketch follows this list)
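A hedged sketch of the same idea in two dimensions, using statsmodels' VAR class; the series names and the coefficient matrix are invented for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)

# Simulate a stable 2-variable VAR(1): each series depends on both lagged series
A = np.array([[0.5, 0.2],
              [0.1, 0.4]])
n = 400
y = np.zeros((n, 2))
for t in range(1, n):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)

# With k = 2 variables and p = 1 lag there are k^2 * p = 4 lag coefficients
data = pd.DataFrame(y, columns=["gdp", "inflation"])
res = VAR(data).fit(maxlags=1)
print(res.params)  # estimated intercepts and lag matrix, close to A

# Granger causality: does "inflation" help predict "gdp"?
print(res.test_causality("gdp", ["inflation"]).summary())
```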
Compare: AR vs. VAR—both use past values to predict future values, but AR models a single series while VAR captures cross-variable dynamics. If asked about forecasting multiple related economic indicators simultaneously, VAR is your answer.
Models Based on Past Errors (Moving Average Structure)
These models use past forecast errors rather than past observations. The intuition: shocks to the system propagate forward, and we model how quickly they dissipate.
Moving Average (MA) Models
- Order parameter q specifies how many past error terms affect the current value—captures short-term shock effects
- Model form: $X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$, where the $\theta_j$ weights determine shock persistence (see the sketch after this list)
- Always stationary regardless of parameter values—a key advantage when stationarity is uncertain
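statsmodels has no standalone MA class; a common route, sketched below, is to fit an MA(q) model as ARIMA(0, 0, q). The coefficient 0.6 is an arbitrary demo value:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)

# Simulate an MA(1) process: X_t = eps_t + 0.6 * eps_{t-1}
n = 500
eps = rng.standard_normal(n)
x = eps.copy()
x[1:] += 0.6 * eps[:-1]

# An MA(q) model is ARIMA(0, 0, q); theta_1 should be estimated near 0.6
res = ARIMA(x, order=(0, 0, 1)).fit()
print(res.params)
```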
Exponential Smoothing Models
- Weighted averaging with exponentially decreasing weights—recent observations matter more, controlled by smoothing parameter $\alpha \in (0, 1)$
- Variants scale with complexity: simple (level only), Holt's method (level + trend), Holt-Winters (level + trend + seasonality; sketched after this list)
- State space equivalence—exponential smoothing models can be written as special cases of state space models, unifying their theoretical treatment
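A minimal Holt-Winters sketch, assuming `statsmodels` and a synthetic monthly-style series; the trend and seasonal amplitudes are invented demo values:

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(3)

# Synthetic data: linear trend + period-12 seasonality + noise
n = 120
t = np.arange(n)
y = 10 + 0.3 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.standard_normal(n)

# Holt-Winters: level + additive trend + additive period-12 seasonality
model = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12)
res = model.fit()
print(res.params["smoothing_level"])  # the estimated alpha
print(res.forecast(12))               # forecast one seasonal cycle ahead
```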
Compare: MA vs. Exponential Smoothing—both weight past information, but an MA(q) model weights a finite window of past errors while exponential smoothing applies geometrically decaying weights to all past observations. Exponential smoothing is often preferred for quick, interpretable forecasts; MA integrates into ARMA frameworks for rigorous modeling.
Combined and Extended Frameworks (ARMA Family)
These models merge autoregressive and moving average components, then add machinery for handling non-stationarity and seasonality.
Autoregressive Moving Average (ARMA) Models
- Combines AR(p) and MA(q) into a unified framework: $X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$
- Requires stationarity—use ACF and PACF plots to identify appropriate p and q values before fitting (see the sketch after this list)
- Parsimony principle—ARMA often achieves the same fit as pure AR or MA with fewer total parameters
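To make the identification step concrete, here is a sketch that simulates an ARMA(1,1) series, inspects its ACF/PACF, and fits the model as ARIMA(1, 0, 1); all coefficients are arbitrary demo values:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(4)

# Simulate ARMA(1,1): X_t = 0.6 * X_{t-1} + eps_t + 0.4 * eps_{t-1}
n = 500
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + eps[t] + 0.4 * eps[t - 1]

# For a mixed model, both the ACF and the PACF tail off gradually
print(acf(x, nlags=5))
print(pacf(x, nlags=5))

res = ARIMA(x, order=(1, 0, 1)).fit()  # ARMA(p, q) = ARIMA(p, 0, q)
print(res.params)
```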
Autoregressive Integrated Moving Average (ARIMA) Models
- Differencing operator d transforms non-stationary data: apply $\nabla^d X_t = (1 - B)^d X_t$, where $B$ is the backshift operator ($B X_t = X_{t-1}$); see the differencing sketch after this list
- Notation ARIMA$(p,d,q)$—$d=1$ removes linear trends, $d=2$ removes quadratic trends; rarely need $d > 2$
- Box-Jenkins methodology—the systematic cycle of identification, estimation, and diagnostic checking remains the gold standard for model selection
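A small sketch of $d=1$ differencing on a random walk with drift, first by hand and then handled internally by the model; the drift of 0.5 is an arbitrary demo value:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)

# Random walk with drift: non-stationary, but its first differences are stationary
n = 400
y = np.cumsum(0.5 + rng.standard_normal(n))

# Differencing by hand: np.diff(y) computes (1 - B) y_t
dy = np.diff(y)
print(dy.mean())  # roughly the drift, 0.5

# Or let ARIMA apply d = 1 internally
res = ARIMA(y, order=(1, 1, 1)).fit()
print(res.params)
```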
Seasonal ARIMA (SARIMA) Models
- Full notation SARIMA$(p,d,q)(P,D,Q)_s$—lowercase for non-seasonal, uppercase for seasonal components, $s$ is the seasonal period
- Multiplicative structure—seasonal and non-seasonal polynomials multiply, capturing both short-term dynamics and periodic patterns
- Common configurations: $(1,1,1)(1,1,1)_{12}$ for monthly data with annual seasonality is a frequent starting point (fit in the sketch after this list)
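A sketch of that starting configuration on a synthetic monthly series, using statsmodels' SARIMAX class; the trend and seasonal pattern are invented:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(6)

# 12 years of monthly-style data: trend + annual (period-12) seasonality + noise
n = 144
t = np.arange(n)
y = 0.2 * t + 8 * np.sin(2 * np.pi * t / 12) + rng.standard_normal(n)

# The common (1,1,1)(1,1,1)_12 starting point from the text
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
res = model.fit(disp=False)
print(res.forecast(steps=12))  # forecast one full seasonal cycle ahead
```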
Compare: ARMA vs. ARIMA vs. SARIMA—ARMA assumes stationarity, ARIMA adds differencing for trends, SARIMA adds seasonal differencing and seasonal AR/MA terms. When choosing models, first check: Is the data stationary? Is there seasonality? Your answers determine which framework applies.
Volatility and Heteroskedasticity Models
When variance itself changes over time, standard models fail. These methods model the conditional variance as a dynamic process.
GARCH Models for Volatility
- Conditional variance equation: $\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$ captures volatility clustering (simulated in the sketch after this list)
- GARCH(1,1) dominance—in practice, this simple specification fits most financial data remarkably well
- Risk management applications—Value-at-Risk (VaR) calculations and option pricing rely heavily on accurate volatility forecasts
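To see volatility clustering emerge from the variance recursion itself, here is a self-contained numpy simulation of GARCH(1,1); the parameter values are arbitrary but satisfy $\alpha + \beta < 1$:

```python
import numpy as np

rng = np.random.default_rng(7)

# GARCH(1,1): sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2
omega, alpha, beta = 0.1, 0.1, 0.85  # alpha + beta < 1 keeps variance finite
n = 1000
eps = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = omega / (1 - alpha - beta)  # start at the unconditional variance

for t in range(1, n):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

# Volatility clustering: returns are serially uncorrelated, but their
# squares are positively autocorrelated
print(np.corrcoef(eps[1:], eps[:-1])[0, 1])            # near 0
print(np.corrcoef(eps[1:] ** 2, eps[:-1] ** 2)[0, 1])  # noticeably positive
```

In practice you would estimate these parameters by maximum likelihood (e.g., with a dedicated package such as `arch`), but the recursion above is the entire model.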
Compare: ARIMA vs. GARCH—ARIMA models the conditional mean (predicting the value), while GARCH models the conditional variance (predicting uncertainty). For financial returns, you often need both: ARIMA for the mean equation, GARCH for the variance equation.
State Space and Frequency Domain Methods
These approaches offer alternative representations that unlock powerful estimation algorithms and reveal hidden structure.
State Space Models and Kalman Filtering
- Hidden state framework—observation equation links observed data to latent states; state equation describes state evolution over time
- Kalman filter algorithm—recursively computes optimal state estimates by combining predictions with new observations, minimizing mean squared error (a scalar implementation follows this list)
- Handles missing data and measurement error naturally—a major advantage over direct ARIMA fitting for messy real-world data
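A self-contained scalar sketch of both filter steps, assuming the standard linear-Gaussian form with state equation $x_t = F x_{t-1} + w_t$ and observation equation $y_t = H x_t + v_t$; all noise variances are arbitrary demo values:

```python
import numpy as np

rng = np.random.default_rng(8)

# Scalar linear-Gaussian model: x_t = F x_{t-1} + w_t,  y_t = H x_t + v_t
F, H = 0.95, 1.0
Q, R = 0.1, 1.0  # variances of state noise w_t and observation noise v_t

# Simulate the hidden state and its noisy observations
n = 200
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = F * x[t - 1] + np.sqrt(Q) * rng.standard_normal()
    y[t] = H * x[t] + np.sqrt(R) * rng.standard_normal()

# Kalman filter: alternate predict and update at every time step
x_hat, P = 0.0, 1.0  # initial state estimate and its variance
filtered = np.zeros(n)
for t in range(n):
    # Predict: propagate the estimate and inflate its uncertainty
    x_pred = F * x_hat
    P_pred = F * P * F + Q
    # Update: blend prediction and observation via the Kalman gain
    K = P_pred * H / (H * P_pred * H + R)
    x_hat = x_pred + K * (y[t] - H * x_pred)
    P = (1.0 - K * H) * P_pred
    filtered[t] = x_hat

# The filtered MSE should be well below the raw observation variance R
print(np.mean((filtered - x) ** 2))
```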
Spectral Analysis
- Frequency domain representation—the Fourier transform converts $x(t) \to X(f)$, decomposing the signal into sinusoidal components
- Periodogram estimation—the squared magnitude $|X(f)|^2$ estimates power at each frequency, revealing dominant cycles (computed in the sketch after this list)
- Aliasing and Nyquist frequency—sampling rate must exceed twice the highest frequency present, or high-frequency components fold into lower frequencies
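A numpy sketch of periodogram estimation; the 50 Hz signal and 1000 Hz sampling rate are arbitrary, chosen so the sampling rate comfortably exceeds twice the highest frequency present:

```python
import numpy as np

rng = np.random.default_rng(9)

# A 50 Hz sinusoid sampled at 1000 Hz (well above the 100 Hz Nyquist minimum)
fs = 1000.0
t = np.arange(0, 2.0, 1.0 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * rng.standard_normal(t.size)

# Periodogram: squared magnitude of the FFT at each nonnegative frequency
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)
power = np.abs(X) ** 2 / t.size

print(freqs[np.argmax(power)])  # dominant cycle, approximately 50.0 Hz
```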
Compare: Kalman Filter vs. Spectral Analysis—Kalman filtering operates in the time domain and excels at real-time state estimation with uncertainty quantification. Spectral analysis operates in the frequency domain and excels at identifying periodic components. Use Kalman for tracking problems; use spectral methods for cycle detection.
Quick Reference Table
| Data Characteristic | Models |
| --- | --- |
| Memory in past values | AR, VAR, ARIMA |
| Memory in past errors | MA, Exponential Smoothing |
| Non-stationarity handling | ARIMA (differencing), State Space |
| Seasonality modeling | SARIMA, Holt-Winters, Spectral Analysis |
| Time-varying volatility | GARCH, EGARCH, Stochastic Volatility |
| Multivariate relationships | VAR, Vector Error Correction (VEC) |
| Hidden/latent states | State Space, Kalman Filter |
| Frequency decomposition | Fourier Transform, Periodogram, Wavelets |
Self-Check Questions
- Both AR and MA models capture temporal dependencies, but they model different sources of persistence. What distinguishes the "memory" in an AR(1) model from the "memory" in an MA(1) model?
- You're given a time series with an obvious upward trend and annual seasonality. Which model framework would you choose, and what parameters would you need to specify? Justify your choice.
- Compare ARIMA and GARCH: In what sense do they model different aspects of a time series? When would you use both together?
- A colleague suggests using spectral analysis instead of fitting a SARIMA model to identify seasonal patterns. What are the tradeoffs between these two approaches?
- FRQ-style prompt: Given a state space model with observation equation $y_t = H x_t + v_t$ and state equation $x_t = F x_{t-1} + w_t$, describe the two steps of the Kalman filter algorithm and explain why it produces optimal state estimates under Gaussian assumptions.