🤝 Collaborative Data Science

Key Techniques in Time Series Forecasting

Why This Matters

Time series forecasting sits at the heart of reproducible data science because it forces you to confront fundamental questions about model selection, validation, and uncertainty quantification. When you're working collaboratively on forecasting projects, your team needs to understand not just which model to use, but why that model fits the data-generating process. You're being tested on your ability to match forecasting techniques to data characteristics—stationarity, seasonality, multivariate relationships, and non-linear patterns.

The techniques in this guide span classical statistical approaches and modern machine learning methods, each with different assumptions, interpretability trade-offs, and reproducibility considerations. Don't just memorize model names—know what data structure each technique handles, what assumptions it makes, and when you'd choose one over another in a collaborative pipeline. That conceptual understanding is what separates competent practitioners from those who just run default parameters.


Classical Linear Models: The Foundation

These models assume that future values can be expressed as linear combinations of past values and/or past errors. The key insight is decomposing a time series into predictable components.

Autoregressive (AR) Models

  • Past values predict future values—the model assumes the current observation is a weighted sum of previous observations plus noise
  • Order parameter p specifies how many lagged values are included; selecting p typically involves examining the partial autocorrelation function (PACF)
  • Stationarity required—AR models assume constant mean and variance, so you'll need to transform non-stationary data first
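
A minimal sketch of fitting an AR model with statsmodels follows; the simulated AR(2) series, the seed, and the lag order are illustrative assumptions, not recommendations:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulate a stationary AR(2) process: y_t = 0.6*y_{t-1} - 0.2*y_{t-2} + noise
rng = np.random.default_rng(42)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal()

# Fit an AR(p) model; here p = 2 matches the simulated process
result = AutoReg(y, lags=2).fit()
print(result.params)             # intercept plus the two lag coefficients
print(result.forecast(steps=5))  # point forecasts for the next 5 values
```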

Moving Average (MA) Models

  • Past forecast errors drive predictions—instead of using lagged values, MA models use lagged residuals from previous forecasts
  • Order parameter q determines how many past error terms are included; the autocorrelation function (ACF) helps identify an appropriate q
  • Smooths random fluctuations—particularly useful when shocks to the system have lingering but finite effects

Compare: AR vs. MA models—both are linear and require stationarity, but AR uses past values while MA uses past errors. In practice, most real-world series need both components, which is why ARIMA exists.
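
To see these signatures in practice, here is a quick diagnostic sketch, assuming a stationary NumPy array y such as the series simulated above:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# AR(p): PACF cuts off after lag p while the ACF decays gradually.
# MA(q): ACF cuts off after lag q while the PACF decays gradually.
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=20, ax=axes[0])
plot_pacf(y, lags=20, ax=axes[1])
plt.tight_layout()
plt.show()
```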

Autoregressive Integrated Moving Average (ARIMA) Models

  • Combines AR and MA with differencing—the "I" (integrated) component handles non-stationarity by differencing the series d times
  • Three parameters (p, d, q) define the model; differencing order d is determined by unit root tests like the augmented Dickey-Fuller (ADF) test
  • Workhorse for non-seasonal trends—when your data has a trend but no repeating seasonal pattern, ARIMA is often your first choice
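
A hedged end-to-end sketch: check for a unit root with the ADF test, then fit an ARIMA. The random-walk data and the (1, 1, 1) order are placeholders you would replace based on your own diagnostics:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Simulate a non-stationary series: a random walk with drift
rng = np.random.default_rng(0)
series = np.cumsum(0.5 + rng.normal(size=200))

# A large ADF p-value means we cannot reject non-stationarity,
# which suggests differencing (d >= 1) before fitting AR/MA terms
adf_stat, p_value = adfuller(series)[:2]
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")

# ARIMA(1, 1, 1): one AR lag, first differencing, one MA term
result = ARIMA(series, order=(1, 1, 1)).fit()
print(result.forecast(steps=12))  # next 12 point forecasts
```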

Seasonal ARIMA (SARIMA) Models

  • Extends ARIMA with seasonal terms—adds parameters (P, D, Q)_m where m is the seasonal period (e.g., 12 for monthly data with yearly cycles)
  • Captures repeating patterns—the seasonal differencing and seasonal AR/MA terms model predictable periodic behavior
  • Seven parameters to tune—(p, d, q) × (P, D, Q)_m requires careful model selection; use information criteria like AIC/BIC
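
A sketch of a SARIMA fit via statsmodels' SARIMAX, assuming simulated monthly data and an arbitrary (1, 1, 1)(1, 1, 1)_12 specification; in practice you would compare candidate orders by AIC/BIC:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulate monthly data with a trend and a yearly cycle (m = 12)
rng = np.random.default_rng(1)
t = np.arange(120)
y = pd.Series(
    0.3 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(size=120),
    index=pd.date_range("2015-01-01", periods=120, freq="MS"),
)

# SARIMA(1,1,1)(1,1,1)_12: non-seasonal and seasonal (p, d, q) terms
result = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(result.aic)                 # compare across candidate orders
print(result.forecast(steps=12))  # forecast one full seasonal cycle
```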

Compare: ARIMA vs. SARIMA—both handle trends through differencing, but SARIMA adds explicit seasonal structure. If your ACF shows spikes at regular lags (e.g., lag 12, 24, 36), you need SARIMA.


Exponential Smoothing Family: Weighted Averages

These methods apply exponentially decreasing weights to past observations, giving more influence to recent data. They're intuitive, computationally efficient, and often surprisingly competitive.

Exponential Smoothing Methods

  • Decreasing weights on older observations—recent data matters more; the smoothing parameter α controls how quickly weights decay
  • Three variants for different patterns—Simple (level only), Double/Holt's (level + trend), Triple/Holt-Winters (level + trend + seasonality)
  • State space representation (ETS)—modern implementations frame these as state space models, enabling proper prediction intervals
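
A short sketch of simple exponential smoothing with statsmodels; the fixed α = 0.3 and the level-only series are assumptions for illustration:

```python
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(2)
y = 10 + rng.normal(size=100)  # noisy series around a stable level

# alpha near 1 tracks recent data closely; alpha near 0 smooths heavily
fit = SimpleExpSmoothing(y).fit(smoothing_level=0.3, optimized=False)
print(fit.forecast(1))  # flat forecast at the current smoothed level

# Or let the optimizer pick alpha by minimizing in-sample squared error
fit_auto = SimpleExpSmoothing(y).fit()
print(fit_auto.params["smoothing_level"])
```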

Holt-Winters Method

  • Triple exponential smoothing—explicitly models three components: level (baseline), trend (direction), and seasonal (periodic pattern)
  • Additive vs. multiplicative seasonality—choose additive when seasonal swings are constant; multiplicative when they scale with the level
  • Three smoothing parameters (α, β, γ)—each controls how quickly the respective component adapts to new data
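
A minimal Holt-Winters sketch, assuming additive trend and seasonality on simulated monthly data; you would switch to seasonal="mul" when swings scale with the level:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Simulate monthly data: level + trend + additive yearly seasonality
rng = np.random.default_rng(3)
t = np.arange(96)
y = pd.Series(
    20 + 0.2 * t + 4 * np.sin(2 * np.pi * t / 12) + rng.normal(size=96),
    index=pd.date_range("2017-01-01", periods=96, freq="MS"),
)

fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
print({k: round(v, 3) for k, v in fit.params.items() if k.startswith("smoothing")})
print(fit.forecast(12))  # forecasts one year ahead
```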

Compare: Holt-Winters vs. SARIMA—both handle seasonality, but Holt-Winters is more intuitive and requires less diagnostic checking. SARIMA offers more flexibility for complex autocorrelation structures. For quick baselines in collaborative projects, Holt-Winters often wins on interpretability.


Flexible Frameworks: State Space and Modern Tools

These approaches provide unified frameworks that can represent many classical models while handling messy real-world data—missing values, outliers, and irregular patterns.

State Space Models

  • Separates observed data from hidden states—the observation equation links what you measure to underlying latent processes; the state equation describes how states evolve
  • Encompasses exponential smoothing and ARIMA—both model families are special cases; this framework unifies them under one roof
  • Kalman filter for estimation—provides optimal state estimates and proper uncertainty quantification; handles missing data naturally
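
A sketch of a local level model with statsmodels' UnobservedComponents, assuming simulated random-walk-plus-noise data; note how the Kalman filter bridges the deliberately removed observations:

```python
import numpy as np
from statsmodels.tsa.statespace.structural import UnobservedComponents

# Simulate random-walk-plus-noise data, then knock out a stretch of values
rng = np.random.default_rng(4)
level = np.cumsum(rng.normal(scale=0.5, size=200))
y = level + rng.normal(size=200)
y[50:60] = np.nan  # the Kalman filter handles the gap without imputation

# Local level model: observation = level + noise; level follows a random walk
result = UnobservedComponents(y, level="local level").fit(disp=False)
print(result.smoothed_state[0, 45:65])  # level estimates bridge the gap
```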

Prophet (Facebook's Forecasting Tool)

  • Designed for business time series—handles missing values, outliers, and holiday effects that break traditional models
  • Additive regression model—decomposes series into trend + seasonality + holidays + error; users specify components via intuitive parameters
  • Reproducibility-friendly—open source, well-documented, and produces interpretable outputs that facilitate team collaboration
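
A minimal Prophet sketch; the daily date range, the placeholder values, and the US holiday calendar are assumptions chosen only to show the workflow:

```python
import pandas as pd
from prophet import Prophet

# Prophet expects a DataFrame with columns "ds" (dates) and "y" (values)
df = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=730, freq="D"),
    "y": range(730),  # replace with real observations
})

m = Prophet(yearly_seasonality=True, weekly_seasonality=True)
m.add_country_holidays(country_name="US")  # built-in holiday effects
m.fit(df)

future = m.make_future_dataframe(periods=90)  # 90 days beyond the data
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```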

Compare: State Space Models vs. Prophet—state space models offer theoretical elegance and optimal filtering, while Prophet prioritizes practical usability. For collaborative projects with non-statisticians, Prophet's interpretable decomposition often makes communication easier.


Multivariate and Deep Learning: Complex Dependencies

When you have multiple interacting time series or highly non-linear patterns, these techniques capture structure that univariate linear models miss.

Vector Autoregression (VAR) Models

  • Multiple series modeled jointly—each variable is regressed on its own lags and the lags of all other variables in the system
  • Captures interdependencies—essential when variables influence each other (e.g., interest rates and inflation); Granger causality tests emerge naturally
  • Impulse response functions—trace how a shock to one variable propagates through the system over time; key for policy analysis
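
A VAR sketch with statsmodels, assuming two simulated stationary series that feed into each other; the variable names and lag cap are arbitrary:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Simulate two interdependent series: each depends on its own lag
# and on the other variable's lag
rng = np.random.default_rng(5)
e = rng.normal(size=(300, 2))
data = np.zeros((300, 2))
for t in range(1, 300):
    data[t, 0] = 0.5 * data[t - 1, 0] + 0.2 * data[t - 1, 1] + e[t, 0]
    data[t, 1] = 0.1 * data[t - 1, 0] + 0.3 * data[t - 1, 1] + e[t, 1]
df = pd.DataFrame(data, columns=["x1", "x2"])

results = VAR(df).fit(maxlags=8, ic="aic")  # lag order chosen by AIC
print(results.test_causality("x1", ["x2"]).summary())  # does x2 Granger-cause x1?
irf = results.irf(10)  # impulse responses traced over 10 steps
irf.plot()
```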

Long Short-Term Memory (LSTM) Networks

  • Recurrent neural network architecture—designed to learn long-term dependencies that standard RNNs forget due to vanishing gradients
  • Memory cells with gates—forget, input, and output gates control information flow, allowing the network to retain relevant patterns over long sequences
  • Handles non-linear relationships—when linear models plateau, LSTMs can capture complex patterns, but require more data and careful hyperparameter tuning
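
A toy LSTM sketch with Keras; the window length, layer size, and epoch count are unvalidated assumptions, and a real pipeline would add scaling, a train/test split, and tuning:

```python
import numpy as np
from tensorflow import keras

# Build sliding windows: predict the next value from the previous 12
rng = np.random.default_rng(6)
series = np.sin(np.arange(500) / 10) + rng.normal(scale=0.1, size=500)
window = 12
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]  # LSTM expects (samples, timesteps, features)

model = keras.Sequential([
    keras.Input(shape=(window, 1)),
    keras.layers.LSTM(32),  # gated memory cells capture long-range structure
    keras.layers.Dense(1),  # next-step prediction
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.predict(X[-1:]))  # one-step forecast from the last window
```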

Compare: VAR vs. LSTM—VAR is interpretable and statistically grounded but assumes linear relationships; LSTM can model non-linearities but acts as a black box. For reproducible science, VAR's transparency often matters more than LSTM's flexibility unless you have substantial data and validation infrastructure.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Linear univariate models | AR, MA, ARIMA |
| Seasonal patterns | SARIMA, Holt-Winters |
| Exponentially weighted smoothing | Simple/Double/Triple Exponential Smoothing, Holt-Winters |
| Non-stationarity handling | ARIMA (differencing), State Space Models |
| Multivariate dependencies | VAR |
| Flexible/unified frameworks | State Space Models, Prophet |
| Non-linear patterns | LSTM |
| Missing data/outliers | Prophet, State Space Models |

Self-Check Questions

  1. Which two model families can both be represented as special cases of state space models, and what does this unification provide for reproducible analysis?

  2. You're examining a monthly sales dataset and notice strong spikes in the ACF at lags 12, 24, and 36. Which technique should you consider, and what parameters would capture this pattern?

  3. Compare and contrast VAR and LSTM for modeling multiple interacting time series—what are the key trade-offs in terms of interpretability and assumptions?

  4. A collaborator proposes using ARIMA on a dataset with significant missing values and known holiday effects. What alternative would you suggest, and why might it improve reproducibility?

  5. When would you choose Holt-Winters over SARIMA for a seasonal forecasting task, and what practical considerations in a collaborative pipeline might influence this decision?