Time series forecasting sits at the heart of reproducible data science because it forces you to confront fundamental questions about model selection, validation, and uncertainty quantification. When you're working collaboratively on forecasting projects, your team needs to understand not just which model to use, but why that model fits the data-generating process. You're being tested on your ability to match forecasting techniques to data characteristics—stationarity, seasonality, multivariate relationships, and non-linear patterns.
The techniques in this guide span classical statistical approaches and modern machine learning methods, each with different assumptions, interpretability trade-offs, and reproducibility considerations. Don't just memorize model names—know what data structure each technique handles, what assumptions it makes, and when you'd choose one over another in a collaborative pipeline. That conceptual understanding is what separates competent practitioners from those who just run default parameters.
Classical linear models (AR, MA, ARIMA, SARIMA) assume that future values can be expressed as linear combinations of past values and/or past errors. The key idea is to separate the series into predictable structure (trend and autocorrelation) and irreducible white-noise error.
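In standard notation (with $\phi_i$ and $\theta_j$ as coefficients and $\varepsilon_t$ as white noise), the two building blocks look like this:

$$
\text{AR}(p):\; y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t
\qquad\quad
\text{MA}(q):\; y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}
$$

ARIMA(p, d, q) applies the combined ARMA(p, q) structure to the series after differencing it d times, which is how trends (one common form of non-stationarity) get handled.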
Compare: AR vs. MA models—both are linear and require stationarity, but AR uses past values while MA uses past errors. In practice, most real-world series need both components, which is why ARIMA exists.
Compare: ARIMA vs. SARIMA—both handle trends through differencing, but SARIMA adds explicit seasonal structure. If your ACF shows spikes at regular lags (e.g., lag 12, 24, 36), you need SARIMA.
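If you want to see what that looks like in practice, here's a minimal sketch using statsmodels' SARIMAX (the synthetic monthly series and the (1, 1, 1)(1, 1, 1, 12) orders are illustrative placeholders, not tuned choices):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic monthly series with a trend and a yearly cycle (stand-in for real data)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
rng = np.random.default_rng(0)
y = pd.Series(
    0.5 * np.arange(96)                              # trend
    + 10 * np.sin(2 * np.pi * np.arange(96) / 12)    # period-12 seasonality
    + rng.normal(0, 2, 96),                          # noise
    index=idx,
)

# ACF spikes at lags 12, 24, 36 point to a seasonal order with s = 12
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)          # maximum likelihood estimation

print(result.summary())                 # coefficients and residual diagnostics
forecast = result.forecast(steps=12)    # 12-month-ahead point forecasts
```

Pinning the random seed and recording the chosen orders alongside the fit is what makes a run like this reproducible for collaborators.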
Exponential smoothing methods apply exponentially decreasing weights to past observations, giving more influence to recent data. They're intuitive, computationally efficient, and often surprisingly competitive with more elaborate models.
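The simplest case, simple exponential smoothing, makes the weighting explicit. With smoothing parameter $\alpha \in (0, 1]$ and previous forecast $\hat{y}_t$:

$$
\hat{y}_{t+1} = \alpha\, y_t + (1 - \alpha)\,\hat{y}_t
\;=\; \alpha \sum_{k=0}^{t-1} (1 - \alpha)^k\, y_{t-k} \;+\; (1 - \alpha)^{t}\,\hat{y}_1
$$

Unrolling the recursion shows the geometrically decaying weights on older observations. Holt's method adds a trend equation on top of this, and Holt-Winters adds a seasonal equation as well.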
Compare: Holt-Winters vs. SARIMA—both handle seasonality, but Holt-Winters is more intuitive and requires less diagnostic checking. SARIMA offers more flexibility for complex autocorrelation structures. For quick baselines in collaborative projects, Holt-Winters often wins on interpretability.
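For a quick seasonal baseline, a Holt-Winters fit with statsmodels looks roughly like this (the synthetic series is a placeholder, and additive trend/seasonality is an assumption you'd check against your data):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series: level + mild trend + yearly seasonality + noise
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
rng = np.random.default_rng(1)
y = pd.Series(
    50 + 0.3 * np.arange(60)
    + 8 * np.sin(2 * np.pi * np.arange(60) / 12)
    + rng.normal(0, 1, 60),
    index=idx,
)

# Triple exponential smoothing: additive trend and additive seasonality, period 12
fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
forecast = fit.forecast(12)   # next 12 months
print(fit.params)             # estimated smoothing parameters (alpha, beta, gamma, ...)
```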
State space models and Prophet provide unified frameworks that can represent many classical models while handling messy real-world data: missing values, outliers, and irregular patterns.
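One concrete example: statsmodels' structural time series model (UnobservedComponents) is a state space formulation whose Kalman filter can work straight through missing observations. A minimal sketch, with a synthetic series and artificial NaN gaps as placeholders:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.structural import UnobservedComponents

# Synthetic monthly series with a few deliberately missing observations
idx = pd.date_range("2016-01-01", periods=72, freq="MS")
rng = np.random.default_rng(2)
y = pd.Series(
    20 + 0.4 * np.arange(72)
    + 6 * np.sin(2 * np.pi * np.arange(72) / 12)
    + rng.normal(0, 1.5, 72),
    index=idx,
)
y.iloc[[10, 11, 40]] = np.nan   # gaps the Kalman filter handles without imputation

# Local linear trend plus a period-12 seasonal component, estimated by MLE
model = UnobservedComponents(y, level="local linear trend", seasonal=12)
result = model.fit(disp=False)
forecast = result.forecast(steps=12)
```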
Compare: State Space Models vs. Prophet—state space models offer theoretical elegance and optimal filtering, while Prophet prioritizes practical usability. For collaborative projects with non-statisticians, Prophet's interpretable decomposition often makes communication easier.
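And here's what the Prophet workflow looks like in outline (the synthetic daily series is a placeholder; in practice you pass your own dataframe with `ds` and `y` columns):

```python
import numpy as np
import pandas as pd
from prophet import Prophet

# Prophet expects a dataframe with columns `ds` (timestamps) and `y` (values);
# this synthetic daily series is only a stand-in for real data.
ds = pd.date_range("2020-01-01", periods=730, freq="D")
rng = np.random.default_rng(3)
y = (10 + 0.01 * np.arange(730)
     + 2 * np.sin(2 * np.pi * np.arange(730) / 365.25)
     + rng.normal(0, 0.5, 730))
df = pd.DataFrame({"ds": ds, "y": y})

m = Prophet()                                 # seasonality components picked from the data spacing
m.fit(df)

future = m.make_future_dataframe(periods=90)  # extend 90 days past the history
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```

Prophet also ships built-in handling for holiday effects (for example via `add_country_holidays`), which is part of why it comes up for series with known calendar events.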
When you have multiple interacting time series or highly non-linear patterns, multivariate and machine-learning techniques such as VAR and LSTM capture structure that univariate linear models miss.
Compare: VAR vs. LSTM—VAR is interpretable and statistically grounded but assumes linear relationships; LSTM can model non-linearities but acts as a black box. For reproducible science, VAR's transparency often matters more than LSTM's flexibility unless you have substantial data and validation infrastructure.
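A minimal VAR sketch with statsmodels, using two synthetic stationary series whose names and coefficients are purely illustrative (real series should be checked for stationarity first):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Two interacting stationary series generated from a known VAR(1) process
rng = np.random.default_rng(1)
n = 200
x = np.zeros(n)
z = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.2 * z[t - 1] + rng.normal()
    z[t] = 0.3 * x[t - 1] + 0.4 * z[t - 1] + rng.normal()
df = pd.DataFrame({"x": x, "z": z})

model = VAR(df)
results = model.fit(maxlags=8, ic="aic")         # lag order selected by AIC
print(results.summary())                         # coefficient matrices, readable by collaborators

last_obs = df.values[-results.k_ar:]             # the k_ar most recent rows
forecast = results.forecast(last_obs, steps=12)  # 12-step-ahead joint forecast
```

Every coefficient in the summary is inspectable, which is the interpretability advantage the comparison above is pointing at; an LSTM would need a deep learning framework, far more data, and a separate validation setup.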
| Concept | Best Examples |
|---|---|
| Linear univariate models | AR, MA, ARIMA |
| Seasonal patterns | SARIMA, Holt-Winters |
| Exponentially weighted smoothing | Simple/Double/Triple Exponential Smoothing, Holt-Winters |
| Non-stationarity handling | ARIMA (differencing), State Space Models |
| Multivariate dependencies | VAR |
| Flexible/unified frameworks | State Space Models, Prophet |
| Non-linear patterns | LSTM |
| Missing data/outliers | Prophet, State Space Models |
Which two model families can both be represented as special cases of state space models, and what does this unification provide for reproducible analysis?
You're examining a monthly sales dataset and notice strong spikes in the ACF at lags 12, 24, and 36. Which technique should you consider, and what parameters would capture this pattern?
Compare and contrast VAR and LSTM for modeling multiple interacting time series—what are the key trade-offs in terms of interpretability and assumptions?
A collaborator proposes using ARIMA on a dataset with significant missing values and known holiday effects. What alternative would you suggest, and why might it improve reproducibility?
When would you choose Holt-Winters over SARIMA for a seasonal forecasting task, and what practical considerations in a collaborative pipeline might influence this decision?