Key Forecasting Accuracy Metrics


Why This Matters

When you build a forecasting model, the real question isn't just "does it work?"—it's "how wrong is it, and in what ways?" That's where accuracy metrics come in. You're being tested on your ability to select the right metric for different forecasting scenarios, interpret what the numbers actually mean, and explain why one model outperforms another. These metrics connect directly to core concepts like model selection, overfitting, scale sensitivity, and error distribution.

The key insight is that no single metric tells the whole story. Some metrics punish large errors harshly, others express accuracy as percentages for easy interpretation, and still others compare your model against a baseline. Don't just memorize formulas—know what each metric reveals about your forecast's strengths and weaknesses, and when you'd choose one over another.


Absolute Error Metrics

These metrics measure forecast error in the original units of your data, making them intuitive to interpret. They calculate the typical size of your prediction mistakes without worrying about direction—just magnitude.

Mean Absolute Error (MAE)

  • Averages the absolute differences between predicted and actual values: $MAE = \frac{1}{n}\sum|y_i - \hat{y}_i|$
  • Treats all errors equally: 10 total units of error count the same whether they come from one 10-unit miss or ten 1-unit misses
  • Same units as your data, making it easy to explain to stakeholders ("Our forecasts are off by an average of 50 units")

Mean Squared Error (MSE)

  • Squares each error before averaging, which heavily penalizes large mistakes: $MSE = \frac{1}{n}\sum(y_i - \hat{y}_i)^2$
  • Sensitive to outliers—useful when big errors are especially costly and you want to flag models that occasionally miss badly
  • Units are squared (e.g., dollars² or units²), which complicates direct interpretation

Root Mean Squared Error (RMSE)

  • Takes the square root of MSE to return to original units: $RMSE = \sqrt{MSE}$
  • Retains sensitivity to large errors while being interpretable—the go-to metric in regression and time series evaluation
  • Always ≥ MAE for the same dataset; the gap between them indicates how much variance exists in your errors

Compare: MAE vs. RMSE—both measure error in original units, but RMSE punishes large errors more severely. If your RMSE is much higher than your MAE, you have some big misses hiding in your data. Use MAE when all errors matter equally; use RMSE when large errors are especially problematic.
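
To make the comparison concrete, here's a minimal sketch in Python (using numpy; the actuals and forecasts are made up for illustration) that computes MAE, MSE, and RMSE on a series containing one large miss. Notice how the single 30-unit error drags RMSE well above MAE.

```python
import numpy as np

# Hypothetical actuals and forecasts; the last point is a large miss
actual   = np.array([100, 102,  98, 105, 110, 130], dtype=float)
forecast = np.array([101, 100,  99, 104, 108, 100], dtype=float)

errors = actual - forecast

mae  = np.mean(np.abs(errors))   # average absolute error, same units as the data
mse  = np.mean(errors ** 2)      # squared units; the 30-unit miss dominates
rmse = np.sqrt(mse)              # back to original units, still outlier-sensitive

print(f"MAE  = {mae:.2f}")   # ~6.17
print(f"MSE  = {mse:.2f}")   # ~151.83
print(f"RMSE = {rmse:.2f}")  # ~12.32 (much larger than MAE: a big miss is hiding)
```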


Percentage-Based Metrics

These metrics express error as a percentage, making them useful for comparing accuracy across datasets with different scales. The tradeoff: they can behave strangely when actual values are near zero.

Mean Absolute Percentage Error (MAPE)

  • Calculates average percentage error: $MAPE = \frac{100\%}{n}\sum\left|\frac{y_i - \hat{y}_i}{y_i}\right|$
  • Scale-independent—you can compare forecast accuracy across products, regions, or time periods with different magnitudes
  • Breaks down near zero—if actual values are small, percentage errors explode and distort your accuracy picture

Symmetric Mean Absolute Percentage Error (SMAPE)

  • Uses the average of actual and predicted values in the denominator: $SMAPE = \frac{100\%}{n}\sum\frac{|y_i - \hat{y}_i|}{(|y_i| + |\hat{y}_i|)/2}$
  • Bounded between 0% and 200%, avoiding the infinite values that plague MAPE when actuals are near zero
  • Symmetric treatment of over- and under-predictions makes it fairer for model comparison across diverse datasets

Compare: MAPE vs. SMAPE—both give percentage-based accuracy, but SMAPE handles near-zero values more gracefully. Choose MAPE for intuitive reporting when values are safely above zero; switch to SMAPE when your data includes small or intermittent values.
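
Here's a minimal sketch (Python with numpy; the values are invented) showing why the denominator matters: a single near-zero actual value blows up MAPE, while SMAPE stays bounded.

```python
import numpy as np

# Hypothetical series that includes one near-zero actual value
actual   = np.array([200, 180, 220,  1, 210], dtype=float)
forecast = np.array([190, 185, 215, 10, 205], dtype=float)

abs_err = np.abs(actual - forecast)

# MAPE divides by the actual value, so the near-zero point dominates
mape = 100 * np.mean(abs_err / np.abs(actual))

# SMAPE divides by the average magnitude of actual and forecast, so the
# denominator never collapses toward zero
smape = 100 * np.mean(abs_err / ((np.abs(actual) + np.abs(forecast)) / 2))

print(f"MAPE  = {mape:.1f}%")   # ~182%, driven almost entirely by the actual of 1
print(f"SMAPE = {smape:.1f}%")  # ~35%, the same forecasts look far more reasonable
```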


Scale-Free and Relative Metrics

These metrics compare your model's performance against a benchmark (usually a naive forecast), answering the question: "Is my model actually adding value, or could I have just used yesterday's value?"

Mean Absolute Scaled Error (MASE)

  • Compares your errors to a naive forecast's errors: $MASE = \frac{MAE}{MAE_{naive}}$
  • MASE < 1 means you're beating the naive model; MASE > 1 means you're doing worse than simply predicting the last observed value
  • Scale-free and works across different time series, making it ideal for comparing forecast accuracy across product lines or datasets

Theil's U Statistic

  • Takes the ratio of your RMSE to a naive forecast's RMSE; values below 1 indicate your model adds predictive value
  • Decomposes into bias, variance, and covariance components, helping diagnose why your forecast is missing
  • Useful for model selection when you need to justify that a complex model outperforms simple alternatives

Compare: MASE vs. Theil's U—both benchmark against naive forecasts, but MASE uses absolute errors while Theil's U uses squared errors. MASE is more robust to outliers; Theil's U provides richer diagnostic decomposition. If an exam question asks about relative forecast improvement, either works—but specify which baseline you're using.
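
The sketch below computes both benchmark metrics directly from the ratio definitions given above, assuming a one-step naive forecast (each period predicted by the previous period's actual) as the baseline; the numbers are invented for illustration.

```python
import numpy as np

# Hypothetical actuals, the model's forecasts for those periods, and the
# naive baseline (each period forecast by the previous period's actual)
actual = np.array([112, 118, 115, 121, 125, 130], dtype=float)
model  = np.array([110, 115, 117, 119, 126, 128], dtype=float)
naive  = np.array([110, 112, 118, 115, 121, 125], dtype=float)  # lagged actuals

mae_model = np.mean(np.abs(actual - model))
mae_naive = np.mean(np.abs(actual - naive))
mase = mae_model / mae_naive                  # < 1: model beats the naive forecast

rmse_model = np.sqrt(np.mean((actual - model) ** 2))
rmse_naive = np.sqrt(np.mean((actual - naive) ** 2))
theils_u = rmse_model / rmse_naive            # < 1: model adds value over the baseline

print(f"MASE      = {mase:.2f}")      # ~0.46 with these made-up numbers
print(f"Theil's U = {theils_u:.2f}")  # ~0.45
```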


Model Fit and Selection Metrics

These metrics evaluate how well your model explains variation in the data and help you choose between competing models. They're essential for avoiding overfitting—building a model that memorizes your training data but fails on new observations.

R-squared (R²)

  • Measures the proportion of variance explained by your model: $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$
  • Ranges from 0 to 1 (though it can go negative for terrible models), with higher values indicating better fit
  • Misleading in isolation—R² always increases when you add predictors, even useless ones, so it can reward overfitting

Adjusted R-squared

  • Penalizes additional predictors that don't improve fit: $R^2_{adj} = 1 - \frac{(1-R^2)(n-1)}{n-k-1}$
  • Can decrease when you add weak predictors, making it a better tool for comparing models with different numbers of variables
  • Essential for model comparison—if Adjusted R² drops when you add a variable, that variable isn't earning its keep

Compare: R² vs. Adjusted R²—R² rewards complexity blindly, while Adjusted R² asks whether added predictors actually improve the model. Always report Adjusted R² when comparing models with different numbers of predictors; plain R² is fine for single-model interpretation.
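
Here's a short sketch, assuming an ordinary least-squares fit computed with numpy on invented data, that shows the difference in practice: adding a pure-noise predictor can never lower R², but Adjusted R² applies a penalty and will often drop.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 60
x_useful = rng.normal(size=n)
x_noise  = rng.normal(size=n)                      # predictor unrelated to y
y = 3.0 * x_useful + rng.normal(size=n)

def r2_and_adjusted(X, y):
    """Fit OLS via least squares and return (R^2, adjusted R^2)."""
    X = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    k = X.shape[1] - 1                             # predictors, excluding intercept
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)
    return r2, adj

# R^2 cannot decrease when the noise column is added; Adjusted R^2 is penalized
print(r2_and_adjusted(x_useful.reshape(-1, 1), y))               # useful predictor only
print(r2_and_adjusted(np.column_stack([x_useful, x_noise]), y))  # plus a noise predictor
```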

Akaike Information Criterion (AIC)

  • Balances goodness of fit against model complexity: $AIC = 2k - 2\ln(L)$, where $k$ is the number of parameters and $L$ is the maximized likelihood
  • Lower AIC values indicate better models—it penalizes overfitting by adding a cost for each parameter
  • Only meaningful for comparison—a single AIC value tells you nothing; compare AIC across candidate models to select the best one

Compare: Adjusted R² vs. AIC—both penalize complexity, but Adjusted R² is easier to interpret (it's still a proportion of variance explained) while AIC is more theoretically grounded for maximum likelihood models. Use Adjusted R² for quick comparisons; use AIC when doing formal model selection in time series or regression contexts.
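
As a sketch of AIC-based model selection, the example below assumes Gaussian errors so the log-likelihood reduces to a function of the residual sum of squares (AIC = n·ln(SS_res/n) + 2k, up to an additive constant); the data and candidate models are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 80
x = rng.normal(size=(n, 3))
y = 2.0 * x[:, 0] - 1.0 * x[:, 1] + rng.normal(size=n)   # x[:, 2] is irrelevant

def aic_gaussian(X, y):
    """AIC for an OLS fit with Gaussian errors, up to an additive constant:
    n * ln(SS_res / n) + 2k, where k counts the fitted coefficients."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    k = X.shape[1]
    return len(y) * np.log(ss_res / len(y)) + 2 * k

# Compare candidate models: lower AIC wins
print("x0 only      :", round(aic_gaussian(x[:, [0]], y), 1))
print("x0 + x1      :", round(aic_gaussian(x[:, [0, 1]], y), 1))
print("x0 + x1 + x2 :", round(aic_gaussian(x, y), 1))
```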


Quick Reference Table

| Concept | Best Examples |
|---|---|
| Error in original units | MAE, RMSE |
| Penalizes large errors | MSE, RMSE |
| Percentage-based accuracy | MAPE, SMAPE |
| Comparison to naive baseline | MASE, Theil's U |
| Variance explained | R², Adjusted R² |
| Model selection with complexity penalty | AIC, Adjusted R² |
| Robust to near-zero values | SMAPE, MASE |
| Diagnoses error sources | Theil's U |

Self-Check Questions

  1. Your RMSE is significantly higher than your MAE for the same forecast. What does this tell you about your error distribution, and which metric would you report to a risk-averse stakeholder?

  2. You're comparing forecast accuracy for two product lines—one with average sales of 10,000 units and another with average sales of 50 units. Which metric would give you a fair comparison, and what pitfall should you watch for?

  3. A colleague's model has R² = 0.92, but when you calculate MASE, it's 1.15. How do you explain this apparent contradiction, and which metric should guide your model selection?

  4. Compare and contrast AIC and Adjusted R² as tools for preventing overfitting. When would you prefer one over the other?

  5. You need to evaluate whether a new forecasting model is worth implementing over your current simple moving average. Which two metrics would best support your recommendation, and what threshold values would indicate success?