Fiveable

🤖Statistical Prediction Unit 14 Review

14.1 Regression Metrics: MSE, RMSE, MAE, and R-squared

Written by the Fiveable Content Team • Last updated August 2025

Regression metrics help us gauge how well our models predict outcomes. MSE, RMSE, and MAE measure prediction errors, while R-squared shows how much variation our model explains. These tools are crucial for evaluating and comparing regression models.

Understanding these metrics is key to assessing model performance in real-world scenarios. They help us identify which models are most accurate and reliable, guiding us in making better predictions and decisions based on our data.

Error Metrics

Measuring Prediction Errors

  • Mean Squared Error (MSE) calculates the average squared difference between the predicted and actual values
    • Formula: MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
    • Squaring the errors amplifies the influence of large errors relative to small ones
    • Sensitive to outliers due to the squaring of errors
  • Root Mean Squared Error (RMSE) takes the square root of the MSE to bring the units back to the original scale
    • Formula: RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
    • Easier to interpret than MSE as it is in the same units as the target variable
    • Still sensitive to outliers, but less so than MSE
  • Mean Absolute Error (MAE) calculates the average absolute difference between the predicted and actual values
    • Formula: MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
    • Less sensitive to outliers compared to MSE and RMSE
    • Provides a more intuitive understanding of the average error magnitude
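The three error metrics above can be sketched in a few lines of plain Python. The helper names (`mse`, `rmse`, `mae`) and the sample data are illustrative, not from the source:

```python
import math

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared differences
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root Mean Squared Error: square root of MSE, same units as the target
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    # Mean Absolute Error: average of the absolute differences
    return sum(abs(y - yh) for y, yh in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.5, 7.0]  # actual values (illustrative)
y_pred = [2.5, 5.0, 4.0, 8.0]  # model predictions (illustrative)

print(mse(y_true, y_pred))   # 0.875
print(mae(y_true, y_pred))   # 0.75
print(rmse(y_true, y_pred))  # sqrt(0.875), roughly 0.935
```

Note how the single largest error (-1.5) dominates MSE after squaring, while MAE weights all errors equally.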

Percentage-based Error Metric

  • Mean Absolute Percentage Error (MAPE) expresses the average absolute error as a percentage of the actual values
    • Formula: MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
    • Useful when the target variable has a wide range of values or when comparing models across different datasets
    • Can be misleading when actual values are close to zero, as it can lead to large percentage errors
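A minimal MAPE sketch in plain Python (the function name and sample data are illustrative); the zero-denominator caveat from the bullet above shows up directly in the division by each actual value:

```python
def mape(y_true, y_pred):
    # Mean Absolute Percentage Error, expressed in percent.
    # Undefined when any actual value is zero, and unstable near zero.
    return 100.0 / len(y_true) * sum(
        abs((y - yh) / y) for y, yh in zip(y_true, y_pred)
    )

y_true = [100.0, 200.0, 50.0]  # illustrative actual values
y_pred = [110.0, 190.0, 60.0]  # illustrative predictions

# Individual percentage errors are 10%, 5%, and 20%; their mean is ~11.67%
print(round(mape(y_true, y_pred), 2))
```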

Analyzing Model Residuals

  • Residuals represent the differences between the predicted and actual values
    • Formula: residual_i = y_i - \hat{y}_i
    • Positive residuals indicate underestimation, while negative residuals indicate overestimation
    • Analyzing residuals helps assess model assumptions and identify patterns or biases in the predictions
    • Residual plots (residuals vs. predicted values) can reveal non-linear relationships or heteroscedasticity
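Computing residuals is a one-liner; a quick sketch with illustrative data (these values are not from the source):

```python
# Actual and predicted values (illustrative sample data)
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# residual_i = y_i - yhat_i
residuals = [y - yh for y, yh in zip(y_true, y_pred)]
print(residuals)  # [0.5, 0.0, -1.5, -1.0]

# A positive residual (0.5) means the model underestimated the actual value;
# negative residuals (-1.5, -1.0) mean it overestimated.
```

Plotting these residuals against the predicted values is the usual next step; a random scatter around zero supports the model's assumptions, while a curve or funnel shape suggests non-linearity or heteroscedasticity.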

Coefficient of Determination


Measuring Model Fit

  • R-squared (Coefficient of Determination) measures the proportion of variance in the target variable explained by the model
    • Formula: R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    • Typically ranges from 0 to 1, with higher values indicating a better fit; it can be negative when a model fits worse than simply predicting the mean
    • Represents the improvement of the model compared to using the mean of the target variable as a prediction
    • Can be interpreted as the percentage of variance explained by the model (e.g., R-squared of 0.75 means 75% of the variance is explained)
  • Adjusted R-squared penalizes the addition of unnecessary predictors to the model
    • Formula: Adjusted\ R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}, where p is the number of predictors
    • Useful for comparing models with different numbers of predictors
    • Prevents overfitting by discouraging the inclusion of irrelevant variables
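Both formulas translate directly into plain Python. A sketch with illustrative data (function names, sample values, and the choice of p = 1 predictor are assumptions, not from the source):

```python
def r_squared(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y_true, y_pred, p):
    # Penalizes extra predictors: p = number of predictors, n = sample size
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

y_true = [3.0, 5.0, 2.5, 7.0]  # illustrative actual values
y_pred = [2.5, 5.0, 4.0, 8.0]  # illustrative predictions

print(round(r_squared(y_true, y_pred), 3))              # ~0.724
print(round(adjusted_r_squared(y_true, y_pred, 1), 3))  # ~0.586, lower due to the penalty
```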

Assessing Model Goodness of Fit

  • Goodness of fit refers to how well the model fits the observed data
    • A high R-squared or adjusted R-squared indicates a good fit, meaning the model captures a significant portion of the variability in the target variable
    • However, a high R-squared does not necessarily imply a good model, as it can be affected by outliers or overfitting
    • It is important to consider other diagnostic measures (residual plots, cross-validation) alongside R-squared to assess model performance and validity
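The warning above, that a high R-squared does not guarantee a good model, can be demonstrated with a deliberately misspecified fit. In this sketch (all data and helper names are illustrative), a straight line is fit to quadratic data: R-squared looks strong on the training points but goes negative on held-out points:

```python
def fit_line(xs, ys):
    # Ordinary least-squares fit of y = a + b*x (single predictor)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b

def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Quadratic data (y = x^2) modeled with a straight line
train_x, test_x = [0, 1, 2, 3, 4, 5], [6, 7, 8]
train_y = [x ** 2 for x in train_x]
test_y = [x ** 2 for x in test_x]

a, b = fit_line(train_x, train_y)
r2_train = r_squared(train_y, [a + b * x for x in train_x])
r2_test = r_squared(test_y, [a + b * x for x in test_x])

print(r2_train)  # above 0.9: the line "explains" the training data well
print(r2_test)   # negative: worse than predicting the mean on held-out data
```

The training R-squared alone would suggest a good model; the holdout result exposes the misspecification, which is why residual plots and cross-validation matter.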