Fiveable

📊Honors Statistics Unit 12 Review

12.4 Prediction (Optional)

Written by the Fiveable Content Team • Last updated August 2025

Prediction Using Least-Squares Regression Line

The least-squares regression line lets you take a known value of one variable and predict the value of another. This section covers how to make those predictions, how to interpret them, and when it's appropriate (or not) to use them.


The Regression Equation

The least-squares regression line takes the form:

\hat{y} = b_0 + b_1 x

  • \hat{y} is the predicted value of the response variable
  • b_0 is the y-intercept (the predicted value of y when x = 0)
  • b_1 is the slope (the predicted change in y for each one-unit increase in x)
  • x is the value of the explanatory variable you're plugging in

To make a prediction, substitute your known x value into the equation and solve for \hat{y}.
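
The substitute-and-solve step can be sketched in code. This is a minimal illustration, not part of the original guide: the data points are invented, and `numpy.polyfit` is used here as one common way to get the least-squares slope and intercept.

```python
# A sketch of prediction with a least-squares line: fit b0 and b1 to
# sample data, then plug a new x into y-hat = b0 + b1 * x.
# The (x, y) data below are made up for illustration.
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.1, 5.2, 5.9, 8.4, 10.1])

# polyfit with deg=1 returns the coefficients highest degree first:
# slope, then intercept.
b1, b0 = np.polyfit(x, y, deg=1)

def predict(x_new):
    """Plug x into the regression equation to get y-hat."""
    return b0 + b1 * x_new

print(predict(6.0))  # ≈ 7.15 for this invented data
```

Note that `predict` only applies the fitted equation; it knows nothing about whether 6.0 is inside the range of the original data, which is why the extrapolation caution later in this section matters.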

[Image: least-squares regression line equation (Least Squares/Rank Regression Equations, ReliaWiki)]

Interpreting Predicted Values

The predicted value \hat{y} is your best estimate of the response variable for a given x, based on the model. It won't usually match the actual observed value exactly.

Example: Suppose the regression equation for predicting final exam scores from midterm scores is \hat{y} = 25 + 0.7x. If a student scored 80 on the midterm:

\hat{y} = 25 + 0.7(80) = 25 + 56 = 81

You'd predict a final exam score of 81 for a student who earned an 80 on the midterm.

The difference between an actual value and its predicted value is called a residual:

\text{residual} = y - \hat{y}

If that student actually scored 85 on the final, the residual is 85 - 81 = 4. A positive residual means the model underpredicted; a negative residual means it overpredicted.

The standard error of the estimate (often written s_e) summarizes how far actual values tend to fall from the regression line on average. A smaller s_e means the predictions are generally more precise.
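
The residual and s_e definitions above can be computed directly. In this sketch the fitted line \hat{y} = 25 + 0.7x comes from the text's example, but the list of (midterm, final) pairs is invented for illustration, so s_e here is illustrative only.

```python
# Residuals and the standard error of the estimate:
#   residual = y - y-hat
#   s_e = sqrt(SSE / (n - 2)), where SSE is the sum of squared residuals.
# The line 25 + 0.7x is from the text; the data pairs are invented.
import math

b0, b1 = 25.0, 0.7
data = [(80, 85), (70, 72), (90, 88), (60, 69), (75, 77)]  # (midterm, final)

residuals = [y - (b0 + b1 * x) for x, y in data]
sse = sum(r ** 2 for r in residuals)
s_e = math.sqrt(sse / (len(data) - 2))

print(residuals[0])  # residual for the x = 80 student (about 4, as in the text)
```

The divisor n - 2 reflects that two quantities (slope and intercept) were estimated from the data before the residuals were computed.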

[Image: the regression equation (Introduction to Statistics, Gravina)]

When Regression Predictions Are Appropriate

Not every regression equation should be used for prediction. Before relying on a prediction, check these conditions:

  1. Linearity — The relationship between x and y should be roughly linear. Check the scatter plot and the residual plot for any curved patterns.
  2. Normally distributed residuals — The residuals should be approximately normal with a mean of zero. A histogram or normal probability plot of the residuals can help you verify this.
  3. Constant variability (homoscedasticity) — The spread of residuals should stay roughly the same across all values of x. If the residual plot fans out or funnels in, this condition is violated.
  4. No influential outliers — Points with high leverage or large residuals can distort the regression line and make predictions unreliable.
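
A first step toward checking these conditions is computing the residuals themselves. The sketch below, on invented data, confirms the residuals average to zero (a least-squares fit with an intercept forces this) and leaves the visual checks — curvature, fanning, outliers — to a residual plot.

```python
# Compute residuals from a fit as the raw material for condition checks.
# In practice you would plot residuals against x and look for curved
# patterns (linearity) or changing spread (constant variability).
# Data are invented for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 4.1, 5.8, 8.2, 9.9, 12.1])

b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# Least squares with an intercept makes the residuals sum to zero.
print(abs(residuals.mean()) < 1e-9)
# A residual plot would be e.g. plt.scatter(x, residuals) with matplotlib.
```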

Beyond these conditions, two more considerations matter:

  • Stay within the range of the data. Predictions should only be made for x values within (or very close to) the range of the original data. Predicting outside that range is called extrapolation, and it's risky because you have no evidence the linear pattern continues.
  • The model should fit well. A higher R^2 value means the regression line explains more of the variability in y, which generally makes predictions more trustworthy. A low R^2 means the line isn't capturing much of the pattern, so predictions will be imprecise even if all conditions are met.
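
R^2 can be computed as the fraction of total variability the line explains: 1 - SSE/SST. This sketch uses invented, nearly-linear data, so the resulting R^2 is high; real data would typically explain less.

```python
# R^2 = 1 - SSE / SST:
#   SSE = sum of squared residuals (variability the line leaves unexplained)
#   SST = sum of squared deviations of y from its mean (total variability)
# Data are invented for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)        # unexplained variability
sst = np.sum((y - y.mean()) ** 2)     # total variability
r_squared = 1 - sse / sst

print(round(r_squared, 4))  # close to 1 for this nearly linear data
```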

Additional Regression Analysis Tools

  • Scatter plot — Plot the raw data to visually assess whether the relationship looks linear and to spot potential outliers.
  • Confidence interval for the mean response — Gives a range of plausible values for the average y at a particular x. This interval is narrower than the corresponding prediction interval because it targets the population mean, not a single observation.
  • Prediction interval for an individual observation — Gives a range of plausible values for a single new y at a particular x. This interval is always wider than the confidence interval because individual observations carry extra variability beyond the mean.

Prediction intervals are wider than confidence intervals at the same x value. The confidence interval accounts for uncertainty in estimating the mean, while the prediction interval also accounts for the natural scatter of individual data points around that mean.
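
That width comparison can be seen in the standard margin-of-error formulas, where the prediction interval carries an extra "+1" under the square root for the scatter of a single observation. The sketch below uses invented data and a t critical value looked up from a t table (t* ≈ 3.182 for df = 3 at 95%); it is an illustration of the formulas, not a full inference procedure.

```python
# Margins of error at the same x0:
#   CI for mean response:  t* * s_e * sqrt(1/n + (x0 - x_bar)^2 / Sxx)
#   PI for a new y:        t* * s_e * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx)
# Data and the t critical value are for illustration.
import math

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [3.1, 5.2, 5.9, 8.4, 10.1]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

t_star = 3.182  # t critical value for df = n - 2 = 3, 95% (from a t table)
x0 = 6.0
core = 1 / n + (x0 - x_bar) ** 2 / sxx

ci_margin = t_star * s_e * math.sqrt(core)       # mean response
pi_margin = t_star * s_e * math.sqrt(1 + core)   # single new observation

print(ci_margin < pi_margin)  # the PI is always wider
```

Because the "+1" term dominates the `1/n + (x0 - x_bar)^2/Sxx` piece for any reasonable sample, the prediction interval stays noticeably wider than the confidence interval even at x0 = x_bar, where both are at their narrowest.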