
12.5 Prediction

Written by the Fiveable Content Team • Last updated August 2025

Prediction Using Least-Squares Regression

Least-squares regression lets you use a linear equation to predict one variable from another. For example, you could predict a student's final exam score based on their midterm score or hours studied. The key is knowing when these predictions are trustworthy and how to interpret them correctly.

Least-Squares Regression for Predictions

The regression line takes the form:

$$\hat{y} = b_0 + b_1 x$$

Each piece of this equation has a specific meaning:

  • $\hat{y}$ is the predicted value of the response variable (the "hat" symbol tells you it's a prediction, not an observed value)
  • $b_0$ is the y-intercept, the value of $\hat{y}$ when $x = 0$
  • $b_1$ is the slope, the predicted change in $y$ for every one-unit increase in $x$
  • $x$ is the value of the explanatory variable you're plugging in
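
The coefficients come from the least-squares criterion, which chooses the line that minimizes the sum of squared residuals. The standard closed-form solution is $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$, where $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum (x_i - \bar{x})^2$. The sketch below implements those formulas in plain Python. The function name `fit_line` is hypothetical, and the midterm/final scores are made up so that the fit reproduces the $\hat{y} = 10 + 0.9x$ equation used in the next example.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_xy: sum of cross-products of deviations; S_xx: sum of squared x deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept: the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical midterm and final scores that lie exactly on y-hat = 10 + 0.9x
midterms = [60, 70, 80, 90]
finals = [64, 73, 82, 91]
b0, b1 = fit_line(midterms, finals)
print(round(b0, 6), round(b1, 6))  # 10.0 0.9
```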

To make a prediction, substitute your $x$ value into the equation and evaluate it. If your regression equation is $\hat{y} = 10 + 0.9x$ and a student scored 80 on the midterm, the predicted final exam score is $\hat{y} = 10 + 0.9(80) = 82$.
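
The substitution step is a one-liner in code. This is a minimal sketch: `predict` is a hypothetical helper name, not a library function, and the coefficients are the ones from the example equation $\hat{y} = 10 + 0.9x$.

```python
def predict(b0, b1, x):
    """Plug x into the regression line y-hat = b0 + b1*x."""
    return b0 + b1 * x

# Midterm score of 80 with y-hat = 10 + 0.9x
y_hat = predict(10, 0.9, 80)
print(y_hat)  # 82.0
```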

Interpretation of Predicted Values

The predicted value $\hat{y}$ represents the average outcome you'd expect for a given $x$. A student with an 80 on the midterm isn't guaranteed an 82 on the final. That's just the best estimate based on the overall pattern in the data.

Interpreting the y-intercept ($b_0$): This is the predicted response when $x = 0$. Sometimes this makes sense, and sometimes it doesn't. If $x$ is "hours studied," then $b_0$ would be the predicted score for someone who studied zero hours, which is at least plausible. But if $x$ is midterm score, a midterm score of zero is so far outside typical data that the intercept has no practical meaning. Always ask yourself: does $x = 0$ fall within the range of the data?

Interpreting the slope ($b_1$): The slope tells you the direction and rate of the relationship. If $b_1 = 2$ and the explanatory variable is hours studied, then for each additional hour studied, the predicted exam score increases by 2 points. Ten more hours would correspond to a predicted increase of 20 points.
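
The slope scales this way because the intercept cancels when you compare two predictions. Here is the comparison written out for a step of $\Delta x$ in the explanatory variable, using the values from this paragraph ($b_1 = 2$, $\Delta x = 10$ hours):

$$\hat{y}(x + \Delta x) - \hat{y}(x) = \bigl(b_0 + b_1(x + \Delta x)\bigr) - \bigl(b_0 + b_1 x\bigr) = b_1\,\Delta x = 2 \times 10 = 20$$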

Appropriate Use of Regression Equations

Not every prediction from a regression equation is a good one. Here are the main things to watch for:

Stay within the data range. Only use the equation to predict for $x$ values within (or very close to) the range of your original data. Predicting outside that range is called extrapolation, and it's risky because you have no evidence the linear pattern continues. If your data covers students who studied 1 to 15 hours, predicting for someone who studied 50 hours is extrapolation.
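
One way to make the range check concrete is a small helper that flags extrapolation before you make a prediction. This is a sketch under stated assumptions. The name `is_extrapolation` is hypothetical, and the study-hours data are made up to span 1 to 15 hours as in the example. The optional `tolerance` parameter is also an assumption: it stands in for the judgment call behind "very close to."

```python
def is_extrapolation(x_new, xs, tolerance=0.0):
    """Return True if x_new lies outside [min(xs), max(xs)], widened by tolerance."""
    lo, hi = min(xs), max(xs)
    return x_new < lo - tolerance or x_new > hi + tolerance

# Hypothetical study-hours data covering 1 to 15 hours
hours = [1, 3, 4, 6, 8, 10, 12, 15]
print(is_extrapolation(10, hours))  # False: inside the observed range
print(is_extrapolation(50, hours))  # True: far beyond the largest observed value
print(is_extrapolation(0, hours))   # True: x = 0 is outside the data
```

Passing $x = 0$ answers the intercept question: if 0 is flagged, interpreting $b_0$ is itself an extrapolation.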

Verify the relationship is actually linear. The regression equation assumes a straight-line relationship. If the scatter plot shows a curve, a linear model will give inaccurate predictions no matter how carefully you calculate.

Check the model assumptions. Before trusting predictions, verify these four conditions (sometimes remembered as LINE):

  1. Linearity of the relationship
  2. Independence of observations
  3. Normality of residuals
  4. Equal variance of residuals (residuals spread evenly across all $x$ values)

If any of these are seriously violated, the predictions may be unreliable.
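
Most of these checks are carried out on the residuals, $e_i = y_i - \hat{y}_i$. Plotting them against $x$ can reveal curvature (a linearity problem) or a fan shape (unequal variance). A minimal sketch of computing them follows, with hypothetical helper names and made-up data. Plotting is left out to keep the example dependency-free. For a least-squares line with an intercept, the residuals always sum to zero, which makes a useful sanity check.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def residuals(xs, ys):
    """Observed minus predicted, e_i = y_i - y-hat_i, for each data point."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
res = residuals(xs, ys)
print([round(e, 6) for e in res])  # [-0.3, 0.9, -0.9, 0.3]
print(abs(sum(res)) < 1e-9)        # True: least-squares residuals sum to zero
```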

Consider the strength of the correlation. The correlation coefficient $r$ and the coefficient of determination $r^2$ tell you how closely the data follow the linear pattern. A weak correlation (low $|r|$) means the model doesn't explain much of the variation in $y$, so predictions will be imprecise.
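
To show where these numbers come from, here is a sketch that computes $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ directly from the data and then squares it. The `correlation` function name and the four data points are hypothetical. With these points, $r = 0.8$ and $r^2 = 0.64$, so the line accounts for 64% of the variation in $y$.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
r = correlation(xs, ys)
print(round(r, 3), round(r ** 2, 3))  # 0.8 0.64
```

On Python 3.10 or later, the standard library's `statistics.correlation` computes the same Pearson coefficient.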

Watch for influential outliers. A single extreme point can pull the regression line toward it, shifting both the slope and intercept. This can distort predictions across the whole range of $x$, not just near the outlier.
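
A quick way to see this effect is to fit the same data with and without one extreme point. In this sketch, the data are hypothetical and `slope` is an illustrative helper. Five points lie exactly on a line with slope 2. Adding a single point far to the right with a low $y$ value flips the slope to about $-0.26$.

```python
def slope(xs, ys):
    """Least-squares slope b1 = S_xy / S_xx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical data lying exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
b1_clean = slope(xs, ys)

# Same data plus one extreme point far to the right with a low y value
b1_outlier = slope(xs + [20], ys + [0])

print(b1_clean, round(b1_outlier, 3))  # 2.0 -0.259
```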

Visualizing and Assessing Predictions

Always start by looking at a scatter plot. It shows you whether the relationship looks linear, whether there are outliers, and whether the spread of points is roughly even across the range of $x$.

The standard error of the estimate (sometimes written $s_e$) measures the typical distance between observed $y$ values and the regression line. A smaller standard error means your predictions tend to land closer to the actual values.
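
The standard error of the estimate is usually computed as $s_e = \sqrt{\sum (y_i - \hat{y}_i)^2 / (n - 2)}$. The divisor is $n - 2$ rather than $n$ because two quantities, $b_0$ and $b_1$, were estimated from the data. A minimal sketch follows, with made-up data and a hypothetical function name.

```python
import math

def standard_error(xs, ys):
    """s_e = sqrt(SSE / (n - 2)) for the least-squares line through the data."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points so that n - 2 is positive")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # SSE: sum of squared residuals around the fitted line
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

# Hypothetical data
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
print(round(standard_error(xs, ys), 3))  # 0.949
```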

You can also build prediction intervals around a specific predicted value. These give a range of plausible values for an individual observation at a given $x$. Prediction intervals are wider than confidence intervals for the mean response because individual outcomes vary more than averages do.
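
The difference between the two intervals shows up directly in their formulas. At a given $x_0$, both are centered on $\hat{y}$. The confidence interval for the mean response has half-width $t^* s_e \sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$. The prediction interval adds a 1 under the square root to account for the scatter of a single new observation.

The sketch below assumes you supply the critical value $t^*$ (with $n - 2$ degrees of freedom) from a t table or software. The function name `intervals` and the data are hypothetical. For the four-point example, $t^* = 4.303$ is the two-sided 95% value for 2 degrees of freedom.

```python
import math

def intervals(xs, ys, x0, t_star):
    """Return (y_hat, confidence_interval, prediction_interval) at x0.

    t_star must be the t critical value with n - 2 degrees of freedom
    for the desired confidence level, supplied by the caller.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = b0 + b1 * x0
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    # Leverage of x0: grows as x0 moves away from the mean of the x data
    leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
    # Mean response: only the uncertainty in where the line sits
    ci_half = t_star * s_e * math.sqrt(leverage)
    # Individual outcome: line uncertainty plus one new point's scatter (the "1 +")
    pi_half = t_star * s_e * math.sqrt(1 + leverage)
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

# Hypothetical data; 4.303 is the 95% two-sided t value for 2 degrees of freedom
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
y_hat, conf_int, pred_int = intervals(xs, ys, 2.5, 4.303)
print(y_hat, conf_int, pred_int)
```

Because the leverage term grows with $(x_0 - \bar{x})^2$, both intervals widen as $x_0$ moves toward the edges of the data. This is one more reason predictions far from the center are less precise.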