Prediction Using Least-Squares Regression Line
The least-squares regression line lets you take a known value of one variable and predict the value of another. This section covers how to make those predictions, how to interpret them, and when it's appropriate (or not) to use them.

The Regression Equation
The least-squares regression line takes the form:
ŷ = a + bx
where:
- ŷ is the predicted value of the response variable
- a is the y-intercept (the predicted value of ŷ when x = 0)
- b is the slope (the predicted change in ŷ for each one-unit increase in x)
- x is the value of the explanatory variable you're plugging in
To make a prediction, substitute your known x value into the equation and compute ŷ.
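As a quick sketch, the slope, intercept, and a prediction can be computed directly from the data. The midterm and final scores below are made up purely for illustration:

```python
# Hypothetical data: midterm scores (x) and final exam scores (y)
x = [62, 70, 75, 80, 85, 90]
y = [65, 70, 78, 82, 84, 92]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

def predict(x_new):
    """Substitute a known x into y-hat = a + b * x."""
    return a + b * x_new
```

One useful sanity check: the least-squares line always passes through the point (x̄, ȳ), so predicting at the mean of x returns the mean of y.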

Interpreting Predicted Values
The predicted value ŷ is your best estimate of the response variable for a given x, based on the model. It won't usually match the actual observed value exactly.
Example: Suppose the regression equation for predicting final exam scores from midterm scores is ŷ = 25 + 0.7x. If a student scored 80 on the midterm:
ŷ = 25 + 0.7(80) = 25 + 56 = 81
You'd predict a final exam score of 81 for a student who earned an 80 on the midterm.
The difference between an actual value and its predicted value is called a residual:
residual = y − ŷ
If that student actually scored 85 on the final, the residual is 85 − 81 = 4. A positive residual means the model underpredicted; a negative residual means it overpredicted.
The standard error of the estimate (often written s_e) summarizes how far actual values tend to fall from the regression line on average. A smaller s_e means the predictions are generally more precise.
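A short sketch of computing residuals and s_e, reusing the hypothetical line ŷ = 25 + 0.7x from the example above with made-up scores for a few students:

```python
import math

# Hypothetical fitted line (for illustration): y-hat = 25 + 0.7x
a, b = 25.0, 0.7
x = [70, 75, 80, 85, 90]   # midterm scores
y = [76, 79, 85, 83, 89]   # actual final exam scores

y_hat = [a + b * xi for xi in x]                      # predicted values
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]   # actual - predicted

# Standard error of the estimate: s_e = sqrt(SSE / (n - 2))
sse = sum(r ** 2 for r in residuals)
s_e = math.sqrt(sse / (len(x) - 2))
```

The student from the example appears at x = 80: the predicted value is 81, the actual is 85, and the residual is 4, matching the hand calculation.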

When Regression Predictions Are Appropriate
Not every regression equation should be used for prediction. Before relying on a prediction, check these conditions:
- Linearity — The relationship between x and y should be roughly linear. Check the scatter plot and the residual plot for any curved patterns.
- Normally distributed residuals — The residuals should be approximately normal with a mean of zero. A histogram or normal probability plot of the residuals can help you verify this.
- Constant variability (homoscedasticity) — The spread of residuals should stay roughly the same across all values of x. If the residual plot fans out or funnels in, this condition is violated.
- No influential outliers — Points with high leverage or large residuals can distort the regression line and make predictions unreliable.
Beyond these conditions, two more considerations matter:
- Stay within the range of the data. Predictions should only be made for values within (or very close to) the range of the original data. Predicting outside that range is called extrapolation, and it's risky because you have no evidence the linear pattern continues.
- The model should fit well. A higher r² value means the regression line explains more of the variability in y, which generally makes predictions more trustworthy. A low r² means the line isn't capturing much of the pattern, so predictions will be imprecise even if all conditions are met.
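Some of these checks can be spot-checked numerically, though plots remain the primary diagnostic. This sketch uses made-up data and adds a simple guard against extrapolation:

```python
# Made-up data for illustration
x = [60, 65, 70, 75, 80, 85, 90]
y = [63, 69, 71, 78, 81, 85, 90]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
a = y_bar - b * x_bar
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# With an intercept in the model, least-squares residuals sum to zero
mean_resid = sum(residuals) / n

# r^2 = 1 - SSE/SST: fraction of the variability in y explained by the line
sse = sum(r ** 2 for r in residuals)
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

def safe_predict(x_new):
    """Refuse to predict outside the observed x range (extrapolation)."""
    if not (min(x) <= x_new <= max(x)):
        raise ValueError("x is outside the data range: extrapolation")
    return a + b * x_new
```

The extrapolation guard is one way to enforce the "stay within the range of the data" rule in code rather than relying on the user to remember it.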
Additional Regression Analysis Tools
- Scatter plot — Plot the raw data to visually assess whether the relationship looks linear and to spot potential outliers.
- Confidence interval for the mean response — Gives a range of plausible values for the average y at a particular x. This interval is narrower because it targets the population mean, not a single observation.
- Prediction interval for an individual observation — Gives a range of plausible values for a single new y at a particular x. This interval is always wider than the confidence interval because individual observations carry extra variability beyond the mean.
Prediction intervals are wider than confidence intervals at the same x value. The confidence interval accounts for uncertainty in estimating the mean, while the prediction interval also accounts for the natural scatter of individual data points around that mean.
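The two intervals can be sketched with the standard formulas. The data are made up, and the 95% t critical value for df = n − 2 = 3 is hard-coded rather than looked up from a library:

```python
import math

# Made-up data for illustration
x = [70, 75, 80, 85, 90]
y = [76, 79, 85, 83, 89]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

x0 = 80.0                 # the x value where we want the intervals
y_hat0 = a + b * x0       # point prediction at x0
t_star = 3.182            # t critical value, 95% confidence, df = 3

core = 1 / n + (x0 - x_bar) ** 2 / sxx
ci_margin = t_star * s_e * math.sqrt(core)       # mean response at x0
pi_margin = t_star * s_e * math.sqrt(1 + core)   # single new y at x0
```

The only difference between the two margins is the extra "1 +" under the square root, which is exactly the additional scatter of an individual observation around the mean; it guarantees the prediction interval is always the wider of the two.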