Fiveable

📊AP Statistics Unit 2 Review

2.7 Residuals

Written by the Fiveable Content Team • Last updated June 2026
Verified for the 2027 exam

A residual is the difference between an actual value and a predicted value: residual = y - ŷ. You use residuals and residual plots to check whether a linear model is a good fit, because a residual plot with no clear pattern is evidence that the relationship is linear.

How to Find a Residual

To find a residual in AP Stats, use the formula residual = actual y-value - predicted y-value, or residual = y - ŷ. First plug the x-value into the regression equation to get ŷ. Then subtract that prediction from the observed y-value.

The sign matters. A positive residual means the actual point is above the regression line, so the model underestimated. A negative residual means the actual point is below the line, so the model overestimated.
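The two steps and the sign rule can be sketched in a few lines of Python. This is only an illustration: the line ŷ = 2 + 3x and the point (4, 15) are made-up values, and the function names are hypothetical.

```python
def residual(actual_y, predicted_y):
    """Return actual minus predicted (y - y_hat)."""
    return actual_y - predicted_y


def describe_sign(res):
    """Translate the sign of a residual into plain language."""
    if res > 0:
        return "above the line: the model underestimated"
    if res < 0:
        return "below the line: the model overestimated"
    return "on the line: the prediction was exact"


def predict(x):
    """Hypothetical regression line (not from this guide): y_hat = 2 + 3x."""
    return 2 + 3 * x


# Hypothetical observed point (x, y) = (4, 15)
y_hat = predict(4)           # step 1: plug x into the equation
res = residual(15, y_hat)    # step 2: actual minus predicted
print(y_hat, res, describe_sign(res))
```

Swapping the order of the subtraction flips every sign, which is why the "actual minus predicted" order is worth memorizing.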

Why This Matters for the AP Statistics Exam

Residuals connect a regression line to how well it actually fits the data. On the AP Statistics exam, you may be asked to calculate a residual for a single point, read a residual plot, or decide whether a linear model is appropriate. These skills show up in multiple-choice questions and in free-response questions about regression, where you often need to interpret a plot and justify your answer in context.

This topic also sets up later regression work. Once you can read residual plots, you are ready for least-squares regression in the next topic and for fixing nonlinear patterns with transformations later in the unit.

Key Takeaways

  • A residual is actual minus predicted: residual = y - ŷ.
  • A positive residual means the model underestimated (actual is above the line); a negative residual means the model overestimated (actual is below the line).
  • A residual plot graphs residuals against the explanatory variable or the predicted values.
  • Random scatter with no pattern in a residual plot is evidence that a linear model is appropriate.
  • A clear pattern, like a curve, is evidence that a linear model is not the best fit.
  • To find a residual, use the regression equation to get ŷ, then subtract it from the actual y.

What Is a Residual?

A residual measures how far off a prediction is for a single data point. It is the difference between the observed value of the response variable (y) and the predicted value (ŷ) from the model:

residual = y - ŷ

The least-squares line is built to minimize the sum of the squared residuals. For each point, the residual is the signed vertical distance from the line to the point: positive when the point is above the line, negative when it is below. A residual close to 0 means the model predicts that point well. A residual far from 0 means the prediction is off for that point.
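To see what "minimize the sum of the squared residuals" means, here is a rough sketch that fits a least-squares line to a small made-up data set and compares it with a nearby line. The data, the alternative line, and the function names are all assumptions chosen for illustration; on the exam the regression equation is normally given or produced by a calculator.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Add up (y - y_hat)^2 over every data point."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)

# A different, hand-picked line: y_hat = 2 + 0.7x
other = sum_squared_residuals(xs, ys, 2, 0.7)

print(intercept, slope)   # about 2.2 and 0.6
print(best, other)        # the least-squares line gives the smaller total
```

Any other line you try on the same data gives a sum of squared residuals at least as large as the least-squares line's.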

Thinking in terms of sign helps:

  • Positive residual: the actual value is greater than the predicted value, so the model underestimates the true value.
  • Negative residual: the actual value is less than the predicted value, so the model overestimates the true value.

Residual Plots

A residual plot graphs the residuals on the vertical axis and the explanatory variable (or the predicted values) on the horizontal axis.

If the residual plot shows apparent randomness, that is evidence the relationship between the explanatory and response variables is linear. In other words, the model is capturing the underlying pattern in the data.

"Apparent randomness" means the residuals are scattered around the horizontal line at 0 with no systematic pattern. When that happens, the residuals are not tied to the explanatory variable in a predictable way, which suggests a good linear fit.

Example 1

Here the linear model fits the data fairly well. The residual plot shows no apparent pattern, and the points are scattered fairly evenly above and below the line at 0.

Example 2

Here the data follows a curved pattern, not a straight line. The residual plot shows a clear curved pattern instead of random scatter. That curve is a signal that a linear model is not the best choice. You will see how to adjust for this and build a better model later in the unit.
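A curved residual pattern can also be seen numerically. The sketch below fits a line to made-up data that follow y = x² exactly and lists the residuals in order of x. The data set and function names are assumptions for illustration, not the data behind the example above.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up curved data: y = x^2 for x = 1, ..., 7
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

intercept, slope = fit_least_squares(xs, ys)

# Each pair (x, residual) is one point on the residual plot
residual_points = [(x, y - (intercept + slope * x)) for x, y in zip(xs, ys)]
for x, res in residual_points:
    print(x, res)
```

The residuals start positive, dip negative in the middle, and end positive again. That U shape is the kind of systematic pattern that counts as evidence against a linear model.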

Reading a Residual Plot: Good Fit or Not?

To decide whether a linear model is a good fit, look at the residual plot:

  • Good fit: residuals are randomly scattered with no clear pattern.
  • Poor fit: residuals show a clear shape, like a curve. A curved residual plot means a linear model is not appropriate, and a nonlinear model may fit the data better.

Calculating Residuals

To calculate a residual, you need the least-squares regression line (LSRL) and the data point you care about.

  1. Plug the x-value into the LSRL to get the predicted value ŷ.
  2. Subtract the predicted value from the actual value.

The formula is:

residual = (actual) - (predicted)

Example 1

An LSRL for predicting the number of Lucky Charms eaten (y) from age in years (x) is:

ŷ = 150.5x - 2.34

A 50-year-old in the data set ate 7,500 Lucky Charms. Find the residual.

ŷ = 150.5(50) - 2.34

ŷ = 7522.66

residual = 7500 - 7522.66 = -22.66

The negative residual means the model overestimated this person's count.
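The same arithmetic can be checked with a short Python sketch. The function name is hypothetical; the equation and data point are the ones from the example.

```python
def predict_lucky_charms(age_years):
    """LSRL from the example: y_hat = 150.5x - 2.34."""
    return 150.5 * age_years - 2.34


actual = 7500                          # observed count for the 50-year-old
predicted = predict_lucky_charms(50)   # step 1: plug in x = 50
residual = actual - predicted          # step 2: actual minus predicted

# Round only for display, since floating-point math is approximate
print(round(predicted, 2), round(residual, 2))
```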

You may also be asked to find the actual value or the predicted value when you are given the residual. That uses the same formula, just rearranged to solve for the piece you are missing.
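As a sketch of the rearranged formula: since residual = actual - predicted, it follows that actual = predicted + residual and predicted = actual - residual. The 20-year-old and the residual of 12.34 below are made-up values for illustration; only the regression equation comes from Example 1.

```python
def predict_lucky_charms(age_years):
    """LSRL from Example 1: y_hat = 150.5x - 2.34."""
    return 150.5 * age_years - 2.34


# Hypothetical case: a 20-year-old whose residual is known to be 12.34
known_residual = 12.34
predicted = predict_lucky_charms(20)

# Rearranged formula: actual = predicted + residual
actual = predicted + known_residual
print(round(predicted, 2), round(actual, 2))

# Going the other way: predicted = actual - residual
recovered_prediction = actual - known_residual
```

A positive residual means the actual value must come out larger than the prediction, which is a quick way to check the rearrangement.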

Example 2

A researcher studies the relationship between hours spent studying for an exam and the score received. She collects data from 50 students and fits a linear regression model. A residual plot is provided.

a) Describe the pattern, if any, in the residual plot.

b) Explain what the pattern suggests about the fit of the model.

c) If the model is not fitting the data well, suggest one potential reason why.

d) Propose one potential solution to improve the fit of the model.

e) Explain how that solution would address the issue.

Answers

a) The residual plot shows a curved pattern.

b) The curved pattern suggests the model does not fit well. The residuals are not randomly scattered around the horizontal axis, which means there is a systematic relationship between the variables that the linear model is not capturing.

c) The relationship between hours studied and exam score may not be linear. Some other underlying form may better describe how the variables are connected.

d) You could transform the data, for example by taking the logarithm of hours studied or of exam score. (You will work with transformations later in the unit.)

e) A transformation can reshape the data so the relationship between the transformed variables is closer to linear. A linear model fit to the transformed data should then fit better, and its residual plot should show more random scatter.
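Here is a rough sketch of how a log transformation can straighten a curved relationship. The hours and scores are made up and built to be exactly linear after the transformation, so the residuals shrink to 0. Real data would still leave some scatter. The base-2 logarithm and the function names are assumptions chosen for illustration.

```python
import math


def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Fit a line to (xs, ys) and return the residual for each point."""
    intercept, slope = fit_least_squares(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up data: scores rise quickly at first, then level off
hours = [1, 2, 4, 8, 16]
scores = [50, 60, 70, 80, 90]

# Linear model on the raw data: residuals go negative, positive, negative
raw_residuals = residuals(hours, scores)

# Linear model on log-transformed hours
log_hours = [math.log2(h) for h in hours]
transformed_residuals = residuals(log_hours, scores)

print([round(r, 2) for r in raw_residuals])
print([round(r, 2) for r in transformed_residuals])
```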

How to Use This on the AP Statistics Exam

MCQ

  • Be ready to compute a residual quickly: find ŷ from the equation, then do actual minus predicted.
  • Watch the sign. A positive residual means the actual point is above the line; a negative residual means it is below.
  • Given a residual plot, identify whether the scatter is random (linear model fits) or patterned (linear model does not fit).

Free Response

  • When a question gives a regression equation, show the substitution into the LSRL and the subtraction so your work is clear.
  • When asked to assess a model, describe the residual plot specifically. Say whether you see random scatter or a clear pattern, and connect that to whether a linear model is appropriate.
  • Interpret in context. Instead of just "the residual is -22.66," say the model overestimated the actual value for that case.

Common Trap

  • Do not say a linear model is appropriate just because the original scatterplot "looks linear" or because r is close to 1 or -1. The residual plot is what gives evidence about linear form.

Common Misconceptions

  • A residual is not the same as an error in the everyday sense. It is simply actual minus predicted for a point in your data.
  • A positive residual does not mean a "good" prediction and a negative residual a "bad" one. The sign tells you whether the actual value was above or below the line, not whether the model is good overall.
  • A residual plot is not the same as a scatterplot. It plots residuals, not the original y-values, against x or against ŷ.
  • Random scatter in a residual plot is the good sign. A clear pattern, such as a curve or a fan shape, is what signals a problem with the linear model.
  • A high correlation alone does not prove a linear model fits. You still check the residual plot before trusting a linear model.
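The last point can be sketched with made-up data. The values below follow y = 2^x, which is clearly not linear, yet the correlation still comes out above 0.9. The data set and function names are assumptions for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r between two lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linear_residuals(xs, ys):
    """Fit the least-squares line and return the residual for each point."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up exponential data: y = 2^x for x = 1, ..., 6
xs = [1, 2, 3, 4, 5, 6]
ys = [2 ** x for x in xs]

r = correlation(xs, ys)
res = linear_residuals(xs, ys)

print(round(r, 3))                  # r is high even though the data curve
print([round(v, 2) for v in res])   # positive, then negative, then positive
```

The residuals are positive at both ends and negative in the middle, so the residual plot would show a curve even though r alone looks strong.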

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

  • actual value: The observed or measured response value in a dataset, denoted as y.
  • bivariate data: Data involving two variables, typically represented as ordered pairs (x, y) to examine the relationship between them.
  • form of association: The pattern or type of relationship between two variables, such as linear, curved, or no relationship.
  • linear model: A mathematical representation of the linear relationship between two variables.
  • predicted value: The estimated response value obtained from a regression model, denoted as ŷ.
  • randomness in residuals: The absence of a clear pattern in a residual plot, indicating that a linear model is appropriate for the data.
  • residual: The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
  • residual plot: A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model.

Frequently Asked Questions

How do you find a residual in AP Stats?

To find a residual, subtract the predicted value from the actual value: residual = y - ŷ. Use the regression equation to calculate ŷ first, then do actual minus predicted.

What does a positive residual mean?

A positive residual means the actual y-value is greater than the predicted y-value. On a scatterplot, the point is above the regression line, so the model underestimated that value.

What does a negative residual mean?

A negative residual means the actual y-value is less than the predicted y-value. On a scatterplot, the point is below the regression line, so the model overestimated that value.

How do you interpret a residual in context?

State the size, direction, and context of the prediction error. For example, a residual of -4 minutes means the model predicted 4 minutes more than the actual time for that observation.

What does a residual plot show?

A residual plot graphs residuals against the explanatory variable or predicted values. It helps you decide whether a linear model is appropriate for the data.

When is a residual plot evidence for a linear model?

A residual plot supports a linear model when the residuals show apparent randomness around 0 with no clear pattern. A curved or systematic pattern is evidence that a linear model is not appropriate.
