Question 1

How do I calculate residuals using the formula y - ŷ?

Accepted Answer

Take each observed y and subtract its predicted ŷ from your regression line: residual = y − ŷ. Steps: (1) plug the x into the LSRL ŷ = a + b x to get ŷ, (2) compute residual = actual y − ŷ, (3) keep sign: positive residual means the point lies above the line (y higher than predicted), negative means below. Example: if ŷ = 12.3 and y = 15.0, residual = 15.0 − 12.3 = 2.7. On the AP, you’ll use residuals to make a residual plot (residuals vs x or vs ŷ) to check linearity and constant variance—random scatter supports a linear model (CED DAT-1.E and DAT-1.F). Remember the mean of residuals for an LSRL is 0. Want more practice and review? See the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF) and hit the AP practice questions (https://library.fiveable.me/practice/ap-statistics).
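A minimal Python sketch of steps (2) and (3), using the numbers from the example above. The helper name `residual` is illustrative and not from any library; ŷ is taken as given, as in the example.

```python
def residual(y: float, y_hat: float) -> float:
    """Residual = observed y minus predicted y-hat."""
    return y - y_hat

y_hat = 12.3  # step 1: predicted value from the LSRL (given in the example)
y = 15.0      # observed value
e = residual(y, y_hat)                                   # step 2
side = "above" if e > 0 else "below" if e < 0 else "on"  # step 3: keep the sign
print(f"residual = {e:.1f}; the point lies {side} the line")
# residual = 2.7; the point lies above the line
```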

Question 2

What's the difference between actual value and predicted value in residuals?

Accepted Answer

The actual value is the observed response y (what you measured). The predicted value ŷ is what the regression model (least-squares line) estimates for that x. A residual, y − ŷ, measures the difference: a positive residual means the model underpredicted (the point lies above the line); a negative residual means the model overpredicted (the point lies below the line). Residual plots (residuals vs x or vs ŷ) help you check model assumptions: if the residuals show random scatter centered on 0, the linear form is appropriate; a curved pattern indicates nonlinearity, and a fan shape indicates heteroscedasticity (spread that changes with x). On the AP exam you may need to compute and interpret residuals (DAT-1.E: residual = y − ŷ) and use residual plots to judge linearity (DAT-1.F). For a focused review, check the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF) and practice problems (https://library.fiveable.me/practice/ap-statistics).
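A small sketch that turns an actual/predicted pair into the interpretation described above. The function name `describe` is hypothetical, and the second pair of numbers (y = 8.0, ŷ = 9.5) is made up for illustration.

```python
def describe(y: float, y_hat: float) -> str:
    """Compare an actual value y with its predicted value y_hat via the residual."""
    e = y - y_hat
    if e > 0:
        return f"residual {e:+.2f}: model underpredicted (point above the line)"
    if e < 0:
        return f"residual {e:+.2f}: model overpredicted (point below the line)"
    return "residual 0: point lies on the line"

print(describe(15.0, 12.3))  # residual +2.70: model underpredicted (point above the line)
print(describe(8.0, 9.5))    # residual -1.50: model overpredicted (point below the line)
```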

Question 3

When do I use residual plots vs regular scatter plots?

Accepted Answer

Use a regular scatterplot when you want to describe the raw relationship between two quantitative variables: direction, form, strength, and any obvious outliers. Use a residual plot when you’ve fit a model (often the least-squares line) and need to check whether that model is appropriate. Quick checklist (AP style):
- Scatterplot: shows y vs x; good for seeing linear vs nonlinear association and estimating correlation (DAT-1.H, DAT-1.F).
- Residual plot: plot residual = y − ŷ versus x (or versus ŷ). You’re looking for random scatter centered near 0. If the residuals show no pattern, that supports a linear model (DAT-1.E.2, DAT-1.F.1). A curved pattern means the linear model is inappropriate; a fan shape (heteroscedasticity, where the spread changes with x) is also a warning sign (DAT-1.F.2). Residual plots also help you spot outliers and influential points.
Because the residual plot removes the fitted trend, it can expose curvature that is hard to see in the scatterplot. Least-squares residuals always average 0, so the mean tells you nothing; any systematic pattern means you should revise the model. For the CED-aligned guide on residuals see the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF). For more review and practice, check the Unit 2 overview (https://library.fiveable.me/ap-statistics/unit-2) and lots of practice problems (https://library.fiveable.me/practice/ap-statistics).
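To see why the residual plot adds information beyond the scatterplot, here is a sketch on made-up, genuinely curved data (y = x²). The correlation is high, so the scatterplot looks strongly linear, yet the residuals form a clear U shape. The helpers `fit_lsrl` and `correlation` are hypothetical names using the standard least-squares formulas b = Sxy/Sxx and a = ȳ − b·x̄.

```python
def fit_lsrl(xs, ys):
    """Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def correlation(xs, ys):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]  # made-up, genuinely curved (quadratic) data
a, b = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(round(correlation(xs, ys), 3))     # 0.979: the scatterplot looks strongly linear
print([round(e, 2) for e in residuals])  # [3.33, -0.67, -2.67, -2.67, -0.67, 3.33]: U shape
```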

Question 4

Can someone explain step by step how to make a residual plot?

Accepted Answer

Step-by-step:
1) Fit the least-squares regression line (LSRL) for your data (ŷ = a + bx). Use your calculator or software (the AP exam allows a graphing calculator).
2) For each point, compute the residual: residual = y − ŷ. (CED DAT-1.E.1)
3) Make a scatterplot with the explanatory variable x (or the predicted values ŷ) on the horizontal axis and the residuals on the vertical axis. (CED DAT-1.E.2)
4) Add a horizontal line at residual = 0. As a check on your arithmetic, the residuals should average 0 (apart from rounding); this always holds for an LSRL.
5) Interpret: random scatter around 0 supports a linear model (DAT-1.F.1). A curved pattern shows nonlinearity; a funnel shape shows heteroscedasticity; an isolated large residual marks an outlier, which may or may not also be influential. (DAT-1.F.2)
For quick review, see the Topic 2.7 study guide (Fiveable) (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF). For more practice, try problems at (https://library.fiveable.me/practice/ap-statistics).
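The steps above can be sketched in plain Python with a crude text plot standing in for a real graph. This is an illustration only: the data are made up, `fit_lsrl` and `text_residual_plot` are hypothetical helpers, and on the exam or in class you would use a calculator or graphing software instead.

```python
def fit_lsrl(xs, ys):
    """Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def text_residual_plot(xs, residuals, half_height=3):
    """Crude text residual plot: x increases left to right, residual on the vertical axis.
    Each point is drawn as '*'; the dashed row is the reference line at residual = 0."""
    pts = sorted(zip(xs, residuals))
    scale = max(abs(e) for _, e in pts) or 1.0
    rows = []
    for level in range(half_height, -half_height - 1, -1):
        cells = []
        for _, e in pts:
            if round(e / scale * half_height) == level:
                cells.append(" * ")
            elif level == 0:
                cells.append("---")
            else:
                cells.append("   ")
        rows.append("".join(cells))
    return "\n".join(rows)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]                                    # made-up data
a, b = fit_lsrl(xs, ys)                                 # step 1
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # step 2
print("mean residual is 0:", abs(sum(residuals)) < 1e-9)  # step 4 arithmetic check
plot = text_residual_plot(xs, residuals)                # steps 3 and 4
print(plot)
```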

Question 5

I don't understand what ŷ means in the residual formula - is that y-hat?

Accepted Answer

Yes, ŷ is “y-hat,” the predicted y from your model. Residual = y − ŷ (actual minus predicted). If your least-squares line is ŷ = a + bx, you plug in the x to get ŷ for that case, then subtract ŷ from the observed y to get the residual. Residuals tell you how far each point is from the model: a positive residual means the model underestimates (point above the line); a negative residual means the model overestimates (point below the line). On the AP CED, residuals and residual plots (DAT-1.E, DAT-1.F) are used to check model fit: you want random scatter (no pattern) and roughly constant spread (no heteroscedasticity). The residuals of a least-squares line always have mean 0 (apart from rounding). For a quick refresher, see the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF). For more unit review or practice, check the Unit 2 page (https://library.fiveable.me/ap-statistics/unit-2) and the AP practice problems (https://library.fiveable.me/practice/ap-statistics).
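One way to see where the ŷ values come from: the LSRL can be built from summary statistics with b = r·s_y/s_x and a = ȳ − b·x̄, and then each ŷ is a + b·x. The sketch below assumes made-up data and uses only Python's standard `statistics` module (`mean`, `stdev`).

```python
from statistics import mean, stdev

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # made-up data

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)
# correlation r: sum of products of z-scores divided by n - 1
r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)
b = r * s_y / s_x        # slope of the LSRL
a = y_bar - b * x_bar    # intercept: the LSRL passes through (x_bar, y_bar)

y_hat = [a + b * x for x in xs]  # one y-hat ("predicted y") per observed x
print(f"y-hat = {a:.1f} + {b:.1f}x")   # y-hat = 2.2 + 0.6x
print([round(v, 1) for v in y_hat])    # [2.8, 3.4, 4.0, 4.6, 5.2]
```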

Question 6

How do I know if my linear model is appropriate using residuals?

Accepted Answer

Compute residuals (residual = y − ŷ) and make a residual plot (residuals vs the explanatory variable or vs predicted ŷ). For a linear model to be appropriate you want: residuals scattered randomly around zero (no clear curve), roughly constant spread (homoscedasticity, no funnel shape), and no lone outliers or influential points. Least squares always produces residuals with mean 0, so the mean cannot tell you whether the model fits; look at the shape instead. A systematic pattern such as a U shape or other curve means the linear form is wrong. If the spread changes with x (heteroscedasticity), the standard errors used for prediction and inference will be unreliable. Flag any large residuals for a closer check: they can be outliers or influential points. On the AP exam you’ll use residual plots to justify whether a linear model is appropriate (CED DAT-1.E / DAT-1.F). Want practice? See the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF) and try problems at (https://library.fiveable.me/practice/ap-statistics).
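For the "flag any large residuals" step, a common rough rule of thumb (an assumption here, not an AP definition) is to look at points whose residual is more than about 2 residual standard deviations from 0, where s = √(SSE/(n − 2)). The sketch below uses made-up data and a hypothetical helper `flag_large_residuals`; it only screens for outliers and does not replace looking at the plot for curvature or fanning.

```python
import math

def fit_lsrl(xs, ys):
    """Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def flag_large_residuals(xs, ys, k=2.0):
    """Hypothetical screen: return the x-values whose |residual| exceeds k * s,
    where s = sqrt(SSE / (n - 2)) is the standard deviation of the residuals."""
    a, b = fit_lsrl(xs, ys)
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in res) / (len(xs) - 2))
    return [x for x, e in zip(xs, res) if abs(e) > k * s]

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 20.0, 15.2, 16.9]  # made-up data; the point at x = 6 looks off
print(flag_large_residuals(xs, ys))  # [6]
```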

Question 7

What does it mean when residuals look random on a plot?

Accepted Answer

A random-looking residual plot means your linear model is doing a good job of capturing the trend. Residuals are y − ŷ, and when you plot them (versus x or ŷ) you want to see no systematic pattern: just scatter centered near zero with roughly equal spread. That randomness supports a linear form (CED DAT-1.F.1), suggests homoscedasticity (constant variance), and shows no obvious nonlinearity, no pattern left by an omitted variable, and no cluster of influential points. If you see a curve, a fan shape, or large outliers instead, the linear model is inappropriate (use a nonlinear model, transform the data, or investigate the outliers). Residual checks are exactly how you verify the linearity assumption before doing inference for the slope on the AP exam (Topic 2.7, DAT-1.E/F). For a quick review, see the topic study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF).
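A point worth seeing in numbers: least squares forces the residuals to sum to 0 and also makes Σx·e = 0, for any data set. So neither sum can distinguish random residuals from patterned ones; only the shape can. The sketch below fits two made-up data sets (one roughly linear, one curved) with a hypothetical `fit_lsrl` helper and compares the sums and the sign patterns.

```python
def fit_lsrl(xs, ys):
    """Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5, 6]
datasets = {
    "roughly linear": [3.2, 4.8, 7.1, 9.0, 10.9, 13.1],  # made-up
    "curved": [x ** 2 for x in xs],                      # made-up, quadratic
}
results = {}
for label, ys in datasets.items():
    a, b = fit_lsrl(xs, ys)
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    total_is_zero = abs(sum(res)) < 1e-9                              # always true for least squares
    cross_is_zero = abs(sum(x * e for x, e in zip(xs, res))) < 1e-9  # also always true
    signs = "".join("+" if e > 0 else "-" for e in res)
    results[label] = signs
    print(f"{label:15} sum=0: {total_is_zero}  x-weighted sum=0: {cross_is_zero}  signs: {signs}")
```

Both rows pass the sum checks; the difference shows up only in the signs, which switch back and forth for the roughly linear data and form long runs (+ then − then +) for the curved data.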

Question 8

How do I find predicted values to calculate residuals?

Accepted Answer

Start with the regression equation (least-squares line) ŷ = a + bx. To find a predicted value ŷ for a given explanatory x, plug that x into the equation and compute ŷ. Then compute the residual: residual = y − ŷ (observed minus predicted). Quick example: if ŷ = 10 + 2x and for one case x = 3 with observed y = 19, then ŷ = 10 + 2(3) = 16 and residual = 19 − 16 = 3. On the AP: you’ll usually get the regression equation from a calculator or problem statement (use your graphing calculator on the exam). Residuals are plotted versus x or ŷ to check model fit—randomness in that plot supports linearity; patterns or fan-shape signal nonlinearity or heteroscedasticity (CED DAT-1.E and DAT-1.F). For a quick review, see the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF) and practice problems (https://library.fiveable.me/practice/ap-statistics).
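The worked example above, as a short Python sketch. The equation ŷ = 10 + 2x and the case x = 3, y = 19 come from the answer; the second case (x = 5, y = 18) is made up to show a negative residual, and `predict` is a hypothetical helper name.

```python
def predict(a, b, x):
    """Predicted value y-hat = a + b*x for a fitted line."""
    return a + b * x

a, b = 10, 2                # regression equation y-hat = 10 + 2x from the example
cases = [(3, 19), (5, 18)]  # (x, observed y); the second case is made up
for x, y in cases:
    y_hat = predict(a, b, x)
    print(f"x={x}: y-hat={y_hat}, residual={y - y_hat}")
# x=3: y-hat=16, residual=3
# x=5: y-hat=20, residual=-2
```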

Question 9

I'm confused about residual plots - do I plot against x values or predicted values?

Accepted Answer

Either is okay: you can plot residuals (y − ŷ) against the original explanatory variable x or against the predicted values ŷ. The CED explicitly allows both: “residual plots are a plot of residuals versus explanatory variable values or predicted response values” (DAT-1.E.2). Which to pick:
- For a least-squares line, ŷ = a + bx is a linear function of x, so the two plots show exactly the same pattern; the only difference is that the points run in reverse order when the slope b is negative.
- Use whichever the problem asks for. Residuals vs x keeps the original units on the horizontal axis, which makes it easy to say for which x-values the model misses.
What to look for: residuals should scatter randomly around 0 (their mean is always 0 for a least-squares line). Any systematic curve or changing spread means the linear model may be inappropriate (DAT-1.F.1 and DAT-1.F.2). For quick AP review, see the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF) and try practice problems at (https://library.fiveable.me/practice/ap-statistics).
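A quick check of the claim that both residual plots carry the same pattern: order the residuals by x and by ŷ and compare. The data are made up and chosen to have a negative slope; `fit_lsrl` is a hypothetical helper using the standard least-squares formulas.

```python
def fit_lsrl(xs, ys):
    """Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5, 6]
ys = [12, 9, 7, 6, 6, 8]  # made-up data: a falling trend that bends back up
a, b = fit_lsrl(xs, ys)
points = [(x, a + b * x, y - (a + b * x)) for x, y in zip(xs, ys)]  # (x, y-hat, residual)

by_x = [round(e, 3) for _, _, e in sorted(points, key=lambda p: p[0])]
by_yhat = [round(e, 3) for _, _, e in sorted(points, key=lambda p: p[1])]
print("slope:", round(b, 3))                   # slope: -0.857
print("residuals ordered by x:    ", by_x)
print("residuals ordered by y-hat:", by_yhat)  # same values, reversed because b < 0
```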

Question 10

What's the formula for calculating each residual point?

Accepted Answer

Each residual is just the observed y minus the predicted ŷ from your fitted line: residual = y − ŷ. If your least-squares line is ŷ = a + bx, compute ŷ for that x, then subtract it from the actual y to get the residual for that point. Residuals are plotted (residual vs x or residual vs ŷ) to check linearity and equal spread (homoscedasticity). For least-squares regression the residuals always sum (and therefore average) to 0, apart from rounding, so look for random scatter around zero; patterns suggest nonlinearity or other problems. This matches the CED (DAT-1.E.1, DAT-1.E.2, DAT-1.F). See the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF) and try practice problems at (https://library.fiveable.me/practice/ap-statistics) to get comfortable calculating and interpreting residuals.
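Written with an index i for the i-th data point (the symbols e_i and n are notation introduced here), the residual formula and the reason least-squares residuals sum to 0 look like this. The second line uses the standard least-squares intercept a = ȳ − b·x̄, which makes each residual equal to (y_i − ȳ) − b(x_i − x̄).

```latex
e_i = y_i - \hat{y}_i = y_i - (a + b x_i), \qquad i = 1, \dots, n

\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - \bar{y}) - b \sum_{i=1}^{n} (x_i - \bar{x}) = 0 - 0 = 0
\qquad \text{when } a = \bar{y} - b\bar{x}
```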

Question 11

When residuals show a pattern instead of randomness, what does that tell me about my model?

Accepted Answer

If residuals show a clear pattern (curve, fan shape, clump, or trend) instead of random scatter around 0, that tells you the chosen model isn’t appropriate for the data. Specifically, for a linear least-squares model this violates the linearity assumption (CED DAT-1.F.1): the relationship between x and y is likely nonlinear or the variability changes with x (heteroscedasticity). It also means your predictions ŷ are biased in some x-range and inference about the slope may be invalid. Check for: a curved pattern → try a nonlinear model or transform x or y; a funnel/fan pattern → consider transforming to stabilize variance; one big residual → check for an outlier or influential point and whether it’s a data error or a special case. For AP exam framing, use a residual plot (residual = y − ŷ; DAT-1.E) to assess model appropriateness (DAT-1.F.2). Need practice interpreting residual plots? See the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF), the Unit 2 overview (https://library.fiveable.me/ap-statistics/unit-2), and more practice problems (https://library.fiveable.me/practice/ap-statistics).
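For the funnel/fan case, a rough numeric companion to eyeballing the plot is to compare the spread of the residuals for the lower half of the x-values with the spread for the upper half. This is a hypothetical heuristic (`spread_ratio` is not a standard AP tool), shown on made-up data whose deviations grow with x; a ratio far above 1 hints that the spread increases with x.

```python
from statistics import stdev

def fit_lsrl(xs, ys):
    """Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def spread_ratio(xs, ys):
    """Hypothetical fan check: residual SD for the upper half of x divided by
    the residual SD for the lower half of x."""
    a, b = fit_lsrl(xs, ys)
    pts = sorted((x, y - (a + b * x)) for x, y in zip(xs, ys))
    half = len(pts) // 2
    lower = [e for _, e in pts[:half]]
    upper = [e for _, e in pts[half:]]
    return stdev(upper) / stdev(lower)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
# made-up data: y = 2x plus deviations that grow with x (a fan shape)
ys = [2 * x + d for x, d in zip(xs, [0.1, -0.1, 0.3, -0.3, 1, -1, 3, -3])]
print(round(spread_ratio(xs, ys), 1))  # far above 1: spread grows with x
```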

Question 12

How do I interpret a residual plot that curves upward?

Accepted Answer

If the residual plot curves upward, that’s a red flag that the linear model is missing a curved pattern. Remember residual = y − ŷ, and a good linear fit gives residuals scattered randomly around 0 (DAT-1.E, DAT-1.F). A clear upward curve (a U shape) means the residuals are systematically positive at both ends of the x-range and negative in the middle, so the line underpredicts at the extremes and overpredicts in between: the linearity assumption is violated. What to do: try a curved model (add a quadratic term or use a log or other transform), or fit a nonlinear regression and compare the fit. Also check for patterns in spread (heteroscedasticity) or outliers that could affect the shape. On the AP exam, you’d cite the patterned residuals as evidence that the linear model is inappropriate (use the residual plot to justify changing models; DAT-1.F.2). For a quick review, see the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF) and try practice problems (https://library.fiveable.me/practice/ap-statistics).
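A sketch of the "use a log transform" remedy on made-up data that grow exponentially (y = 3·2^x). The straight-line fit to y leaves an upward-curving (U-shaped) residual pattern; fitting ln(y) on x instead removes it. The helpers `fit_lsrl` and `residuals` are hypothetical names, and whether a log (or another) transform is the right fix depends on the real data.

```python
import math

def fit_lsrl(xs, ys):
    """Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residuals(xs, ys):
    """Residuals y - y_hat from the least-squares line of ys on xs."""
    a, b = fit_lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]  # made-up data that grow exponentially

linear_signs = ["+" if e > 0 else "-" for e in residuals(xs, ys)]
print(linear_signs)  # ['+', '+', '-', '-', '-', '+']: the residual plot curves upward

logged = residuals(xs, [math.log(y) for y in ys])  # fit ln(y) on x instead
print(max(abs(e) for e in logged) < 1e-9)          # True: the pattern is gone
```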

Question 13

Can you walk me through creating a residual plot from a data table?

Accepted Answer

Start with your data table of x (explanatory) and y (observed response). Steps:
1. Fit the least-squares line (use a calculator or software) to get ŷ = a + bx, and record ŷ for each row. (On the AP exam you can use a graphing calculator.)
2. For each observation compute the residual: residual = y − ŷ. (CED DAT-1.E.1)
3. Make a scatterplot with the explanatory variable x on the horizontal axis and the residual on the vertical axis (you may also plot residuals vs ŷ). (CED DAT-1.E.2)
4. Add a horizontal reference line at 0 and check the pattern: random scatter around 0 supports linearity and homoscedasticity; a curved pattern suggests nonlinearity; a funnel shape suggests heteroscedasticity; large isolated residuals are outliers, which may or may not be influential. (CED DAT-1.F.1–F.2)
5. As an arithmetic check, the residuals of a least-squares fit always sum (and average) to 0, apart from rounding.
If you want guided practice or examples, see the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF). For more unit review or 1000+ practice problems, check the Unit 2 page (https://library.fiveable.me/ap-statistics/unit-2) and practice hub (https://library.fiveable.me/practice/ap-statistics).
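Steps 1, 2 and 5 can be sketched as a small table-building script: one row per observation with x, y, ŷ and the residual, ready to be plotted with any graphing tool. The data table is made up and `fit_lsrl` is a hypothetical helper using the standard least-squares formulas.

```python
def fit_lsrl(xs, ys):
    """Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # made-up data table
a, b = fit_lsrl(xs, ys)                       # step 1
print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"{'x':>3} {'y':>4} {'y-hat':>6} {'resid':>6}")
rows = []
for x, y in zip(xs, ys):
    y_hat = a + b * x
    rows.append((x, y, y_hat, y - y_hat))     # step 2: residual for this row
    print(f"{x:>3} {y:>4} {y_hat:>6.2f} {y - y_hat:>6.2f}")
print("sum of residuals is 0:", abs(sum(r[3] for r in rows)) < 1e-9)  # step 5
```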

Question 14

Why do we subtract predicted from actual instead of actual from predicted for residuals?

Accepted Answer

We define residual = actual − predicted (y − ŷ) because that sign makes the interpretation intuitive and consistent: a positive residual means the observed value is higher than the model predicted, and a negative residual means it’s lower. That convention is used in the CED (DAT-1.E.1) and in residual plots (DAT-1.E.2), so you can quickly read whether points lie above or below the fitted line. Least-squares regression chooses the slope and intercept to minimize the sum of squared residuals, Σ(y − ŷ)^2, and for those fitted values the residuals sum to exactly zero (apart from rounding). Both facts would hold under the opposite convention too, since squaring removes the sign and the negative of zero is zero; the choice of y − ŷ is about readable signs, not the algebra. Residual plots (y − ŷ vs x or ŷ) help check linearity, constant variance, and outliers (DAT-1.F). For a clear review, check the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF) and try practice problems at (https://library.fiveable.me/practice/ap-statistics).
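A small numeric check of the least-squares idea: on made-up data, nudging the fitted intercept or slope in either direction only increases Σ(y − ŷ)^2, and computing the sum with the flipped convention (ŷ − y) gives the same value. The helpers `fit_lsrl` and `sse` are hypothetical names.

```python
def fit_lsrl(xs, ys):
    """Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def sse(xs, ys, a, b):
    """Sum of squared residuals, the sum of (y - y_hat)^2, for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # made-up data
a, b = fit_lsrl(xs, ys)
best = sse(xs, ys, a, b)

# Nudging the intercept or the slope in either direction makes SSE larger.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(xs, ys, a + da, b + db) > best

# Squaring removes the sign, so the flipped convention (y_hat - y) gives the same SSE.
flipped = sum(((a + b * x) - y) ** 2 for x, y in zip(xs, ys))
print(round(best, 2), flipped == best)  # 2.4 True
```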

Question 15

What does apparent randomness in residuals actually look like on a graph?

Accepted Answer

Apparent randomness in a residual plot means the points look like a messy cloud with no clear pattern. You’d see residuals (y − ŷ) scattered above and below 0 roughly equally, with most points close to the horizontal line at 0 and no systematic curve, line, or cone shape. Key signs to check (CED language): no nonlinearity and roughly constant spread (homoscedasticity). The residuals of a least-squares line always average 0, so that property alone does not show a good fit. What to watch for instead: a curved pattern (suggests nonlinearity), a funnel (spread increasing or decreasing, i.e. heteroscedasticity), clusters, or a few large residuals/outliers (possible influential points). If the residual plot shows apparent randomness, that supports a linear form for the association (DAT-1.F.1) and suggests the chosen linear model is appropriate. For quick practice and more examples of residual plots, see the Topic 2.7 study guide (https://library.fiveable.me/ap-statistics/unit-2/residuals/study-guide/zdTJQZw0UVGswyK6kkEF) and try problems at (https://library.fiveable.me/practice/ap-statistics).
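To see what apparent randomness looks like, the sketch below simulates made-up data from a truly linear trend plus independent noise with constant spread (fixed seed, standard-library `random`), fits the least-squares line, and prints a crude text residual plot. Note the plot is rotated: x runs down the page and each `*` sits left (negative) or right (positive) of the `|` line at 0. The helper `fit_lsrl` and all parameters (intercept 5, slope 1.5, noise SD 2) are assumptions for illustration.

```python
import random

def fit_lsrl(xs, ys):
    """Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

rng = random.Random(1)  # fixed seed so the output is reproducible
xs = list(range(1, 21))
# made-up data: a truly linear trend plus independent noise with constant spread
ys = [5 + 1.5 * x + rng.gauss(0, 2) for x in xs]
a, b = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# rotated text view: one row per x, '*' left (negative) or right (positive) of the 0 line
scale = max(abs(e) for e in residuals)
for x, e in zip(xs, residuals):
    pos = round(e / scale * 10)  # position from -10 to 10
    row = [" "] * 21
    row[10] = "|"
    row[10 + pos] = "*"
    print(f"x={x:2d} " + "".join(row))
```

The stars should bounce back and forth across the `|` line with no long runs on one side and no widening as x grows; that is the "messy cloud" described above.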

Term	Definition
actual value	The observed or measured response value in a dataset, denoted as y.
bivariate data	Data involving two variables, typically represented as ordered pairs (x, y) to examine the relationship between them.
form of association	The pattern or type of relationship between two variables, such as linear, curved, or no relationship.
linear model	A mathematical representation of the linear relationship between two variables.
predicted value	The estimated response value obtained from a regression model, denoted as ŷ.
randomness in residuals	The absence of a clear pattern in a residual plot, indicating that a linear model is appropriate for the data.
residual	The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
residual plot	A scatter plot that displays residuals on the vertical axis versus either the explanatory variable values or predicted response values on the horizontal axis, used to assess the fit of a regression model.

📊AP Statistics Unit 2 Review

2.7 Residuals
