The least-squares regression line (LSRL) is the line ŷ = a + bx that best fits bivariate quantitative data by minimizing the sum of squared residuals (vertical distances between observed and predicted y-values), and its slope b serves as the point estimate for the population slope β in AP Stats Unit 9 inference.
The least-squares regression line (LSRL) is the straight line that fits a scatterplot of two quantitative variables better than any other line, where "better" has a specific meaning. It minimizes the sum of the squared residuals, the vertical gaps between each observed y-value and the y-value the line predicts. You write it as ŷ = a + bx, where a is the y-intercept, b is the slope, and the hat on ŷ reminds you it's a predicted value, not an observed one.
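To make "minimizes the sum of squared residuals" concrete, here is a minimal sketch in plain Python. The data set and the function names (fit_lsrl, sum_sq_residuals) are made up for illustration; the slope and intercept use the standard least-squares formulas.

```python
from statistics import mean

def fit_lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x using the least-squares formulas."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept: the line passes through (x-bar, y-bar)
    return a, b

def sum_sq_residuals(xs, ys, a, b):
    """Sum of squared vertical gaps between observed y and predicted y-hat."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only to keep the arithmetic simple.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_lsrl(xs, ys)
best = sum_sq_residuals(xs, ys, a, b)

# Nudging the slope or the intercept away from the LSRL makes the sum larger.
nudged_slope = sum_sq_residuals(xs, ys, a, b + 0.1)
nudged_intercept = sum_sq_residuals(xs, ys, a + 0.1, b)
print(a, b, best, nudged_slope, nudged_intercept)
```

For this data the fitted line works out to roughly ŷ = 2.2 + 0.6x, and any other line you try gives a larger sum of squared residuals.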
Here's the upgrade that happens in Unit 9. The LSRL you compute from a sample is an estimate of something bigger. Per the CED, the sample regression line ŷ = a + bx estimates the population regression line μy = α + βx. Your slope b is just one sample's guess at the true slope β, and a different sample would give a slightly different b. That sampling variability is exactly why Unit 9 builds confidence intervals and tests around the slope. Each residual, yi − ŷi, is itself an estimate of how far that observation sits from the population line.
The LSRL is the backbone of Topic 9.2, Confidence Intervals for the Slope of a Regression Model, and it shows up in every learning objective there. AP Stats 9.2.A asks you to recognize the sample line as an estimate of the population line. AP Stats 9.2.B has you verify conditions (linearity, equal standard deviation across x, independence, approximate normality of responses), and residual analysis from the LSRL is how you check the first two. AP Stats 9.2.C and 9.2.D have you build the interval b ± t*(SEb), where SEb = s/(sx√(n−1)), sx is the sample standard deviation of the x-values, s estimates σ (the standard deviation of deviations from the population line), and t* comes from a t-distribution with n − 2 degrees of freedom. In other words, you can't do slope inference without first understanding what the LSRL is and where its slope comes from. It's also the bridge concept connecting descriptive regression earlier in the course to formal inference at the end.
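The interval formula above can be sketched in a few lines of Python. The function name slope_confidence_interval and every input value below are assumptions for illustration only; in particular, sx = 2.0 is invented, and t* = 2.069 is the table value for 95% confidence with 23 degrees of freedom (n = 25), which you would normally read from a t-table or calculator.

```python
import math

def slope_confidence_interval(b, s, s_x, n, t_star):
    """Return (SE_b, (lower, upper)) for the interval b +/- t* * SE_b.

    b      : sample slope from the LSRL
    s      : standard deviation of the residuals
    s_x    : sample standard deviation of the x-values
    n      : number of (x, y) pairs
    t_star : critical value from a t-distribution with n - 2 df
    """
    se_b = s / (s_x * math.sqrt(n - 1))
    margin = t_star * se_b
    return se_b, (b - margin, b + margin)

# Illustrative inputs only (not taken from any real data set).
se_b, (lower, upper) = slope_confidence_interval(
    b=4.5, s=6.8, s_x=2.0, n=25, t_star=2.069
)
print(round(se_b, 3), round(lower, 3), round(upper, 3))
```

Passing t* in as an argument keeps the sketch dependency-free; it does not compute the critical value for you.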
Slope (Unit 9)
The slope b of the LSRL is the point estimate for the population slope β. Every confidence interval in Topic 9.2 is built as b ± t*(SEb), so the entire inference procedure starts with the line's slope.
Residuals (Unit 9)
Residuals are the leftovers the LSRL couldn't explain, and the LSRL is literally defined by making their squared sum as small as possible. You also reuse residuals as a diagnostic tool. A residual plot with no curve and roughly even spread verifies the linearity and constant standard deviation conditions in 9.2.B.
Confidence Interval (Unit 9)
A confidence interval for slope answers the question the LSRL alone can't. Your sample gave you b, but how far might the true β be from it? The t-interval with n − 2 degrees of freedom turns one sample's line into a range of plausible population slopes.
Coefficient of Determination (R²) (Unit 9)
R² tells you the proportion of the variation in y that the LSRL actually explains. A line always exists, even for terrible data, so R² is your reality check on whether the fit is worth trusting before you bother with inference.
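As a rough sketch of what "proportion of variation explained" means, R² can be computed as 1 minus the ratio of the squared residuals around the LSRL to the squared deviations around the mean of y. The data and the function name r_squared are hypothetical.

```python
from statistics import mean

def r_squared(xs, ys):
    """Return R^2 = 1 - SSE/SST for the least-squares line through (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # left over after the line
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y
    return 1 - sse / sst

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(round(r2, 3))
```

For this made-up data the line accounts for about 60% of the variation in y.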
Question 1 on the AP Stats free response frequently hands you computer regression output for an LSRL and asks you to work with it. The 2017 FRQ used wolf length and weight, 2018 used checkout times in a grocery line, 2022 used bullfrog length and mass, and the 2023 exam used tule elk weights. Typical tasks include interpreting the slope in context, predicting a y-value, finding or interpreting a residual, and commenting on fit. Multiple-choice questions often give you an equation like ŷ = 62 + 4.5x with s = 6.8 and n = 25, then ask you to construct or interpret a confidence interval for the slope, interpret the slope itself, or work with the standard error. The big point losses come from sloppy language. Always say "predicted" y, interpret the slope as an average change in y per one-unit increase in x, and keep context (units and variable names) in every interpretation.
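Two of the tasks above, predicting a y-value and finding a residual, are simple arithmetic. The sketch below uses the example equation ŷ = 62 + 4.5x; the observed point (x = 10, y = 110) is invented for illustration.

```python
def predict(x, a=62.0, b=4.5):
    """Predicted y from the example line y-hat = 62 + 4.5x."""
    return a + b * x

def residual(x, y_observed):
    """Residual = observed y minus predicted y."""
    return y_observed - predict(x)

# Hypothetical observation, not from any exam question.
x_obs, y_obs = 10, 110
y_hat = predict(x_obs)          # 62 + 4.5 * 10
res = residual(x_obs, y_obs)    # observed minus predicted
print(y_hat, res)
```

A positive residual means the observed value sits above the line, so the line underpredicted for that point.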
The LSRL (ŷ = a + bx) is computed from your sample and changes from sample to sample. The population regression line (μy = α + βx) describes the true relationship for everyone, and you never actually see it. The whole point of Topic 9.2 is using the sample line's slope b, plus its standard error, to estimate the unseen β. Mixing up b and β (or ŷ and μy) in an interpretation costs credit on FRQs.
The LSRL is the line ŷ = a + bx that minimizes the sum of squared vertical residuals, not perpendicular distances and not unsquared distances.
The sample slope b is the point estimate for the population slope β, and the confidence interval is b ± t*(SEb) with n − 2 degrees of freedom.
The standard error of the slope is SEb = s/(sx√(n−1)), where sx is the sample standard deviation of the x-values and s estimates the standard deviation of deviations from the population regression line.
Before doing slope inference, check the conditions from 9.2.B, which include linearity, constant standard deviation of y across x, independence (random sampling and the 10% condition), and approximately normal responses; residual plots verify the first two.
Always interpret the slope as the predicted average change in y for each one-unit increase in x, stated in context with units.
A strong-fitting LSRL describes association, not causation; only a randomized experiment lets you make causal claims.
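The standard-error formula from the takeaways above can be unpacked into its pieces. This is a worked restatement using the usual definitions of s and sx, with x̄ written as \bar{x}; it is not an additional formula you need to memorize.

```latex
\[
SE_b = \frac{s}{s_x\sqrt{n-1}}, \qquad
s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}, \qquad
s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}
\]
\[
s_x\sqrt{n-1} = \sqrt{\sum (x_i - \bar{x})^2}
\quad\Longrightarrow\quad
SE_b = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}
\]
```

Read this way, the slope is estimated more precisely when the residuals are small (small s) and when the x-values are spread out or numerous (large sum of squared x-deviations).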
It's the line ŷ = a + bx that best fits a scatterplot of two quantitative variables by minimizing the sum of squared residuals. In Unit 9, its slope b becomes the point estimate for the true population slope β.
Not exactly, and this is a classic trap. It minimizes the sum of the squared vertical distances (residuals), not the straight-line perpendicular distances and not the unsquared distances. Saying "squared vertical" is what earns the point.
The LSRL (ŷ = a + bx) comes from your sample and would change with a new sample. The population regression line (μy = α + βx) is the true, unknown relationship you're trying to estimate. Confidence intervals for slope bridge the gap between b and β.
No. The LSRL only describes an association between two variables. Even a slope of 4.5 with a tiny standard error doesn't establish causation unless the data came from a randomized experiment.
It anchors FRQ Question 1 in many years, including 2017 (wolves), 2018 (grocery checkout), 2022 (bullfrogs), and 2023 (tule elk). You're typically asked to interpret the slope in context, make predictions, compute residuals, or build a confidence interval for the slope from regression output.