Least-Squares Regression Line in AP Statistics

The least-squares regression line (LSRL) is the line ŷ = a + bx that best fits bivariate quantitative data by minimizing the sum of squared residuals (vertical distances between observed and predicted y-values), and its slope b serves as the point estimate for the population slope β in AP Stats Unit 5 inference.

Verified for the 2027 AP Statistics exam • Last updated June 2026

What is the Least-Squares Regression Line?

The least-squares regression line (LSRL) is the straight line that fits a scatterplot of two quantitative variables better than any other line, where "better" has a specific meaning. It minimizes the sum of the squared residuals, the vertical gaps between each observed y-value and the y-value the line predicts. You write it as ŷ = a + bx, where a is the y-intercept, b is the slope, and the hat on ŷ reminds you it's a predicted value, not an observed one.
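To make "minimizes the sum of the squared residuals" concrete, here is a minimal sketch that fits the line by hand and then nudges it. The five data points and the helper names `fit_lsrl` and `sum_sq_residuals` are invented for illustration; the slope and intercept formulas are the standard ones, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄.

```python
# Hypothetical data, invented purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_sq_residuals(a, b, xs, ys):
    """Sum of squared vertical gaps between observed y and a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_lsrl(xs, ys)
best = sum_sq_residuals(a, b, xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x, SSR = {best:.2f}")
# prints: y-hat = 2.20 + 0.60x, SSR = 2.40

# Nudging the intercept or the slope in either direction raises the SSR.
nudged = [
    sum_sq_residuals(a + da, b + db, xs, ys)
    for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]
]
print(all(value > best for value in nudged))
```

Every nudged line has a larger sum of squared residuals than the fitted one, which is what "least squares" means.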

Here's the upgrade that happens in Unit 5. The LSRL you compute from a sample is an estimate of something bigger. Per the CED, the sample regression line ŷ = a + bx estimates the population regression line μy = α + βx. Your slope b is just one sample's guess at the true slope β, and a different sample would give a slightly different b. That sampling variability is exactly why Unit 5 builds confidence intervals and tests around the slope. Each residual, yi − ŷi, is itself an estimate of how far that observation sits from the population line.
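A small simulation can make this sampling variability visible. The sketch below assumes a made-up population line (the values of `ALPHA`, `BETA`, and `SIGMA` are hypothetical, as is the helper `sample_slope`), draws many samples around it, and records the LSRL slope b from each one.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population line mu_y = ALPHA + BETA * x, with responses
# scattered around it with standard deviation SIGMA.
ALPHA, BETA, SIGMA = 10.0, 2.0, 3.0
XS = list(range(1, 26))  # n = 25 fixed x-values


def sample_slope():
    """Draw one sample around the population line; return its LSRL slope b."""
    ys = [ALPHA + BETA * x + random.gauss(0, SIGMA) for x in XS]
    x_bar = statistics.mean(XS)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(XS, ys))
    sxx = sum((x - x_bar) ** 2 for x in XS)
    return sxy / sxx


slopes = [sample_slope() for _ in range(2000)]
print(f"mean of b over 2000 samples: {statistics.mean(slopes):.3f}")
print(f"standard deviation of b:     {statistics.stdev(slopes):.3f}")
```

The sample slopes should center near the true slope of 2, with a spread close to σ/√Σ(x − x̄)² ≈ 0.083 for these settings. That spread is the quantity the standard error of the slope estimates from a single sample.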

Why the Least-Squares Regression Line matters in AP® Statistics

The LSRL is the backbone of the Unit 5 material on confidence intervals for the slope of a regression model, and it shows up in every learning objective there. One learning objective asks you to recognize the sample line as an estimate of the population line. Another has you verify conditions (linearity, equal standard deviation of y across x, independence, approximate normality of responses), and residual analysis from the LSRL is how you check the first two. Two more have you build the interval b ± t*(SEb), where SEb = s/(sx√(n−1)) and s estimates σ, the standard deviation of deviations from the population line. In other words, you can't do regression analysis without first understanding what the LSRL is and where its slope comes from. It's also the bridge concept connecting descriptive regression earlier in the course to formal inference at the end.
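One way to unpack the standard-error formula, sketched with the usual textbook definitions of s and sx (the summation notation below is an addition, not part of the original text):

```latex
\[
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}},
\qquad
s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}
\]
\[
SE_b = \frac{s}{s_x\sqrt{n-1}} = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
\]
```

Read this way, the standard error of the slope shrinks when the points scatter less around the line (smaller s), when the x-values are more spread out (larger sx), or when the sample is larger.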

How the Least-Squares Regression Line connects across the course

Slope (Unit 5)

The slope b of the LSRL is the point estimate for the population slope β. Every confidence interval for the slope in Unit 5 is built as b ± t*(SEb), so the entire inference procedure starts with the line's slope.

Residuals (Unit 5)

Residuals are the leftovers the LSRL couldn't explain, and the LSRL is literally defined by making their squared sum as small as possible. You also reuse residuals as a diagnostic tool. A residual plot with no curve and roughly even spread is how you check the linearity and constant standard deviation conditions for inference.
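As a quick illustration, the sketch below computes residuals for a small invented data set whose least-squares line works out to ŷ = 2.2 + 0.6x. The data and variable names are hypothetical.

```python
# Hypothetical data; its least-squares line is y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# residual = observed y minus predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.1f}")

# For any LSRL the residuals sum to zero (up to floating-point rounding).
print(abs(sum(residuals)) < 1e-9)
```

Plotting these residuals against x gives the residual plot. With real data you would look for a curved pattern (a linearity problem) or a fan shape (unequal spread).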

Confidence Interval (Unit 5)

A confidence interval for slope answers the question the LSRL alone can't. Your sample gave you b, but how far might the true β be from it? The t-interval with n − 2 degrees of freedom turns one sample's line into a range of plausible population slopes.
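Here is a minimal worked sketch of that interval, b ± t*(SEb), on a tiny invented data set. The data are hypothetical, and the critical value `T_STAR` is read from a t-table for 95% confidence with n − 2 = 3 degrees of freedom instead of being computed.

```python
import math

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
a = y_bar - b * x_bar

# s: standard deviation of the residuals, using n - 2 degrees of freedom.
ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(ssr / (n - 2))

# s_x: sample standard deviation of the x-values.
s_x = math.sqrt(sxx / (n - 1))

# Standard error of the slope.
se_b = s / (s_x * math.sqrt(n - 1))

# t* for 95% confidence with df = n - 2 = 3, taken from a t-table.
T_STAR = 3.182

lower = b - T_STAR * se_b
upper = b + T_STAR * se_b
print(f"b = {b:.3f}, SE_b = {se_b:.3f}, df = {n - 2}")
print(f"95% CI for beta: ({lower:.3f}, {upper:.3f})")
# prints: b = 0.600, SE_b = 0.283, df = 3
# prints: 95% CI for beta: (-0.300, 1.500)
```

With only five points the interval is wide and includes 0, so these made-up data would not give convincing evidence of a nonzero population slope.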

Coefficient of Determination (R²) (Unit 5)

R² tells you what proportion of the variation in y the LSRL explains. A line always exists, even for terrible data, so R² is your reality check on whether the fit is worth trusting before you bother with inference.
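A short sketch of the calculation, using the common definition R² = 1 − (sum of squared residuals) / (total sum of squares). The data set is invented, and its least-squares line works out to ŷ = 2.2 + 0.6x.

```python
# Hypothetical data; its least-squares line is y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

y_bar = sum(ys) / len(ys)

# Variation left over after using the line.
ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Total variation of y around its mean.
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - ssr / sst
print(f"R^2 = {r_squared:.2f}")  # prints: R^2 = 0.60
```

In words, about 60% of the variation in y for these made-up points is explained by the linear relationship with x.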

Is the Least-Squares Regression Line on the AP® Statistics exam?

Question 1 on the AP Stats free response frequently hands you computer regression output for an LSRL and asks you to work with it. The 2017 FRQ used wolf length and weight, 2018 used checkout times in a grocery line, 2022 used bullfrog length and mass, and the 2023 exam used tule elk weights. Typical tasks include interpreting the slope in context, predicting a y-value, finding or interpreting a residual, and commenting on fit. Multiple-choice questions often give you an equation like ŷ = 62 + 4.5x along with a value of s and n = 25, then ask you to construct or interpret a confidence interval for the slope, interpret the slope itself, or work with the standard error. The big point losses come from sloppy language. Always say "predicted" y, interpret the slope as an average change in y per one-unit increase in x, and keep context (units and variable names) in every interpretation.

The Least-Squares Regression Line vs the Population Regression Line

The LSRL (ŷ = a + bx) is computed from your sample and changes from sample to sample. The population regression line (μy = α + βx) describes the true relationship for everyone, and you never actually see it. The whole point of Unit 5 inference is using the sample line's slope b, plus its standard error, to estimate the unseen β. Mixing up b and β (or ŷ and μy) in an interpretation costs credit on FRQs.

Key things to remember about the Least-Squares Regression Line

The LSRL is the line ŷ = a + bx that minimizes the sum of squared vertical residuals, not perpendicular distances and not raw distances.
The sample slope b is the point estimate for the population slope β, and the confidence interval is b ± t*(SEb) with n − 2 degrees of freedom.
The standard error of the slope is SEb = s/(sx√(n−1)), where s estimates the standard deviation of deviations from the population regression line.
Before doing regression analysis, check the conditions for inference, which include linearity, constant standard deviation of y across x, independence (random sampling and the 10% condition), and approximately normal responses; residual plots verify the first two.
Always interpret the slope as the predicted average change in y for each one-unit increase in x, stated in context with units.
A strong-fitting LSRL describes association, not causation; only a randomized experiment lets you make causal claims.

Frequently asked questions about the Least-Squares Regression Line

What is the least-squares regression line in AP Stats?

It's the line ŷ = a + bx that best fits a scatterplot of two quantitative variables by minimizing the sum of squared residuals. In Unit 5, its slope b becomes the point estimate for the true population slope β.

Does the least-squares regression line minimize the distance from each point to the line?

Not exactly, and this is a classic trap. It minimizes the sum of the squared vertical distances (residuals), not the straight-line perpendicular distances and not the unsquared distances. Saying "squared vertical" is what earns the point.

What's the difference between the LSRL and the population regression line?

The LSRL (ŷ = a + bx) comes from your sample and would change with a new sample. The population regression line (μy = α + βx) is the true, unknown relationship you're trying to estimate. Confidence intervals for slope bridge the gap between b and β.

Does a least-squares regression line prove that x causes y?

No. The LSRL only describes an association between two variables. Even a slope of 4.5 with a tiny standard error doesn't establish causation unless the data came from a randomized experiment.

How is the LSRL tested on the AP Stats exam?

It anchors FRQ Question 1 in many years, including 2017 (wolves), 2018 (grocery checkout), 2022 (bullfrogs), and 2023 (tule elk). You're typically asked to interpret the slope in context, make predictions, compute residuals, or build a confidence interval for the slope from regression output.
