Regression Line

A regression line (least-squares regression line, or LSRL) is the line ŷ = a + bx that minimizes the sum of the squared residuals, used to predict a response variable y from an explanatory variable x. It always passes through (x̄, ȳ), and its slope is b = r(sy/sx).

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is a Regression Line?

A regression line is the straight line that best summarizes a linear relationship between two quantitative variables in a scatterplot. On the AP exam it's almost always the least-squares regression line (LSRL), written ŷ = a + bx. "Least squares" tells you exactly how the line is chosen: out of every possible line, the LSRL is the one that minimizes the sum of the squared residuals (the vertical gaps between actual points and the line). The Course and Exam Description (CED) states two facts about this line: it always passes through the point (x̄, ȳ), and its slope can be calculated as b = r(sy/sx), which ties the line directly to the correlation coefficient.
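As a rough illustration of those two facts, here is a minimal Python sketch that builds the LSRL from raw data using b = r(sy/sx) and a = ȳ - bx̄. The data set and the function name `lsrl` are made up for this example; on the exam you would normally read these values from computer output or a calculator.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * (sy / sx)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    # Correlation r: sum of z-score products divided by n - 1.
    r = sum(((x - x_bar) / sx) * ((y - y_bar) / sy)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * (sy / sx)        # slope
    a = y_bar - b * x_bar    # intercept, forces the line through (x-bar, y-bar)
    return a, b

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = lsrl(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")

# Evaluating the line at x-bar should give back y-bar.
print(abs((a + b * mean(xs)) - mean(ys)) < 1e-9)
```

For this small data set the slope works out to 0.6 and the intercept to 2.2, and the final check confirms the line passes through (x̄, ȳ).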

The coefficients have specific meanings you'll be asked to interpret. The slope b is the predicted change in y for every one-unit increase in x. The y-intercept a = ȳ - bx̄ is the predicted y-value when x = 0, which sometimes makes no real-world sense (a wolf with length 0 meters has no meaningful predicted weight). The hat on ŷ matters. The line gives predicted values, not actual ones, and the leftover gap (y - ŷ) is the residual. The full mechanics live in the Topic 2.8 study guide on Least Squares Regression, and the line comes back in Unit 9 when you do inference on its slope.
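To make the difference between a predicted value and a residual concrete, the short sketch below uses a hypothetical fitted line ŷ = 2.2 + 0.6x. The coefficients and the observed point are assumptions chosen only for illustration.

```python
# Hypothetical fitted line: y-hat = 2.2 + 0.6x (coefficients assumed for illustration).
a, b = 2.2, 0.6

def predict(x):
    """Predicted response y-hat at explanatory value x."""
    return a + b * x

def residual(x, y):
    """Actual minus predicted: y - y-hat."""
    return y - predict(x)

# An observed point (x = 4, y = 4) lies below the line, so its residual is negative.
res = residual(4, 4)
print(round(predict(4), 2), round(res, 2))

# The intercept is simply the prediction at x = 0 (whether or not x = 0 makes sense in context).
print(predict(0) == a)
```

A negative residual means the line overpredicted that point; a positive residual means it underpredicted.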

Why the Regression Line matters in AP Statistics

The regression line is the backbone of two whole chunks of the course. In Unit 2, learning objectives 2.8.A and 2.8.B have you estimating the LSRL's parameters and interpreting its slope and intercept in context, while 2.9.A and 2.9.B cover what happens when the line goes wrong (influential points) or when the data isn't linear and needs a transformation. Then Unit 9 reframes the same line as an estimate. Your sample line ŷ = a + bx is your best guess at the population regression line μy = α + βx, and Topics 9.2 and 9.3 (LOs 9.2.A through 9.2.D and 9.3.A through 9.3.C) build confidence intervals for the true slope β using b ± t*(SEb). If you can interpret a slope in context in Unit 2, you're already halfway to interpreting a confidence interval for that slope in Unit 9. The exam loves this thread, which is why a regression question shows up on the FRQ section almost every year.

How the Regression Line connects across the course

Slope (Units 2 & 9)

The slope is the single most tested piece of the regression line. In Unit 2 you interpret it ("for each additional cm of length, predicted mass increases by b grams"). In Unit 9 that same b becomes the point estimate for the true population slope β in the interval b ± t*(SEb).

Residuals (Unit 2)

Residuals literally define the regression line, since the LSRL is the line that makes the sum of squared residuals as small as possible. Residual plots then act as the line's report card. A random scatter means the linear model fits; a curved pattern means it doesn't, and Topic 2.9 says to transform the data.
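One way to see the "least squares" idea in action is to compare the sum of squared residuals for the LSRL against a few slightly different lines. The sketch below assumes a small invented data set whose LSRL is ŷ = 2.2 + 0.6x; the nearby lines are arbitrary nudges, not an exhaustive search.

```python
def sum_sq_resid(a, b, xs, ys):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data; its least-squares line is y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

best = sum_sq_resid(a, b, xs, ys)

# Nudge the intercept or the slope a little in each direction.
nearby = [(a + 0.1, b), (a - 0.1, b), (a, b + 0.05), (a, b - 0.05)]
others = [sum_sq_resid(a2, b2, xs, ys) for a2, b2 in nearby]

# The residuals of the LSRL also add up to (essentially) zero.
resid_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))

print(round(best, 4), [round(v, 4) for v in others], abs(resid_sum) < 1e-9)
```

Every nudged line produces a larger sum of squared residuals than the LSRL, which is what "best fit" means here.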

Coefficient of Determination (Unit 2)

r² measures how well the regression line actually performs. It's the proportion of variation in y explained by the linear relationship with x. After a transformation, r² moving closer to 1 is evidence the new line is the better prediction model.
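The "proportion of variation explained" reading of r² can be checked numerically. This sketch, using the same kind of invented data and an assumed LSRL of ŷ = 2.2 + 0.6x, computes r² as 1 minus (unexplained variation / total variation) and compares it with the square of the correlation.

```python
from math import sqrt
from statistics import mean

# Hypothetical data; its least-squares line is y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

x_bar, y_bar = mean(xs), mean(ys)

# Variation left over after using the line (sum of squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
# Total variation in y around its mean.
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst

# Correlation computed directly, to confirm r squared matches.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
r = sxy / sqrt(sxx * sst)

print(round(r_squared, 4), round(r ** 2, 4))
```

Here about 60% of the variation in y is accounted for by the linear relationship with x.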

Confidence Interval (Unit 9)

Unit 9 treats your regression line as one sample's estimate of a true population line. A t-interval for the slope, b ± t*(SEb), gives a range of plausible values for β, and per LO 9.3.B you can use whether that interval contains 0 to justify a claim about whether x and y are really linearly related.
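The interval b ± t*(SEb) can be sketched in a few lines. The version below assumes the standard error formula SEb = s / (sx·√(n - 1)), with s = √(Σ(y - ŷ)² / (n - 2)), and takes t* as an input the way you would read it from a t-table. The data set and the function name `slope_interval` are hypothetical, and the sketch does not check the inference conditions.

```python
from math import sqrt
from statistics import mean, stdev

def slope_interval(xs, ys, t_star):
    """Return (lower, upper, se_b) for the interval b +/- t* x SE_b."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope via Sxy / Sxx, which is algebraically the same as r * (sy / sx).
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    # s: standard deviation of the residuals, with n - 2 degrees of freedom.
    s = sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    se_b = s / (stdev(xs) * sqrt(n - 1))
    margin = t_star * se_b
    return b - margin, b + margin, se_b

# Hypothetical data (n = 5, so df = 3); t* = 3.182 is the 95% table value for df = 3.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
lower, upper, se_b = slope_interval(xs, ys, t_star=3.182)

print(round(lower, 2), round(upper, 2), round(se_b, 4))
print("0 is plausible for the true slope:", lower < 0 < upper)
```

With so few points the interval is wide, roughly -0.30 to 1.50, and it contains 0, so this tiny sample would not give convincing evidence of a linear relationship.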

Influential Point (Unit 2)

The regression line is fragile. A point with an extreme x-value (high leverage) can drag the line toward itself, and a point with a large residual (outlier) can also shift the fit. A point is influential if removing it substantially changes the slope, intercept, or correlation. That sensitivity is exactly what LO 2.9.A asks you to identify.
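A quick way to see this sensitivity is to fit the line twice, once with and once without a single extreme point. The data below, including the high-leverage point (15, 3), are invented for illustration, and `fit` is a hypothetical helper.

```python
from statistics import mean

def fit(points):
    """Return (a, b) for the least-squares line through a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in points)
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: five ordinary points plus one with an extreme x-value.
base = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
with_extreme = base + [(15, 3)]

a0, b0 = fit(base)
a1, b1 = fit(with_extreme)

print(f"without the point: slope = {b0:.3f}")
print(f"with the point:    slope = {b1:.3f}")
```

In this example a single added point is enough to turn a clearly positive slope into a slightly negative one, which is the behavior that marks a point as influential.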

Is the Regression Line on the AP Statistics exam?

Regression lines are FRQ regulars. The 2017 FRQ (wolf length vs. weight), 2018 FRQ (customers in line vs. checkout time), 2022 FRQ (bullfrog length vs. mass), and 2023 FRQ (tule elk) all hand you computer regression output or a scatterplot and ask you to interpret coefficients, use the line to predict, compute or interpret a residual, or evaluate the model. Multiple-choice questions hit the calculations. You'll compute b from summary statistics using b = r(sy/sx), find a from a = ȳ - bx̄, or pick the correct interpretation of a y-intercept and recognize when it has no logical meaning (predicted energy use at 0°C, for example, only makes sense if 0°C is within the range of the data). Three habits earn points every time: say "predicted y," not "y," when interpreting the line; attach context and units; and only predict within the range of the x-data (extrapolation loses credit). In Unit 9, expect to construct and interpret a t-interval for the slope and to check conditions like linearity and equal standard deviation of y across x using residual plots.
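The multiple-choice style calculation from summary statistics, together with the "stay inside the x-range" habit, might look like the sketch below. All of the summary values and the x-range are hypothetical, and the function names are made up for this example.

```python
def line_from_summary(r, sx, sy, x_bar, y_bar):
    """Return (a, b) from summary statistics: b = r(sy/sx), a = y-bar - b * x-bar."""
    b = r * (sy / sx)
    a = y_bar - b * x_bar
    return a, b

def predict_in_range(a, b, x, x_min, x_max):
    """Predict y-hat, but refuse to extrapolate beyond the observed x-values."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x = {x} is outside the data range [{x_min}, {x_max}]")
    return a + b * x

# Hypothetical summary statistics and observed x-range.
a, b = line_from_summary(r=0.8, sx=2.0, sy=5.0, x_bar=10.0, y_bar=50.0)
y_hat = predict_in_range(a, b, 12.0, x_min=6.0, x_max=14.0)

print(f"y-hat = {a:.1f} + {b:.1f}x; predicted y at x = 12 is {y_hat:.1f}")
```

Raising an error for out-of-range x-values is just one design choice for the sketch; the point is that a prediction at, say, x = 20 would be extrapolation and should not be trusted.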

Regression Line vs. Population Regression Line

The regression line you calculate, ŷ = a + bx, comes from a sample, so a and b are statistics that change from sample to sample. The population regression line, μy = α + βx, describes the true relationship for the entire population, and α and β are fixed parameters you never actually see. This distinction is the entire point of Unit 9. You build a confidence interval around your sample slope b precisely because it's just an estimate of the unknown true slope β. On FRQs, mixing these up (saying "the slope is 1.8" when you mean "the estimated slope is 1.8") can cost interpretation points.
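A small simulation can show what "b changes from sample to sample while β stays fixed" looks like. Everything here is an assumption for illustration: the population line μy = 3.0 + 1.8x, the noise level, the sample size, and the seed.

```python
import random
from statistics import mean

# Hypothetical population parameters (never known in practice).
ALPHA, BETA, SIGMA = 3.0, 1.8, 2.0

def sample_slope(rng, n=20):
    """Draw one random sample of size n and return its least-squares slope b."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [ALPHA + BETA * x + rng.gauss(0, SIGMA) for x in xs]
    x_bar, y_bar = mean(xs), mean(ys)
    return (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))

rng = random.Random(0)  # fixed seed so the run is repeatable
slopes = [sample_slope(rng) for _ in range(500)]

print(f"smallest b = {min(slopes):.2f}, largest b = {max(slopes):.2f}")
print(f"average of the sample slopes = {mean(slopes):.2f} (true beta = {BETA})")
```

Individual sample slopes scatter around the true slope, and their average lands close to β, which is the intuition behind centering the confidence interval on b.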

Key things to remember about the Regression Line

  • The least-squares regression line ŷ = a + bx is the line that minimizes the sum of the squared residuals, and it always passes through the point (x̄, ȳ).

  • The slope is b = r(sy/sx) and means the predicted y changes by b for every one-unit increase in x; always interpret it with "predicted" and in context.

  • The y-intercept a = ȳ - bx̄ is the predicted y when x = 0, but sometimes that interpretation makes no logical sense in context.

  • r², the coefficient of determination, tells you the proportion of variation in y that the regression line explains.

  • Outliers, high-leverage points, and influential points can dramatically change the line's slope and intercept, and a curved residual plot means you should transform the data instead of trusting the line.

  • In Unit 9, your sample slope b estimates the true population slope β, and the confidence interval b ± t*(SEb) lets you justify claims about that true slope.

Frequently asked questions about the Regression Line

What is a regression line in AP Stats?

It's the line ŷ = a + bx that best fits a scatterplot of two quantitative variables, chosen by minimizing the sum of the squared residuals (that's why it's called the least-squares regression line). You use it to predict a response variable y from an explanatory variable x.

Does the regression line prove that x causes y?

No. A regression line only describes an association in the data; it cannot establish causation unless the data came from a randomized experiment. A strong slope and high r² from observational data could be explained by confounding variables, and saying "causes" on an FRQ when the data is observational will cost you.

What's the difference between the regression line and the correlation coefficient?

The correlation coefficient r is a single number from -1 to 1 measuring the strength and direction of a linear relationship, while the regression line is an actual equation you can use to make predictions. They're linked through the slope formula b = r(sy/sx), so r and b always share the same sign.

Does the regression line have to pass through any specific point?

Yes, the least-squares regression line always passes through (x̄, ȳ), the means of both variables. That fact, combined with the slope formula b = r(sy/sx), is how you can find the intercept using a = ȳ - bx̄, a calculation that shows up regularly in multiple choice.

How is the regression line used in Unit 9 inference?

Your sample regression line ŷ = a + bx estimates the true population line μy = α + βx. In Topics 9.2 and 9.3 you build a confidence interval for the true slope, b ± t*(SEb) with n - 2 degrees of freedom, and use it to justify claims, like checking whether 0 is a plausible value for β.