Population Regression Line in AP Statistics

The population regression line is the true, unknown linear relationship between two quantitative variables across an entire population, written μy = α + βx, where α and β are population parameters that your sample's least-squares regression line (ŷ = a + bx) tries to estimate.

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is the Population Regression Line?

The population regression line is the "real" line connecting two quantitative variables for an entire population. You write it as μy = α + βx, where μy is the mean value of the response variable for a given x, β is the true slope, and α is the true y-intercept. Here's the catch. You almost never get to see this line. Measuring every individual in a population (every tule elk in California, every student in the country) isn't realistic, so α and β stay unknown.

What you can see is a sample. When you fit a least-squares regression line (LSRL) to sample data, you get ŷ = a + bx, where a estimates α and b estimates β. That's the whole setup for regression inference in AP Stats. The sample slope b is a statistic, the population slope β is a parameter, and Unit 9 is all about using b (plus its standard error) to build confidence intervals and run significance tests about β. Think of the population regression line as the target and your sample LSRL as one throw at it. Different samples give different lines, but they all aim at the same true line.
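To see the "one throw at the target" idea in action, here is a small Python simulation. It is a sketch built on made-up assumptions: the population line μy = 10 + 2x, normally distributed scatter with standard deviation 3, and x values drawn uniformly from 0 to 20 are all hypothetical choices, not real data. The code draws many random samples of size 30, fits an LSRL to each one, and compares the sample slopes b with the true β.

```python
import random
import statistics

# Hypothetical population regression line: mu_y = ALPHA + BETA * x.
# In practice these parameters are unknown; we choose them here only to simulate.
ALPHA, BETA = 10.0, 2.0
SIGMA = 3.0  # assumed standard deviation of y around the line at every x


def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def draw_sample(rng, n):
    """Draw n (x, y) pairs whose y values scatter normally around the population line."""
    xs = [rng.uniform(0, 20) for _ in range(n)]
    ys = [ALPHA + BETA * x + rng.gauss(0, SIGMA) for x in xs]
    return xs, ys


rng = random.Random(2026)  # fixed seed so the run is reproducible
slopes = [fit_lsrl(*draw_sample(rng, 30))[1] for _ in range(2000)]

print("first three sample slopes:", [round(b, 3) for b in slopes[:3]])
print(f"mean of 2000 sample slopes: {statistics.fmean(slopes):.3f} (true beta = {BETA})")
```

Each individual b misses β by a little, but the sample slopes center on the true slope, which is the sample-to-sample behavior that slope inference is built on.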

Why the Population Regression Line matters in AP® Statistics

This term lives in Unit 9: Inference for Quantitative Data: Slopes, and it's the bridge between two halves of the course. In Unit 2 you learned to describe a linear relationship with a sample LSRL. In Unit 9 you stop describing your sample and start making claims about the population, and the population regression line is the thing you're making claims about. Every t-test for slope asks, in effect, "is the true slope β of the population regression line actually zero, or is the pattern in my sample real?" Every confidence interval for slope is a range of plausible values for β. The line also anchors the conditions for inference, since the linearity condition asks whether a population regression line is even a reasonable model in the first place. If you can't tell the population line (parameters α and β) apart from the sample line (statistics a and b), Unit 9 inference questions will feel like word salad.

How the Population Regression Line connects across the course

Least Squares Method (Units 2 & 9)

The least squares method produces the sample regression line, not the population one. It minimizes the sum of squared residuals in your data, giving you a and b, which serve as your best estimates of the population line's α and β.
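The least squares computation itself is short. This Python sketch uses the standard formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄ on a small hypothetical dataset (five made-up points, not from any real study), then checks that nudging either coefficient makes the sum of squared residuals larger.

```python
def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # sample slope, a statistic that estimates beta
    a = y_bar - b * x_bar  # sample intercept, a statistic that estimates alpha
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
print(f"y-hat = {a:.1f} + {b:.1f}x")  # y-hat = 2.2 + 0.6x

# Moving either coefficient away from the least-squares values increases the SSE.
print(sse(xs, ys, a, b), sse(xs, ys, a + 0.1, b), sse(xs, ys, a, b - 0.05))
```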

Standard Error of the Slope (Unit 9)

Because every sample gives a slightly different sample slope b, the standard error of the slope measures how much b typically varies from sample to sample. It's the key ingredient in confidence intervals and t-tests about the population slope β.
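As a worked example, the sketch below computes the standard error of the slope in the form used in AP Statistics, SE_b = s / (s_x √(n − 1)), where s = √(Σ residual² / (n − 2)) is the standard deviation of the residuals and s_x is the sample standard deviation of x. The five data points are hypothetical.

```python
import math
import statistics


def slope_standard_error(xs, ys):
    """Return (b, SE_b) for the least-squares slope, using
    s = sqrt(sum of squared residuals / (n - 2)) and SE_b = s / (s_x * sqrt(n - 1))."""
    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # typical distance from the line
    s_x = statistics.stdev(xs)                                # sample SD of the x values
    return b, s / (s_x * math.sqrt(n - 1))


# Hypothetical sample data.
b, se_b = slope_standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b = {b:.3f}, SE_b = {se_b:.3f}")  # b = 0.600, SE_b = 0.283
```

A smaller SE_b means sample slopes would cluster more tightly around β from one sample to the next.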

Population Parameter (Units 1, 5-9)

The slope β and intercept α of the population regression line are parameters, fixed but unknown numbers describing the population. This is the same parameter-vs-statistic distinction you've used since μ vs x̄ and p vs p̂, just applied to lines.

Residuals (Unit 2)

Residuals (y − ŷ) measure how far each data point falls above or below the sample line, and a residual plot helps you check whether a linear model fits at all. If the residual plot shows a curved pattern, a linear population regression line probably isn't the right model, and slope inference isn't appropriate.
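Here is a minimal Python sketch of that check, using hypothetical data generated from y = x², a relationship that is curved by construction. Fitting a straight line anyway leaves residuals that are positive at both ends and negative in the middle, the U-shaped pattern that signals a linear model is the wrong choice.

```python
def residuals(xs, ys):
    """Residuals y - y-hat from the least-squares line fit to (xs, ys)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical curved data: y = x^2, which no straight line describes well.
xs = [1, 2, 3, 4, 5, 6, 7]
curved = residuals(xs, [x ** 2 for x in xs])
print(curved)  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]

# Reading the residual signs from left to right traces out a U-shape.
signs = ["+" if r > 0 else "-" if r < 0 else "0" for r in curved]
print(" ".join(signs))  # + 0 - - - 0 +
```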

Is the Population Regression Line on the AP® Statistics exam?

On the AP Stats exam, this concept shows up wherever slope inference does. The 2023 FRQ Q5 (the tule elk problem) is the classic format. You're given regression output from a sample, and you have to define the parameter correctly, which means saying β is the slope of the population regression line relating the two variables, not the slope of your sample. Botching that parameter definition costs points fast. MCQs love to test whether you know which symbols are parameters (α, β) versus statistics (a, b), and whether a confidence interval for slope is about β or about b. When you write hypotheses, H₀: β = 0 always refers to the population slope. Writing H₀: b = 0 is a guaranteed deduction, because there's nothing uncertain about your own sample slope: you calculated it.
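To make the β-versus-b point concrete, this Python sketch computes the test statistic for H₀: β = 0, namely t = (b − 0) / SE_b with n − 2 degrees of freedom, on five hypothetical data points. The hypothesized value belongs to β; b enters only as the evidence. Turning t into a P-value still requires a t distribution (table or calculator), which the sketch leaves out so it can stay within Python's standard library.

```python
import math


def slope_t_test(xs, ys, beta_0=0.0):
    """Return (t, df) for H0: beta = beta_0, where beta is the population slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    se_b = s / math.sqrt(sxx)  # equivalent to s / (s_x * sqrt(n - 1))
    t = (b - beta_0) / se_b    # hypothesized value is about beta; b is the observed statistic
    return t, n - 2


# Hypothetical sample data.
t, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"t = {t:.3f}, df = {df}")  # t = 2.121, df = 3
```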

The Population Regression Line vs Sample (Least-Squares) Regression Line

The sample LSRL (ŷ = a + bx) is computed from your data using the least squares method, and its slope and intercept are statistics you can actually calculate. The population regression line (μy = α + βx) is the true line for the whole population, and its slope and intercept are parameters you'll never directly observe. The sample line estimates the population line. Inference exists precisely because they're not the same thing.

Key things to remember about the Population Regression Line

  • The population regression line, μy = α + βx, describes the true linear relationship between two quantitative variables in an entire population.

  • Its slope β and intercept α are population parameters, meaning they are fixed but unknown, while the sample line's a and b are statistics that estimate them.

  • The least squares method is applied to sample data to produce the sample LSRL, which is your best estimate of the population regression line.

  • Hypotheses for slope inference are always written about β, the population slope, so H₀: β = 0 claims there is no true linear relationship in the population.

  • Different random samples produce different sample regression lines, and the standard error of the slope quantifies that sample-to-sample variability in b.

  • A confidence interval for slope gives a range of plausible values for β, the slope of the population regression line, not for your sample slope b.
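A confidence interval for β has the form b ± t*·SE_b. The Python sketch below computes one for a hypothetical five-point dataset. Because Python's standard library has no t quantile function, the critical value t* is passed in by hand (3.182 is the 95% value for df = 3).

```python
import math


def slope_confidence_interval(xs, ys, t_star):
    """Return (low, high) for b +/- t* * SE_b, a range of plausible values for beta.

    t_star must come from a t table (or calculator) with n - 2 degrees of freedom.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    margin = t_star * s / math.sqrt(sxx)  # t* times SE_b
    return b - margin, b + margin


# Hypothetical sample with n = 5, so df = 3.
low, high = slope_confidence_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_star=3.182)
print(f"95% CI for beta: ({low:.3f}, {high:.3f})")  # 95% CI for beta: (-0.300, 1.500)
```

The interval is centered at the sample slope b = 0.6, but it describes β. It contains 0, so with this tiny made-up sample, β = 0 remains a plausible value for the population slope.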

Frequently asked questions about the Population Regression Line

What is the population regression line in AP Stats?

It's the true linear relationship between two quantitative variables for an entire population, written μy = α + βx. You can't observe it directly, so you estimate it with a least-squares regression line fit to sample data, and Unit 9 inference lets you make conclusions about its slope β.

Is the population regression line the same as the least-squares regression line?

No. The least-squares regression line (ŷ = a + bx) comes from your sample and is something you calculate. The population regression line (μy = α + βx) is the unknown true line the sample line is estimating. Confusing them in an FRQ parameter definition costs points.

What's the difference between β and b in regression?

β is the slope of the population regression line, a fixed but unknown parameter. b is the slope of your sample's LSRL, a statistic that varies from sample to sample. You use b, along with the standard error of the slope, to test claims and build confidence intervals about β.

Why do hypotheses use β instead of b?

Because hypotheses are claims about the population, not your sample. There's no uncertainty about b, since you computed it from your data. The question a t-test for slope answers is whether the population slope β could plausibly be zero, so H₀: β = 0 is the correct form.

Can you ever actually calculate the population regression line?

Only if you measured every individual in the population, which almost never happens in practice. That's why it stays a theoretical model with unknown parameters α and β, and why the 2023 FRQ on tule elk asked about inference for the slope rather than the slope itself.