2.8 Least Squares Regression

Written by the Fiveable Content Team • Last updated June 2026
Verified for the 2027 exam

The least squares regression line (LSRL) is the line that minimizes the sum of squared residuals, written as ŷ = a + bx. You will calculate and interpret its slope and y-intercept, find the line using b = r(s_y/s_x) and a = ȳ - bx̄, and use r² to describe how much of the variation in the response variable the model explains.

LSRL in AP Statistics

In AP Statistics, the LSRL is the linear model used to predict a response variable from an explanatory variable. The line is chosen because it makes the sum of squared residuals as small as possible, and it always contains the point (x̄, ȳ).

The exam usually asks you to do two things with the LSRL: calculate values correctly and interpret them in context. That means formulas matter, but your written explanation matters just as much.

Why This Matters for the AP Statistics Exam

Two-variable data shows up across the exam, and the LSRL is the core tool for modeling a linear relationship between two quantitative variables. You need to find slope and intercept from summary statistics or technology output, read a computer printout, and explain in plain language what each number means in context. Interpretation matters as much as calculation here. On free-response questions, vague answers lose credit, so connecting slope, intercept, and r² to the actual variables is what separates a clear response from a weak one.

Key Takeaways

  • The LSRL minimizes the sum of squared residuals and always passes through (x̄, ȳ).
  • Slope formula: b = r(s_y/s_x). Intercept formula: a = ȳ - bx̄.
  • Slope = predicted change in y for each one-unit increase in x. Always use the word "predicted."
  • The y-intercept is the predicted y when x = 0, but it sometimes has no logical meaning in context.
  • r² (coefficient of determination) is the proportion of variation in y explained by its linear relationship with x.
  • When reading a computer printout, use R-Sq, never R-Sq(adj).

Building the Line

The LSRL has the form ŷ = a + bx, where ŷ is the predicted response, x is the explanatory variable, a is the y-intercept, and b is the slope. The "least squares" name comes from how it is chosen: of all possible lines, this one makes the sum of the squared residuals as small as possible.

Residuals are squared (not just added up) for two reasons. First, squaring stops positive and negative residuals from cancelling out. Second, it gives larger misses more weight, so the line responds more to points that are far off.

A useful fact: the LSRL always passes through the point (x̄, ȳ), the means of x and y. That single fact lets you recover the intercept once you know the slope.
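
To make these two facts concrete, here is a minimal Python sketch. It fits a line to a small made-up data set using the standard closed-form least squares solution (sum of cross-deviations divided by sum of squared x-deviations, which is algebraically the same as r(s_y/s_x)). The data and the function names fit_lsrl and sum_sq_residuals are illustrative assumptions, not part of the course materials.

```python
from statistics import mean


def fit_lsrl(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x that minimizes squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept, using the (x-bar, y-bar) fact
    return a, b


def sum_sq_residuals(xs, ys, a, b):
    """Sum of squared residuals (actual minus predicted) for the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_lsrl(xs, ys)
best = sum_sq_residuals(xs, ys, a, b)

# Nudging the slope or the intercept away from the fitted values
# should make the sum of squared residuals larger.
worse_slope = sum_sq_residuals(xs, ys, a, b + 0.1)
worse_intercept = sum_sq_residuals(xs, ys, a + 0.1, b)

print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"sum of squared residuals: {best:.2f}")
print(f"nudged slope: {worse_slope:.2f}, nudged intercept: {worse_intercept:.2f}")
print(f"prediction at x-bar: {a + b * mean(xs):.2f}, y-bar: {mean(ys):.2f}")
```

The last line printed compares the prediction at x̄ with ȳ; the two should match, which is the "passes through the means" check.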

Slope

The slope is the predicted change in the response variable for a one-unit increase in the explanatory variable. The formula is:

b = r(s_y/s_x)

where r is the correlation between x and y, s_y is the standard deviation of y, and s_x is the standard deviation of x. The slope blends the strength of the linear relationship (r) with the spread of each variable.
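
A quick way to see how r and the two standard deviations combine is to compute the slope from summary statistics. The sketch below is a minimal illustration; the function name slope_from_summary and the numbers are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def slope_from_summary(r, s_x, s_y):
    """Slope of the LSRL from summary statistics: b = r * (s_y / s_x)."""
    if s_x <= 0:
        raise ValueError("s_x must be positive")
    return r * (s_y / s_x)


# Hypothetical summary statistics, for illustration only.
b_positive = slope_from_summary(r=0.8, s_x=4.0, s_y=10.0)
b_negative = slope_from_summary(r=-0.8, s_x=4.0, s_y=10.0)

print(f"slope with r = 0.8:  {b_positive:.2f}")
print(f"slope with r = -0.8: {b_negative:.2f}")
```

The sign of the slope always matches the sign of r, because both standard deviations are positive.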

Interpreting the Slope

Use a template like this:

"There is a predicted increase/decrease of ______ (slope in units of y) for every 1 (unit of x)."

A strong slope interpretation includes:

  • Context (the actual variables, with units)
  • The correct value and direction
  • The word "predicted"

Y-Intercept

Because the LSRL passes through (x̄, ȳ), you can find the intercept directly:

a = ȳ - bx̄

The y-intercept is the predicted value of y when x = 0. Sometimes that is a real, meaningful value. Other times x = 0 is impossible or far outside the data, so the intercept has no logical interpretation in context. Mention that when it applies.
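
The intercept calculation can be sketched the same way. The function name intercept_from_summary and the values below are hypothetical and exist only to show the arithmetic.

```python
def intercept_from_summary(x_bar, y_bar, b):
    """Intercept of the LSRL: a = y-bar - b * x-bar."""
    return y_bar - b * x_bar


# Hypothetical means and slope, for illustration only.
x_bar, y_bar, b = 5.0, 30.0, 2.0
a = intercept_from_summary(x_bar, y_bar, b)

# The fitted line should give back y-bar when x equals x-bar.
prediction_at_mean = a + b * x_bar

print(f"intercept a = {a:.2f}")
print(f"prediction at x-bar = {prediction_at_mean:.2f} (y-bar = {y_bar:.2f})")
```

Whether the resulting intercept means anything in context still depends on whether x = 0 is a sensible value for the explanatory variable.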

Interpreting the Y-Intercept

"The predicted value of (y in context) is _____ when (x in context) is 0 (units in context)."

Same three pieces:

  • Context
  • Correct value
  • The word "predicted"

Coefficient of Determination (r²)

In simple linear regression, r² is the square of the correlation r. It is called the coefficient of determination, and it measures the proportion of variation in the response variable that is explained by the explanatory variable in the model.

Values range from 0 to 1. A value near 0 means the line explains almost none of the variation in y. A value near 1 means the line explains almost all of it.
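
One way to see why r² is called the "proportion of variation explained" is to compute it two ways and compare. The sketch below uses a small made-up data set and compares r squared with 1 minus (sum of squared residuals / total variation in y). The variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares line and correlation.
b = sxy / sxx
a = y_bar - b * x_bar
r = sxy / sqrt(sxx * syy)

# Variation left over after using the line.
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_sq_from_r = r ** 2
r_sq_from_variation = 1 - ss_resid / syy

print(f"r^2 from correlation: {r_sq_from_r:.3f}")
print(f"r^2 from variation:   {r_sq_from_variation:.3f}")
```

The two values agree in simple linear regression, which is why squaring r gives the proportion of variation explained.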

Interpreting r²

"____% of the variation in (y in context) is explained by its linear relationship with (x in context)."

Strong answers include context, the correct definition, and a clear link to the linear relationship between the two variables.

Standard Deviation of the Residuals

Another statistic from regression output is s, the standard deviation of the residuals. It describes the typical size of a residual, meaning roughly how far the actual y-values fall from the predicted values. Computer printouts report this value, and you will work with it more when you reach inference for regression later in the course.
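
As a rough sketch of where s comes from, the code below uses the usual regression formula s = sqrt(sum of squared residuals / (n - 2)). That formula is not stated in this guide, so treat it as an added assumption; the data and the function name residual_std_dev are also made up for illustration.

```python
from math import sqrt


def residual_std_dev(xs, ys, a, b):
    """Standard deviation of the residuals for the line y-hat = a + b*x."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than two points")
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(ss_resid / (n - 2))


# Made-up data; a = 2.2 and b = 0.6 are the least squares values for it.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s = residual_std_dev(xs, ys, a=2.2, b=0.6)

print(f"s = {s:.3f}")
```

In context, you would read the result as: the actual y-values typically differ from the predicted values by about s units of y.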

Reading a Computer Printout

The AP exam often gives regression results as a computer printout instead of raw data. You should be able to pull the slope, intercept, r², and s from one, and recover r by taking the square root of r² and giving it the same sign as the slope. The slope and intercept usually appear in a coefficients column, with the explanatory variable's row giving the slope and the "Constant" row giving the intercept.

One reminder when reading output: use R-Sq, never R-Sq(adj).
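
Printouts typically report R-Sq but not r itself. A minimal sketch of recovering r is shown below: take the square root of R-Sq and attach the sign of the slope. The function name r_from_printout is hypothetical, and the example values are the ones from this guide's practice problem. If the printout gives R-Sq as a percent, convert it to a proportion first.

```python
from math import copysign, sqrt


def r_from_printout(r_sq, slope):
    """Recover r from R-Sq (as a proportion) using the sign of the slope."""
    if not 0 <= r_sq <= 1:
        raise ValueError("r_sq must be a proportion between 0 and 1")
    return copysign(sqrt(r_sq), slope)


# Values taken from the practice problem in this guide.
r = r_from_printout(r_sq=0.49, slope=-2.5)

print(f"r = {r:.2f}")
```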

How to Use This on the AP Statistics Exam

Free Response

  • Show the formulas you use. Writing b = r(s_y/s_x) and a = ȳ - bx̄ makes your work clear and easy to follow.
  • Always interpret in context. A number with no variables or units attached rarely earns full credit.
  • Use "predicted" in slope and intercept interpretations. The line predicts; it does not guarantee.
  • For r², state the percent and tie it to the variation in y explained by x.

Problem Solving

  • If you are given r, s_x, s_y, x̄, and ȳ, you can build the whole line without raw data. Find b first, then a.
  • To predict a value, plug x into ŷ = a + bx. Watch for extrapolation if x is far outside the data range.
  • Remember the line passes through (x̄, ȳ), which is a quick check on your work.
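
The steps in this list can be sketched as a short workflow: build the line from summary statistics, check that it passes through the means, then predict and flag extrapolation. Everything here is a hypothetical illustration, including the function names build_lsrl and predict, the summary statistics, and the assumed data range of 6 to 14.

```python
def build_lsrl(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for y-hat = a + b*x from summary statistics."""
    b = r * (s_y / s_x)      # find the slope first
    a = y_bar - b * x_bar    # then the intercept
    return a, b


def predict(a, b, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for a given x and observed x-range."""
    y_hat = a + b * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation


# Hypothetical summary statistics, for illustration only.
x_bar, y_bar = 10.0, 50.0
a, b = build_lsrl(r=0.5, s_x=2.0, s_y=8.0, x_bar=x_bar, y_bar=y_bar)

# Quick check: the line should pass through (x-bar, y-bar).
check = a + b * x_bar

inside = predict(a, b, x=12.0, x_min=6.0, x_max=14.0)
outside = predict(a, b, x=25.0, x_min=6.0, x_max=14.0)

print(f"y-hat = {a:.1f} + {b:.1f}x, check at x-bar: {check:.1f}")
print(f"x = 12: prediction {inside[0]:.1f}, extrapolation: {inside[1]}")
print(f"x = 25: prediction {outside[0]:.1f}, extrapolation: {outside[1]}")
```

A prediction flagged as extrapolation can still be computed, but it should be reported with a warning that the linear pattern may not hold that far from the data.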

Common Traps

  • Saying "for every increase in x, y increases by the slope" without "predicted" or without units weakens the answer.
  • Do not interpret the intercept literally when x = 0 makes no sense for the data. Note that it has no practical meaning instead.

Common Misconceptions

  • r² is not r. r² is the square of the correlation, and it describes the proportion of variation explained, not the strength or direction of the relationship.
  • A high r² does not prove causation. It only tells you how much of the variation in y the linear model accounts for.
  • The slope is not the actual change in y; it is the predicted change. Real data points scatter around the line.
  • The y-intercept does not always have a real-world meaning. When x = 0 is impossible or outside the data, treat the intercept as a math feature of the model.
  • Minimizing squared residuals is not the same as minimizing the residuals themselves. Squaring changes which line wins and prevents positive and negative residuals from cancelling.
  • When reading output, R-Sq(adj) is a different value. Use R-Sq for this course.
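
The point about cancelling residuals can be checked numerically. In the sketch below, two different lines through (x̄, ȳ) both have raw residuals that sum to zero, so the raw sum cannot pick a winner, while the sum of squared residuals clearly prefers the LSRL. The data are made up, and the helper name residuals is an illustrative assumption.

```python
from statistics import mean


def residuals(xs, ys, a, b):
    """Residuals (actual minus predicted) for the line y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_bar = mean(ys)

# Two lines that both pass through (x-bar, y-bar).
lsrl_res = residuals(xs, ys, a=2.2, b=0.6)      # least squares line for this data
flat_res = residuals(xs, ys, a=y_bar, b=0.0)    # horizontal line at y-bar

lsrl_sum, flat_sum = sum(lsrl_res), sum(flat_res)
lsrl_ssr = sum(e * e for e in lsrl_res)
flat_ssr = sum(e * e for e in flat_res)

print(f"LSRL:      raw sum {abs(lsrl_sum):.3f}, squared sum {lsrl_ssr:.2f}")
print(f"flat line: raw sum {abs(flat_sum):.3f}, squared sum {flat_ssr:.2f}")
```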

Practice Problem

A researcher is studying the relationship between the amount of sleep (in hours) and the performance on a cognitive test. She collects data from 50 participants and fits a linear regression model to the data. The summary of the model is shown below:

Summary of Linear Regression Model:

Response variable: Performance on cognitive test (y)

Explanatory variable: Amount of sleep (x)

Slope (b): -2.5

Y-intercept (a): 50

Correlation coefficient (r): -0.7

R-squared: 0.49

a) Interpret the slope of the model in the context of the problem.

b) Interpret the y-intercept of the model in the context of the problem.

c) Interpret the correlation coefficient of the model in the context of the problem.

d) Interpret the R-squared value of the model in the context of the problem.

e) Based on the summary of the model, do you think that the amount of sleep has a significant effect on the performance on the cognitive test? Why or why not?

f) Suppose the researcher collects data from an additional 50 participants and fits a new linear regression model to the combined data. The summary of the new model is shown below:

Slope (b): -1.9

Y-intercept (a): 48

Correlation coefficient (r): -0.6

R-squared: 0.36

Compare the two models and explain how the new model differs from the original model in terms of the strength and direction of the relationship between the amount of sleep and the performance on the cognitive test.

Answers

a) The slope of the model is -2.5, which means that for every one-hour increase in the amount of sleep, the performance on the cognitive test is predicted to decrease by 2.5 points.

b) The y-intercept of the model is 50, which means that the performance on the cognitive test is predicted to be 50 points when the amount of sleep is zero. (Note: zero hours of sleep is at the edge of what is realistic, so interpret this value with care.)

c) The correlation coefficient of the model is -0.7, which indicates a moderately strong negative linear relationship between the amount of sleep and the performance on the cognitive test. A negative correlation means that as the amount of sleep increases, the performance on the cognitive test tends to decrease.

d) The R-squared value of the model is 0.49, which means that 49% of the variation in the performance on the cognitive test is explained by its linear relationship with the amount of sleep. The other 51% comes from factors not captured by this model.

e) You cannot determine statistical significance from the slope, correlation, or r² alone. To claim a significant effect, you would need inference evidence, such as a regression t-test p-value or a confidence interval for the slope. Descriptively, the relationship appears moderately strong and negative, but significance is not established from this summary.

f) In the new model, the slope is -1.9, which is less steep than the slope in the original model (-2.5). The correlation is weaker in the new model (-0.6 versus -0.7), and the R-squared is lower (0.36 versus 0.49). Both models show a negative relationship, but the new model has a weaker linear relationship between amount of sleep and cognitive test performance.
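
As a quick self-check on the numbers in this problem, the sketch below confirms that each R-squared equals r squared, that r and the slope share a sign, and computes a prediction from each model. The choice of 8 hours of sleep is a hypothetical input; the problem does not give the range of the data, so whether that value is an extrapolation is unknown.

```python
# Summary values copied from the practice problem.
models = {
    "original": {"a": 50.0, "b": -2.5, "r": -0.7, "r_sq": 0.49},
    "combined": {"a": 48.0, "b": -1.9, "r": -0.6, "r_sq": 0.36},
}

hours = 8.0  # hypothetical amount of sleep, chosen for illustration

results = {}
for name, m in models.items():
    results[name] = {
        # R-squared should equal r squared in simple linear regression.
        "r_sq_matches": abs(m["r"] ** 2 - m["r_sq"]) < 1e-9,
        # The correlation and the slope always share a sign.
        "signs_match": (m["r"] < 0) == (m["b"] < 0),
        "predicted_score": m["a"] + m["b"] * hours,
    }

for name, res in results.items():
    print(name, res)
```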

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

  • coefficient of determination: The value r², which represents the proportion of variation in the response variable that is explained by the explanatory variable in the regression model.
  • coefficients: The numerical values in a regression equation that represent the slope and y-intercept of the least-squares regression line.
  • correlation: A numerical measure (r) that describes the strength and direction of a linear relationship between two variables, ranging from -1 to 1.
  • explanatory variable: A variable whose values are used to explain or predict corresponding values for the response variable.
  • least-squares regression line: A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points.
  • parameter: A numerical summary that describes a characteristic of an entire population.
  • predicted value: The estimated response value obtained from a regression model, denoted as ŷ.
  • residual: The difference between the actual observed value and the predicted value in a regression model, calculated as residual = y - ŷ.
  • response variable: A variable whose values are being explained or predicted based on the explanatory variable.
  • sample standard deviation: The standard deviation calculated for a sample, denoted by s, using the formula s = √(1/(n-1) ∑(xᵢ-x̄)²).
  • simple linear regression: A regression model that describes the linear relationship between one explanatory variable and one response variable.
  • slope: The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable.
  • y-intercept: The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero.

Frequently Asked Questions

What is an LSRL in statistics?

An LSRL is a least squares regression line. It is the line that predicts a response variable from an explanatory variable while minimizing the sum of squared residuals.

What is the LSRL formula?

The least squares regression line is written as y-hat = a + bx. In that equation, a is the y-intercept, b is the slope, x is the explanatory variable, and y-hat is the predicted response.

How do you find the LSRL from summary statistics?

First calculate the slope with b = r(s_y/s_x). Then calculate the intercept with a = y-bar - b x-bar. Once you have both values, write the line as y-hat = a + bx.

How do you interpret the slope of an LSRL?

The slope is the predicted change in the response variable for each one-unit increase in the explanatory variable. A strong AP Stats answer uses context, units, direction, and the word predicted.

What does r² mean in least squares regression?

r² is the proportion of variation in the response variable explained by its linear relationship with the explanatory variable. For example, r² = 0.64 means 64% of the variation in the response variable is explained by the model.

When does the y-intercept have no meaning?

The y-intercept has no practical meaning when x = 0 is impossible, unreasonable, or far outside the data range. In that case, say it is the predicted value when x = 0 but does not have a logical interpretation in context.
