2.8 Least Squares Regression

3 min read • June 3, 2020

Peter Cao


The least squares regression line (LSRL) is the line of best fit in a specific sense: of all possible lines, it is the one that minimizes the sum of the squared residuals. Why square the residuals? If we didn't, negative and positive residuals would cancel out, so a line could look like a good fit while missing many points by a wide margin. Like other linear regression models, the LSRL has the form ŷ=a+bx, where a is the y-intercept and b is the slope. Each has its own formula based on one-variable statistics of x and y.
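To see why squaring matters, here is a minimal Python sketch. The five data points are made up for illustration, and the helper names `residuals` and `sse` are our own, not anything from the AP formula sheet. It compares a flat line through the mean of y with the least squares line for the same data.

```python
# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def residuals(a, b):
    """Residuals (actual y minus predicted y) for the line y_hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def sse(a, b):
    """Sum of the squared residuals for the line y_hat = a + b*x."""
    return sum(e ** 2 for e in residuals(a, b))

# A flat line at the mean of y (y_hat = 4): the raw residuals cancel out,
# even though the line ignores the upward trend in the data.
flat_sum = sum(residuals(4.0, 0.0))
flat_sse = sse(4.0, 0.0)

# The least squares line for this data works out to y_hat = 2.2 + 0.6x.
lsrl_sse = sse(2.2, 0.6)

print("flat line: sum of residuals =", round(flat_sum, 6))
print("flat line: sum of squared residuals =", round(flat_sse, 6))
print("LSRL:      sum of squared residuals =", round(lsrl_sse, 6))
```

The raw residuals of the flat line add up to zero, which would make it look perfect. The squared residuals show that the LSRL is the better fit.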

LSRL: Slope

The slope is the predicted change in the response variable for each one-unit increase in the explanatory variable. To find the slope, we have the formula:

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2011.35-whWDxLSMvi8O.png?alt=media&token=485a9fc0-8e75-42a2-b8ca-0ee8b9caa420

image courtesy of: codecogs.com

This is basically saying that the slope is the standard deviation of y over the standard deviation of x, with the correlation coefficient r as a correcting factor: b = r(s_y/s_x).
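As a quick check of the idea, here is a small Python sketch. It assumes the formula pictured above is the standard b = r(s_y/s_x), and it uses made-up data. The correlation coefficient is computed from its definition so that nothing beyond the standard library is needed.

```python
from statistics import mean, stdev

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)

# Correlation coefficient r, computed from its definition.
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

# Slope of the LSRL: b = r * (s_y / s_x)
b = r * s_y / s_x

print(f"r = {r:.4f}, slope b = {b:.4f}")
```

For this data the slope comes out to about 0.6, so we would say there is a predicted increase of 0.6 units of y for every 1-unit increase in x.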

Template for Interpretation

When asked to interpret the slope of an LSRL, follow the template below:

There is a predicted increase/decrease of ______ (slope in units of y variable) for every 1-unit increase in (x variable, with units).

Big Three

  • Context
  • Correct definition
  • Word "predicted"

LSRL: y-intercept

Once we have a slope, we can get the y-intercept and the general formula of the LSRL from point-slope form, provided that we have a point on the line. Fortunately, we do: the LSRL always passes through the point (x̄, ȳ). Thus, from point-slope form we have ŷ - ȳ = b(x - x̄), which rearranges to ŷ = bx + (ȳ - bx̄). The expression in parentheses is the y-intercept, so a = ȳ - bx̄. You don't need to memorize this, since you can always re-derive it from the point-slope form.

We can interpret the y-intercept as the predicted value of the response variable when the explanatory variable is 0.
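The rearrangement above can be sketched in Python with made-up data. The slope is computed here with the sum-of-products form, which is algebraically equal to r(s_y/s_x), and `predict` is just a helper name chosen for this example.

```python
from statistics import mean

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Slope: sum of cross-deviations over sum of squared x-deviations.
# This is algebraically the same as b = r * (s_y / s_x).
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))

# y-intercept from the point-slope rearrangement: a = y_bar - b * x_bar
a = y_bar - b * x_bar

def predict(x_value):
    """Predicted response y_hat = a + b*x for a given x."""
    return a + b * x_value

print(f"y_hat = {a:.2f} + {b:.2f}x")
print("prediction at x_bar:", round(predict(x_bar), 6), "| y_bar:", y_bar)
```

The last line confirms that plugging x̄ into the LSRL gives back ȳ, so the line does pass through (x̄, ȳ).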

Template for Interpretation

When asked to interpret the y-intercept of an LSRL, follow the template below:

The predicted value of (y in context) is _____ when (x value in context) is 0 (units in context).

Big Three

  • Context
  • Correct definition
  • Word "predicted"

LSRL: Coefficient of Determination

To determine how well the LSRL fits the data, we can use a statistic called the coefficient of determination, also called r^2 because it is the correlation coefficient squared. It takes a value between 0 and 1. A value of 0 means that the LSRL does not model the data at all, in which case the equation reduces to ŷ=ȳ, while a value of 1 means that all the points lie on the LSRL. There is also another formula for r^2:

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2011.41-1jWZ9EdyNZ1D.png?alt=media&token=60306eac-3e4d-4c98-8d9d-3a9af5a4985e

image courtesy of: codecogs.com

This formula measures the fraction by which the sum of the squared residuals falls short of the total variation of y about its mean. In other words, it is the proportional reduction in the variation of y achieved by using the LSRL. When interpreting this we say that it is the "percentage of the variation of y that can be explained by a linear model with respect to x."
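Here is a short Python sketch with made-up data. It assumes the pictured formula is the common form r^2 = 1 - SSE/SST, where SSE is the sum of the squared residuals and SST is the total squared deviation of y from ȳ. It also checks that this matches the correlation coefficient squared.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation of y (SST)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Fit the LSRL.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Sum of squared residuals about the LSRL (SSE).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / s_yy       # assumed form: 1 - SSE/SST
r = s_xy / sqrt(s_xx * s_yy)     # correlation coefficient

print(f"1 - SSE/SST = {r_squared:.4f}, r squared = {r ** 2:.4f}")
```

Both routes give about 0.6 for this data, so we would say that about 60% of the variation in y can be explained by its linear relationship with x.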

Template for Interpretation

When asked to interpret a coefficient of determination for a least squares regression model, use the template below:

____% of the variation in (y in context) can be explained by its linear relationship with (x in context).

Big Three

  • Context
  • Correct definition
  • Linking linear relationship

LSRL: Standard Deviation of the Residuals

The last statistic we will talk about is the standard deviation of the residuals, also called s. It measures the size of a typical residual, that is, how far a typical data point falls from the value the LSRL predicts for it. The formula for s is given as

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2011.46-senL6A1vzXWl.png?alt=media&token=a8d8719a-1b29-4cb2-b889-3377be04e77f

image courtesy of: apcentral.collegeboard.org

which looks similar to the sample standard deviation, except that we divide by n-2 instead of n-1. We will see why, and learn more about s, when we cover inference for regression in Unit 9.
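As a rough illustration, the sketch below computes s for made-up data. It assumes the pictured formula is the square root of the sum of the squared residuals divided by n-2, which matches the description above.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Fit the LSRL.
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Residuals: actual y minus predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Standard deviation of the residuals: divide by n - 2, not n - 1.
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(f"s = {s:.4f}")
```

For this data s is about 0.89, meaning the actual y-values typically differ from the values predicted by the LSRL by about 0.89 units.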

Reading a Computer Printout

On the AP test, it is very likely that you will be expected to read a computer printout of a regression analysis. Here is a sample printout showing where to find most of the statistics you will need (the rest you will learn in Unit 9):

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2011.47-0GRdP9LpGgn7.png?alt=media&token=ac6e86fd-aaa8-4ae7-8ae1-0f0627d2c3a7

Courtesy of Starnes, Daren S. and Tabor, Josh. The Practice of Statistics: For the AP Exam, 5th Edition. Cengage Publishing.

Always use R-Sq, NEVER R-Sq(adj)!
