2.8 Least Squares Regression

written by

Peter Cao


The least squares regression line (LSRL) is the line of best fit in a specific sense: of all possible lines, it is the one that minimizes the sum of the squared residuals. Why square the residuals? If we didn’t, negative and positive residuals would cancel each other out, so a line that misses many points badly could still have a residual sum near zero. Like other linear regression models, the LSRL has an equation of the form ŷ=a+bx, where a is the y-intercept and b is the slope. Each has its own formula based on the one-variable statistics of x and y.
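To see why squaring matters, here is a minimal sketch that compares two lines on a small data set. The data are made up for illustration, and the helper names (`residuals`, `residual_sum`, `sse`) are hypothetical, not part of any library.

```python
# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def residuals(a, b):
    """Residuals (actual y minus predicted y) for the line y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def residual_sum(a, b):
    """Plain sum of residuals; positives and negatives can cancel."""
    return sum(residuals(a, b))

def sse(a, b):
    """Sum of squared residuals; nothing cancels."""
    return sum(e ** 2 for e in residuals(a, b))

# The LSRL for this data set is y-hat = 2.2 + 0.6x.
# The flat line y-hat = 4 is a worse fit used for comparison.
for label, a, b in [("LSRL", 2.2, 0.6), ("flat line", 4.0, 0.0)]:
    print(f"{label}: sum of squared residuals = {sse(a, b):.2f}")
```

Both lines have residuals that add up to essentially zero, so the plain sum cannot tell them apart. The sum of squared residuals can: it is 2.40 for the LSRL and 6.00 for the flat line.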

LSRL—Slope

The slope is the predicted change in the response variable for each one-unit increase in the explanatory variable. To find the slope, we have the formula:

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2011.35-whWDxLSMvi8O.png?alt=media&token=485a9fc0-8e75-42a2-b8ca-0ee8b9caa420

image courtesy of: codecogs.com

This says that the slope is the standard deviation of y divided by the standard deviation of x, multiplied by the correlation coefficient r as a correcting factor: b = r(s_y/s_x).
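As a quick check of the slope formula b = r(s_y/s_x), the sketch below computes r and b by hand using only Python's standard library. The data set is made up, and the function names `correlation` and `lsrl_slope` are illustrative choices.

```python
from statistics import mean, stdev

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def correlation(xs, ys):
    """Correlation r: the sum of products of z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def lsrl_slope(xs, ys):
    """Slope of the LSRL: b = r * (s_y / s_x)."""
    return correlation(xs, ys) * stdev(ys) / stdev(xs)

r = correlation(xs, ys)
b = lsrl_slope(xs, ys)
print(f"r = {r:.4f}, slope b = {b:.4f}")
```

For this data set r is about 0.7746 and the slope is 0.6, so y is predicted to rise by 0.6 units for each one-unit increase in x.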

Template for Interpretation

When asked to interpret a slope of a LSRL, follow the template below:

There is a predicted increase/decrease of ______ (slope, in units of the y variable) for every 1 (unit of x variable) increase in (x in context).

Big Three

  • Context
  • Correct definition
  • Word "predicted"

LSRL—y-intercept

Once we have the slope, we can get the y-intercept and the full equation of the LSRL from point-slope form, provided we know a point on the line. Fortunately, we do: the LSRL always passes through the point (x̄,ȳ). From point-slope form we have ŷ-ȳ=b(x-x̄), which rearranges to ŷ=bx+(ȳ-bx̄). The expression in parentheses is the y-intercept, a=ȳ-bx̄. You don't need to memorize it separately, since you can always derive it from the point-slope form.

We can interpret the y-intercept as the value the response variable would take if the explanatory variable is 0.
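The intercept calculation can be sketched as follows, assuming a made-up data set whose LSRL slope is already known to be 0.6. The names `lsrl_intercept` and `predict` are hypothetical helpers.

```python
from statistics import mean

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b = 0.6  # slope of the LSRL for this data, taken as given here

def lsrl_intercept(xs, ys, b):
    """y-intercept from point-slope form: a = y-bar - b * x-bar."""
    return mean(ys) - b * mean(xs)

a = lsrl_intercept(xs, ys, b)

def predict(x):
    """Predicted y for a given x using y-hat = a + b*x."""
    return a + b * x

print(f"y-intercept a = {a:.2f}")
print(f"prediction at x-bar = {predict(mean(xs)):.2f}, y-bar = {mean(ys):.2f}")
```

The intercept comes out to 2.2, and plugging x̄ = 3 into the line returns ȳ = 4, confirming that the line passes through (x̄,ȳ).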

Template for Interpretation

When asked to interpret a y-intercept of a LSRL, follow the template below:

The predicted value of (y in context) is _____ when (x value in context) is 0 (units in context).

Big Three

  • Context
  • Correct definition
  • Word "predicted"

LSRL—Coefficient of Determination

To determine how well the LSRL fits the data, we can use a statistic called the coefficient of determination, also written r^2 because it is the correlation coefficient squared. It takes a value between 0 and 1. A value of 0 means that x has no linear predictive value for y, so the LSRL is just the horizontal line ŷ=ȳ, while a value of 1 means that all the points lie exactly on the LSRL. There is also another formula for r^2:

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2011.41-1jWZ9EdyNZ1D.png?alt=media&token=60306eac-3e4d-4c98-8d9d-3a9af5a4985e

image courtesy of: codecogs.com

This formula compares the sum of the squared residuals from the LSRL with the sum of the squared deviations of y from its mean ȳ. In other words, r^2 is the proportional reduction in the variation of y achieved by using the LSRL instead of ȳ to make predictions. When interpreting it, we say that it is the “percentage of the variation of y that can be explained by a linear model with respect to x.”
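The two routes to r^2 can be compared numerically. This sketch assumes the second formula is the standard one, r^2 = 1 - Σ(y-ŷ)^2 / Σ(y-ȳ)^2, and uses a made-up data set whose LSRL is ŷ = 2.2 + 0.6x. All variable names are illustrative.

```python
from math import sqrt
from statistics import mean

# Made-up data for illustration only; its LSRL is y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

x_bar, y_bar = mean(xs), mean(ys)

# Route 1: compare squared residuals from the LSRL with squared deviations from y-bar.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - y_bar) ** 2 for y in ys)
r_squared_from_residuals = 1 - sse / sst

# Route 2: square the correlation coefficient.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
r = s_xy / sqrt(s_xx * sst)
r_squared_from_r = r ** 2

print(f"from residuals: {r_squared_from_residuals:.2f}")
print(f"from r squared: {r_squared_from_r:.2f}")
```

Both routes give 0.60, so about 60% of the variation in y can be explained by the linear relationship with x in this example.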

Template for Interpretation

When asked to interpret a coefficient of determination for a least squares regression model, use the template below:

____% of the variation in (y in context) can be explained by its linear relationship with (x in context).

Big Three

  • Context
  • Correct definition
  • Linking linear relationship

LSRL—Standard Deviation of the Residuals

The last statistic we will talk about is the standard deviation of the residuals, written s. It measures the size of a typical residual, that is, roughly how far the actual y-values typically fall from the values predicted by the LSRL. The formula for s is given as

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2011.46-senL6A1vzXWl.png?alt=media&token=a8d8719a-1b29-4cb2-b889-3377be04e77f

image courtesy of: apcentral.collegeboard.org

which looks similar to the sample standard deviation, except that we divide by n-2 and not n-1. Why? The LSRL estimates two quantities from the data (the slope and the y-intercept) rather than one. We will learn more about s when we learn inference for regression in Unit 9.
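Here is a minimal sketch of the calculation, assuming the pictured formula is the usual s = sqrt(Σ(y-ŷ)^2 / (n-2)). The data set is made up, its LSRL is ŷ = 2.2 + 0.6x, and `residual_sd` is a hypothetical helper name.

```python
from math import sqrt

# Made-up data for illustration only; its LSRL is y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

def residual_sd(xs, ys, a, b):
    """Standard deviation of the residuals, dividing by n - 2."""
    n = len(xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (n - 2))

s = residual_sd(xs, ys, a, b)
print(f"s = {s:.4f}")
```

Here s is about 0.8944, meaning the actual y-values typically differ from the predicted values by roughly 0.89 units.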

Reading a Computer Printout

On the AP test, it is very likely that you will be expected to read a computer printout of a regression analysis. Here is a sample printout showing where to find most of the statistics you will need (the rest you will learn in Unit 9):

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2011.47-0GRdP9LpGgn7.png?alt=media&token=ac6e86fd-aaa8-4ae7-8ae1-0f0627d2c3a7

Courtesy of Starnes, Daren S. and Tabor, Josh. The Practice of Statistics—For the AP Exam, 5th Edition. Cengage Publishing.

Always use R-Sq, NEVER R-Sq(adj)!

*ap® and advanced placement® are registered trademarks of the college board, which was not involved in the production of, and does not endorse, this product.

© fiveable 2020 | all rights reserved.