Peter Cao
The least squares regression line (LSRL) is the line that best fits the data in the least-squares sense: it is found by minimizing the sum of the squares of the residuals. Why square the residuals? If we simply summed them, negative and positive residuals would cancel out, so even a poorly fitting line could have a residual sum of zero. Like other linear models, the LSRL has the form ŷ=a+bx, with a being the y-intercept and b being the slope, and each has its own formula in terms of one-variable statistics of x and y.
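To see why summing raw residuals is uninformative, here is a minimal Python sketch with made-up data: predicting every y with ȳ gives residuals that sum to zero even though the fit is far from perfect, while the squared residuals cannot cancel.

```python
# Hypothetical toy data (values chosen only for illustration).
ys = [2.0, 1.0, 4.0, 3.0]

# Predict every y with the mean of y (a horizontal "line of best fit").
y_bar = sum(ys) / len(ys)
residuals = [y - y_bar for y in ys]

sum_resid = sum(residuals)                     # positives and negatives cancel
sum_sq_resid = sum(r * r for r in residuals)   # squares cannot cancel

print(sum_resid)     # 0.0
print(sum_sq_resid)  # 5.0
```

The raw residual sum is 0 regardless of how scattered the points are, which is exactly why least squares minimizes the sum of *squared* residuals instead.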
The slope is the predicted change in the response variable for each one-unit increase in the explanatory variable. To find the slope, we have the formula:

b = r(s_y / s_x)

where r is the correlation coefficient and s_y and s_x are the sample standard deviations of y and x.
This is basically saying that the slope is the typical deviation of y divided by the typical deviation of x, with the correlation coefficient acting as a correcting factor.
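The slope formula above can be checked numerically. This is a minimal sketch with invented data, computing r from standardized deviations and then b = r(s_y/s_x):

```python
from statistics import mean, stdev

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = mean(xs), mean(ys)
sx, sy = stdev(xs), stdev(ys)
n = len(xs)

# Correlation coefficient: average product of standardized deviations.
r = sum((x - x_bar) / sx * (y - y_bar) / sy for x, y in zip(xs, ys)) / (n - 1)

# Slope of the LSRL: b = r * (s_y / s_x).
b = r * sy / sx
print(round(b, 4))  # 0.6
```

The same slope falls out of the direct least-squares formula Σ(x−x̄)(y−ȳ)/Σ(x−x̄)², which for these data is 6/10 = 0.6.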
When asked to interpret a slope of a LSRL, follow the template below:
There is a predicted increase/decrease of ______ (slope, in units of the y variable) for every 1 (unit of the x variable).
Once we have the slope, we can get the y-intercept and the equation of the LSRL from point-slope form, provided we have a point on the line. Fortunately, we do: the LSRL always passes through the point (x̄,ȳ). From point-slope form we have ŷ-ȳ=b(x-x̄), so ŷ=bx+(ȳ-bx̄). The expression in parentheses is the y-intercept, a=ȳ-bx̄, though in practice you can usually just work from the point-slope form.
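Here is a short sketch (same illustrative data as above) that computes the intercept from a = ȳ − bx̄ and confirms that the resulting line passes through (x̄, ȳ):

```python
from statistics import mean

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
x_bar, y_bar = mean(xs), mean(ys)

# Slope via the least-squares formula b = sum((x-x̄)(y-ȳ)) / sum((x-x̄)²).
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)

# Intercept from the fact that the LSRL passes through (x̄, ȳ).
a = y_bar - b * x_bar

print(round(a, 4), round(b, 4))  # 2.2 0.6
```

Plugging x̄ back in, a + b·x̄ recovers ȳ (up to floating-point rounding), which is exactly the point-slope argument in the text.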
We can interpret the y-intercept as the predicted value of the response variable when the explanatory variable is 0.
When asked to interpret a y-intercept of a LSRL, follow the template below:
The predicted value of (y in context) is _____ when (x value in context) is 0 (units in context).
To determine how well the LSRL fits the data, we can use a statistic called the coefficient of determination, written r² because it is the correlation coefficient squared. It takes values between 0 and 1: 0 means the LSRL explains none of the variation in y (the model reduces to ŷ=ȳ), while 1 means all the points lie exactly on the LSRL. There is also another formula for r²:

r² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²
This says that r² compares the sum of the squared residuals to the total variation of y about its mean. In other words, it is the fractional reduction in the variation of y achieved by using the LSRL instead of ȳ. When interpreting it, we say it is the "percentage of the variation in y that can be explained by a linear model with respect to x."
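This "reduction in variation" reading can be verified directly. A minimal sketch (hypothetical data) that computes r² as 1 minus the ratio of residual variation to total variation:

```python
from statistics import mean

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
x_bar, y_bar = mean(xs), mean(ys)

# Fit the LSRL: slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual variation
sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y
r_sq = 1 - sse / sst

print(round(r_sq, 4))  # 0.6
```

For these data r ≈ 0.7746, and 0.7746² ≈ 0.6, matching the value from the residual formula: the two definitions of r² agree.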
When asked to interpret a coefficient of determination for a least squares regression model, use the template below:
____% of the variation in (y in context) is due to its linear relationship with (x in context).
The last statistic we will discuss is the standard deviation of the residuals, called s. It measures the size of a typical residual, that is, how far a typical data point falls from the LSRL. The formula for s is given as

s = √( Σ(yᵢ − ŷᵢ)² / (n − 2) )
which looks similar to the sample standard deviation, except we divide by n-2 rather than n-1. Why n-2? Because we estimated two parameters from the data (the slope and the y-intercept), we lose two degrees of freedom. We will learn more about s when we study inference for regression in Unit 9.
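Continuing the same illustrative example, this sketch computes s by summing the squared residuals and dividing by n − 2 before taking the square root:

```python
from math import sqrt
from statistics import mean

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
x_bar, y_bar = mean(xs), mean(ys)
n = len(xs)

# Fit the LSRL.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Standard deviation of the residuals: divide by n - 2, not n - 1,
# because two parameters (a and b) were estimated from the data.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s = sqrt(sse / (n - 2))

print(round(s, 4))  # 0.8944
```

So a typical point here misses the LSRL's prediction by about 0.89 units of y.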
On the AP test, it is very likely that you will be expected to read a computer printout of regression output. Here is a sample printout showing where to find most of the statistics you will need (the rest you will learn in Unit 9):
Courtesy of Starnes, Daren S. and Tabor, Josh. The Practice of Statistics—For the AP Exam, 5th Edition. Cengage Publishing.
Always use R-Sq, NEVER R-Sq(adj)!
🎥Watch: AP Stats - Least Squares Regression Lines