2.9 Analyzing Departures from Linearity

December 9, 2020


Peter Cao



Sometimes the least-squares regression line (LSRL) is not the best model for a data set. Here are some reasons why.

Influential Points

An influential point is a point that, when added or removed, substantially changes the regression model by changing its slope, its y-intercept, or the correlation. Outliers and high-leverage points are the two kinds of unusual points that are often influential, and both are shown in this graph.
https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2012.25-5GNSkRvqzeSm.png?alt=media&token=4dcc3182-47bc-4c48-8ba7-e4c7f09d5414

Courtesy of Starnes, Daren S. and Tabor, Josh. The Practice of Statistics—For the AP Exam, 5th Edition. Cengage Publishing.
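To see influence in action, we can refit the line with each point left out and measure how much the slope moves. The Python sketch below does this for a small made-up data set. The helpers `lsrl` and `slope_changes` and the data are hypothetical and were written for illustration. They are not taken from the graph above.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def slope_changes(xs, ys):
    """For each point, how much the slope changes when that point is removed."""
    _, full_slope = lsrl(xs, ys)
    changes = []
    for i in range(len(xs)):
        # Refit using every point except point i.
        _, slope_without = lsrl(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append(abs(full_slope - slope_without))
    return changes


# Hypothetical data: five points on y = 2x plus one point far to the right.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 6, 8, 10, 10]
changes = slope_changes(xs, ys)
most_influential = max(range(len(xs)), key=lambda i: changes[i])
print("slope changes:", [round(c, 3) for c in changes])
print("most influential point:", (xs[most_influential], ys[most_influential]))
```

For this data, the slope with all six points is 6/13 ≈ 0.46. Leaving out (15, 10) restores a slope of exactly 2. That is a much bigger change than removing any other point causes, so (15, 10) is the influential one.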

Outliers

An outlier is a point whose y-value is far from the pattern of the rest of the points. In other words, its residual is large in absolute value. Outliers usually weaken the correlation of the scatterplot substantially, and they can also shift the y-intercept of the regression line. Child 19 on the scatterplot above is an outlier.
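Here is a minimal Python sketch, using made-up data, of how a single outlier affects a fit. Six hypothetical points lie exactly on y = 2x, and a seventh point sits far above that pattern. The helper names `lsrl` and `correlation` are our own.

```python
import math


def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: six points exactly on y = 2x ...
clean_x = [1, 2, 3, 5, 6, 7]
clean_y = [2, 4, 6, 10, 12, 14]
# ... plus an outlier at x = 4 whose y-value is far above the pattern.
xs = clean_x + [4]
ys = clean_y + [20]

a, b = lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print("r without outlier:", round(correlation(clean_x, clean_y), 3))
print("r with outlier:   ", round(correlation(xs, ys), 3))
print("intercept without / with:", round(lsrl(clean_x, clean_y)[0], 3), round(a, 3))
print("largest |residual|:", round(max(residuals, key=abs), 3))
```

Because this outlier sits at the mean x-value, the slope stays at 2. However, r falls from 1 to about 0.69 and the y-intercept shifts from 0 to 12/7 ≈ 1.71. The outlier's residual (72/7 ≈ 10.29) dwarfs every other residual, each of which is about -1.71.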

High-Leverage Points

A high-leverage point is a point whose x-value is far from the x-values of the rest of the points. Such a point pulls the regression line toward itself, so it can substantially change the slope of the line, and it can also change the y-intercept. Child 18 on the scatterplot above is a high-leverage point.
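The sketch below again uses hypothetical data and a hypothetical `lsrl` helper. It adds a single point with a large x-value to five points on y = 2x, first in line with the pattern and then well below it.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical points exactly on y = 2x.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 4, 6, 8, 10]

# Two hypothetical versions of one extra point with an unusually large x-value.
extra_points = {
    "on trend": (15, 30),   # follows the pattern of the other points
    "off trend": (15, 5),   # falls well below the pattern
}

fits = {}
for label, (px, py) in extra_points.items():
    fits[label] = lsrl(base_x + [px], base_y + [py])
    intercept, slope = fits[label]
    print(f"{label}: slope = {slope:.3f}, intercept = {intercept:.3f}")
```

When the far-out point is in line with the trend, the slope stays at 2 and the intercept stays at 0. When it is below the trend, it drags the slope down to 1/13 ≈ 0.08 and raises the intercept to about 5.45. A high-leverage point changes the fit substantially only when it departs from the pattern of the other points.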

Transforming Data and Nonlinear Regression

Sometimes a linear model is not a good fit for a data set, and a nonlinear model works better. The nonlinear models we need to know for this course are exponential and power models. (There is also polynomial regression, but fitting it requires linear algebra, which is beyond the scope of this course.)
To fit exponential and power models, we transform the data so that the relationship becomes linear. (Most graphing calculators can also fit these models for you directly.)

Exponential Models

Exponential models have the form ŷ = ab^x. To linearize this, we take the natural logarithm of both sides. By the properties of logarithms, ln(ŷ) = ln(a) + ln(b)x. This means the relationship between ln(y) and x is linear, so we find the LSRL of ln(y) on x. If that line's y-intercept is a* and its slope is b*, then a* = ln(a) and b* = ln(b). To find a and b, we use:
a = e^(a*) and b = e^(b*)
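The transformation above can be sketched in Python as follows. The sketch assumes every y-value is positive, because ln is undefined otherwise. The data are hypothetical and follow ŷ = 3·2^x exactly, so the fit should recover a = 3 and b = 2. The names `lsrl` and `fit_exponential` are illustrative and do not refer to a calculator or library function.

```python
import math


def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_exponential(xs, ys):
    """Fit y-hat = a * b**x by regressing ln(y) on x. Every y must be positive."""
    a_star, b_star = lsrl(xs, [math.log(y) for y in ys])
    # Undo the log: a* = ln(a) and b* = ln(b).
    return math.exp(a_star), math.exp(b_star)


# Hypothetical data that follow y = 3 * 2**x exactly.
xs = [0, 1, 2, 3, 4, 5]
ys = [3, 6, 12, 24, 48, 96]
a, b = fit_exponential(xs, ys)
print(f"y-hat = {a:.3f} * {b:.3f}^x")
print("prediction at x = 6:", round(a * b ** 6, 3))
```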

Power Models

Power models have the form ŷ = ax^b. As with exponential models, we take the natural logarithm of both sides, which gives ln(ŷ) = ln(a) + b·ln(x). This time, the relationship between ln(y) and ln(x) is linear. If the LSRL of the transformed data again has y-intercept a* and slope b*, we have:
a = e^(a*) and b = b*.
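A matching Python sketch for the power model follows. It assumes every x-value and every y-value is positive so that both logs are defined. The data are hypothetical and follow ŷ = 2x^3 exactly, so the fit should recover a = 2 and b = 3. The helper names are our own.

```python
import math


def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_power(xs, ys):
    """Fit y-hat = a * x**b by regressing ln(y) on ln(x). All x and y must be positive."""
    a_star, b_star = lsrl([math.log(x) for x in xs], [math.log(y) for y in ys])
    # Only the intercept is a log (a* = ln(a)); the slope b* is already the power b.
    return math.exp(a_star), b_star


# Hypothetical data that follow y = 2 * x**3 exactly.
xs = [1, 2, 3, 4, 5]
ys = [2, 16, 54, 128, 250]
a, b = fit_power(xs, ys)
print(f"y-hat = {a:.3f} * x^{b:.3f}")
```

Unlike the exponential case, only the intercept gets exponentiated here.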

How Can I Tell?

To decide which transformation to use, we check both the residual plot and the R^2 of the transformed data. The right model has residuals that are randomly scattered with no curved pattern and an R^2 close to 1. The R^2 is interpreted much like its linear counterpart. It is the percent of the variation in the transformed response, ln(y), that is explained by the linear relationship with x (for an exponential model) or with ln(x) (for a power model). If these conditions aren't met, then either another model that we haven't learned may fit better, or, more likely, influential points are distorting the fit.
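As a rough sketch of this comparison, the Python code below fits three candidate models to hypothetical data that double at each step. It reports each R^2, along with the signs of the residuals from the linear fit as a text stand-in for a residual plot. The R^2 of a transformed model is computed on the log scale, which is one reason to look at the residuals as well. The helper names are our own.

```python
import math


def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys):
    """Square of the correlation between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def residuals(xs, ys):
    """Residuals (y minus y-hat) from the LSRL of ys on xs."""
    a, b = lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data that double at each step (y = 3 * 2**x).
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 12, 24, 48, 96, 192]
ln_x = [math.log(x) for x in xs]
ln_y = [math.log(y) for y in ys]

# Each candidate model is an LSRL on (possibly) transformed variables.
candidates = {
    "linear (y on x)": (xs, ys),
    "exponential (ln y on x)": (xs, ln_y),
    "power (ln y on ln x)": (ln_x, ln_y),
}
r2 = {name: r_squared(u, v) for name, (u, v) in candidates.items()}
for name, value in r2.items():
    print(f"{name}: R^2 = {value:.4f}")

# Text stand-in for a residual plot: a run of +, then -, then + signals curvature.
lin_res = residuals(xs, ys)
print("linear residual signs:", " ".join("+" if e > 0 else "-" for e in lin_res))

best = max(r2, key=r2.get)
print("best fit:", best)
```

The linear residuals run + + - - - +, which is a curved pattern, and the linear R^2 is only about 0.82. The power fit reaches about 0.94. The exponential fit has an R^2 of 1 because the data were built to be exactly exponential, so the exponential model is the one to choose.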

Summary

To summarize: if our data appear to follow an exponential model, we take the natural log (or a log of any other base) of the y-coordinates. If our data appear to follow a power model, such as ŷ = ax^2 or ŷ = ax^3, we take the log of both the x- and y-coordinates. (If you use a base other than e, undo the log with that same base, for example 10^(a*) for base-10 logs.)