June 3, 2020
Sometimes, the least squares regression model may not be the best for representing a data set. We’re going to list some reasons why.
An influential point is a point that, when added, significantly changes the regression model by affecting the slope, the y-intercept, or the correlation. There are two types, outliers and high-leverage points, and both are shown in this graph.
Courtesy of Starnes, Daren S. and Tabor, Josh. The Practice of Statistics—For the AP Exam, 5th Edition. Cengage Publishing.
An outlier is a point whose y-value is far from the rest of the points; that is, it has a residual of large magnitude. These points can heavily reduce the correlation of the scatterplot and can occasionally change the y-intercept of the regression line. Child 19 on the scatterplot above is an outlier.
A high-leverage point is a point whose x-value is far from the rest of the points. These points pull the regression line toward themselves and thus can significantly change the slope of the line. They can occasionally change the y-intercept of the regression line as well. Child 18 on the scatterplot above is a high-leverage point.
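To make the two kinds of influential points concrete, here is a minimal sketch in plain Python. The data values and the helper name `least_squares` are made up for illustration and are not taken from the scatterplot above. It fits the LSRL three times: on the base data, with an added outlier, and with an added high-leverage point.

```python
def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Made-up data that follow a roughly linear pattern (y is about 2x).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1]

base = least_squares(xs, ys)
# Outlier: a typical x-value, but a y-value far from the pattern.
with_outlier = least_squares(xs + [4.5], ys + [20.0])
# High-leverage point: an x-value far from the others, off the pattern.
with_leverage = least_squares(xs + [20], ys + [15.0])

for label, (slope, intercept, r) in [
    ("base data", base),
    ("with outlier", with_outlier),
    ("with high-leverage point", with_leverage),
]:
    print(f"{label}: slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

In this example the outlier sits at the mean x-value, so it leaves the slope alone but raises the y-intercept and weakens the correlation, while the high-leverage point drags the slope well below 2.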
Sometimes, a linear model is not a good fit for a set of data, and thus it is better to use a nonlinear model. The types that we have to know for this class are exponential and power regression models. (There is also polynomial regression, but that requires knowledge of linear algebra, which is beyond the scope of this course.)
To use exponential and power regression, we need to transform the data to linearize it. (However, most calculators have options that calculate these regressions automatically.)
Exponential models are of the form ŷ=ab^x. We transform this by taking the natural logarithm of both sides. With logarithm properties, we get ln(ŷ)=ln(a)+ln(b)x. This means that the relationship between ln(ŷ) and x is linear, so we find the LSRL of this transformed data with the y-intercept being a* and the slope being b*. To find a and b, we use:
$$a = e^{a^*}$$
$$b = e^{b^*}$$
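The exponential transformation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up and generated exactly from ŷ = 3(2^x), and `least_squares` is a hypothetical helper, not a calculator or library function.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data generated from y = 3 * 2^x.
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

# Linearize: regress ln(y) on x.
ln_ys = [math.log(y) for y in ys]
a_star, b_star = least_squares(xs, ln_ys)

# Back-transform the intercept and slope to recover a and b.
a = math.exp(a_star)
b = math.exp(b_star)
print(f"a = {a:.4f}, b = {b:.4f}")
```

Because the made-up data are exactly exponential, the recovered values match a = 3 and b = 2 up to rounding; real data would only come close.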
Power models are of the form ŷ=ax^b. As with exponential models, we take the natural logarithm of both sides, and with manipulation, we get ln(ŷ)=ln(a)+b·ln(x). This time, the relationship between ln(ŷ) and ln(x) is linear. With the LSRL of the transformed data again having y-intercept a* and slope b*, we have:
$$a = e^{a^*}$$
$$b = b^*$$
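The same idea works for a power model, except that both variables are logged and only the intercept needs back-transforming. The sketch below uses made-up data generated exactly from ŷ = 2x^3, and `least_squares` is again a hypothetical helper written for this example.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data generated from y = 2 * x^3 (x must be positive to take logs).
xs = [1, 2, 3, 4, 5, 6]
ys = [2 * x ** 3 for x in xs]

# Linearize: regress ln(y) on ln(x).
ln_xs = [math.log(x) for x in xs]
ln_ys = [math.log(y) for y in ys]
a_star, b_star = least_squares(ln_xs, ln_ys)

# Only the intercept is back-transformed; the slope is the exponent itself.
a = math.exp(a_star)
b = b_star
print(f"a = {a:.4f}, b = {b:.4f}")
```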
To decide which transformation to use, we check both the residual plot of the transformed data and its R^2. The right model is the one whose residuals are randomly scattered, with no curved pattern, and whose R^2 is close to 1. Here, R^2 is interpreted as the percent of variation in the response variable that is explained by the power or exponential model using the explanatory variable, which is very similar to its linear counterpart. If these conditions aren't met, then either another model that we haven't learned may fit better or, more likely, influential points are distorting the fit.
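As a rough illustration of this comparison, the sketch below fits a line to the untransformed data and to both transformations, then reports each R^2 and the signs of the residuals. The data are made up (an exponential trend with small hand-picked perturbations), and `fit_line` is a hypothetical helper. A long run of same-sign residuals stands in for the curved pattern you would look for in a residual plot.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope, r_squared, residuals) for the LSRL of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, sxy ** 2 / (sxx * syy), residuals

# Made-up data: y = 5 * 1.5^x with small multiplicative perturbations.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00, 1.01]
ys = [5 * 1.5 ** x * e for x, e in zip(xs, noise)]

ln_xs = [math.log(x) for x in xs]
ln_ys = [math.log(y) for y in ys]

fits = {
    "linear (y vs x)": fit_line(xs, ys),
    "exponential (ln y vs x)": fit_line(xs, ln_ys),
    "power (ln y vs ln x)": fit_line(ln_xs, ln_ys),
}

for name, (_, _, r2, residuals) in fits.items():
    signs = "".join("+" if e > 0 else "-" for e in residuals)
    print(f"{name}: R^2 = {r2:.4f}, residual signs = {signs}")

# Pick the candidate with the highest R^2 (still check the residuals by eye).
best = max(fits, key=lambda name: fits[name][2])
print("Highest R^2:", best)
```

For this data set the exponential transformation gives the highest R^2, as expected from how the data were generated. R^2 alone is only part of the check, so the residual pattern should still be inspected.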
To summarize, if our data appear to follow an exponential model, we take the natural log (or a log of any other base) of the y-coordinates. If our data appear to follow a power model, such as a quadratic or cubic function of the form ax^2 or ax^3, we take the log of both the x- and y-coordinates.