2.5 Correlation

2 min read • June 11, 2020

Peter Cao


What is Correlation?

Correlation describes how two quantitative variables are related to each other. It is measured numerically by the correlation coefficient, which in stats we denote as r. The correlation coefficient shows the strength of the linear relationship between the two variables, that is, how close the points are to forming a line. Its sign matches the direction of the scatterplot: positive for an increasing pattern and negative for a decreasing one. The coefficient takes a value between -1 and 1, where r=-1 means that the points fall exactly on a decreasing line while r=1 means that the points fall exactly on an increasing line. A correlation coefficient of 0 means that there is no linear correlation between the variables, although a nonlinear relationship may still exist.

Examples

Here are some scatterplots and their values of r:

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-24%20at%207.51-KGen8QXyUFXW.png?alt=media&token=e90556cd-49ec-4c6d-ac07-1ffe900b1dd0

image courtesy of: math.nayland.school.nz

Also, there are a few things to keep in mind about correlation.

  • Even if r has a high magnitude, the relationship may not be linear, but instead it may be curved. We will discuss this more in later sections.

  • A high magnitude of correlation does not imply causation.

  • The correlation coefficient is not resistant to outliers, which makes sense, given that the formula that we shall learn uses the mean and standard deviation, which by themselves are not resistant.
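Two of the cautions above can be checked numerically. The sketch below is only an illustration with made-up data: the helper name r_value is hypothetical, and it uses the z-score approach described in this guide. It shows a clearly curved relationship that still has a high r, and a perfectly linear data set whose r collapses when a single outlier is added.

```python
from statistics import mean, stdev

def r_value(xs, ys):
    """Hypothetical helper: correlation coefficient from z-score products."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Made-up data: y = x^2 is curved, yet r is still close to 1.
xs = [1, 2, 3, 4, 5, 6]
ys_curved = [x ** 2 for x in xs]
r_curved = r_value(xs, ys_curved)

# Made-up data: five points exactly on the line y = 2x.
x_clean = [1, 2, 3, 4, 5]
y_clean = [2, 4, 6, 8, 10]
r_clean = r_value(x_clean, y_clean)

# Same points plus one outlier at (6, 0).
r_outlier = r_value(x_clean + [6], y_clean + [0])

print("curved data:", round(r_curved, 3))
print("linear data:", round(r_clean, 3))
print("with outlier:", round(r_outlier, 3))
```

A high r on the curved data is a reminder to always look at the scatterplot, and the drop after adding one point shows why r is not resistant.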

Calculating the Correlation Coefficient

To find the value of r, use this formula from the formula sheet:

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-24%20at%207.53-wJpfoxjCtbL2.png?alt=media&token=f5b31b81-9120-44ba-96fc-5f2848adc13a

Although this may seem like a complicated formula, it's not that hard to understand (though it is tedious to compute). To find r, first find the mean and standard deviation of both the x and y variables. Then, for each data point, multiply the x and y z-scores for that point. Finally, add all the individual products up and divide by the number of data points minus 1.
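The steps above can be sketched in a few lines of Python. This is a minimal illustration, not something you need for the exam: the function name correlation and the data are made up, and it assumes the sample standard deviation (dividing by n - 1), so that r = (1/(n-1)) * sum of (z-score of x) * (z-score of y).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r, following the z-score steps in the text."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    # Step 1: mean and sample standard deviation of each variable.
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Step 2: multiply the x and y z-scores for each data point.
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    # Step 3: add the products and divide by n - 1.
    return sum(products) / (len(xs) - 1)

# Made-up data for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r = correlation(hours, scores)
print(round(r, 3))
```

Points that fall exactly on an increasing line, such as (1, 2), (2, 4), (3, 6), give r = 1 with this function, which matches the definition above.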

You will seldom need to do this by hand, since most graphing calculators can find r for you. On the TI-84, the most common graphing calculator used in AP Stats, enter your data into L1 and L2, then go to STAT > CALC and choose a LinReg option, as shown below:

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2011.11-F3qYYS7Efe7p.png?alt=media&token=b023f6cf-2396-4032-b8a8-b19dadbac5c5

To be sure that you get the r-value, verify that "Stat Diagnostics" is turned on in the MODE menu.

Watch: AP Stats - Scatterplots and Association
