Fiveable

📊 AP Statistics Unit 2 Review

2.5 Correlation

Written by the Fiveable Content Team • Last updated June 2026
Verified for the 2027 exam

Correlation (r) measures the strength and direction of the linear relationship between two quantitative variables. It always falls between -1 and 1: values near -1 or 1 mean a strong linear pattern, 0 means no linear association, and even a strong r does not prove that one variable causes the other.

Correlation AP Stats Summary

In AP Stats 2.5, correlation is the value r that describes the direction and strength of a linear association between two quantitative variables. Positive r means the variables tend to increase together, negative r means one tends to decrease as the other increases, and values closer to -1 or 1 show a stronger linear pattern.

The exam usually asks you to interpret r in context, estimate it from a scatterplot, or explain its limits. Remember that r is unit-free, is always between -1 and 1, only measures linear association, and does not prove causation.

Why This Matters for the AP Statistics Exam

Correlation is one of the core tools in Unit 2, Exploring Two-Variable Data, which makes up about 5 to 7 percent of the exam. On both multiple-choice and free-response questions, you may need to estimate r from a scatterplot, interpret what r tells you about a relationship, or explain why a correlation does not mean one variable causes another. Reserve the word "correlation" for relationships between two quantitative variables, and connect your numbers to the real context of the problem instead of speaking in general terms.

This topic also sets up later work in the unit. Correlation feeds directly into the slope of the least-squares regression line and into r squared, so getting comfortable with r now pays off when you reach regression.

Key Takeaways

  • r measures the strength and direction of a linear relationship between two quantitative variables only.
  • r is unit-free and always falls between -1 and 1, inclusive.
  • r = 1 or r = -1 means a perfect linear pattern; r = 0 means no linear association.
  • A high magnitude of r does not guarantee that a linear model fits well; the pattern could be curved.
  • r is not resistant, so outliers can pull it up or down.
  • Correlation does not imply causation, even when r is close to 1 or -1.

What Correlation Measures

The correlation coefficient r tells you two things at once: the direction of a linear relationship and how strong that linear pattern is.

  • Direction: a positive r means that as one variable increases, the other tends to increase. A negative r means that as one variable increases, the other tends to decrease. This matches the direction you see in a scatterplot.
  • Strength: the closer r is to 1 or -1, the more tightly the points cluster around a straight line. The closer r is to 0, the more scattered the points are.

A few important limits:

  • r only measures the linear relationship. A strong curved pattern can still give a low r, so a value near 0 does not mean "no relationship," only "no linear relationship."
  • A correlation coefficient close to 1 or -1 does not automatically mean a line is the right model. Always check the scatterplot.
  • r is unit-free. If you change from inches to centimeters or pounds to kilograms, r stays the same.
  • r is not resistant to outliers. The formula uses means and standard deviations, which are not resistant, so a single unusual point can shift r noticeably.

And the big one: correlation does not imply causation. A real or apparent relationship between two variables does not prove that changes in one cause changes in the other. A lurking variable could be driving both.

Calculating the Correlation Coefficient

The formula on the formula sheet is:

$$r = \frac{1}{n-1}\sum\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

It looks busy, but the steps are straightforward:

  1. Find the mean and standard deviation of both the x-values and the y-values.

  2. For each data point, convert x and y to z-scores: (xi - x̄)/sx and (yi - ȳ)/sy.

  3. Multiply the two z-scores for each point.

  4. Add up all the products and divide by n - 1.

You will rarely compute this by hand. On the TI-84, enter your data into L1 and L2, then go to STAT > CALC > LinReg(a+bx). For r (and r²) to appear in the output, turn on STAT DIAGNOSTICS in the MODE menu; on older operating systems, run DiagnosticOn from the CATALOG instead.

How to Use This on the AP Statistics Exam

MCQ

  • Practice estimating r by eye from a scatterplot. Points tightly clustered around a line mean r is close to 1 or -1; a loose cloud means r is close to 0.
  • Watch for sign. A downward trend must give a negative r, and an upward trend must give a positive r.
  • Remember that a curved but tight pattern can still have a small r, since r only captures the linear part.

Free Response

  • Interpret r in context, not just as a number. Instead of stopping at "strong positive correlation," tie it to the variables: for example, there is a strong, positive, linear association between animal length and weight, so longer animals tend to weigh more.
  • Use careful language like "tend to" and "on average" to show you understand that the relationship is not exact.
  • If asked about cause and effect, state clearly that correlation does not prove causation and, when relevant, mention a possible lurking variable.

Common Trap

  • Do not call something "correlation" unless both variables are quantitative. Save that word for two-quantitative-variable relationships.
  • Do not assume a high r means a line is the best model. Confirm with the scatterplot or a residual plot before committing to linear.

Practice Problems

(1) A study was conducted to examine the relationship between hours of exercise per week and body mass index (BMI). The scatterplot shows the results for a sample of 25 individuals.

Based on the scatterplot, which of the following statements is true?

(A) There is a strong positive correlation between hours of exercise per week and BMI.

(B) There is a strong negative correlation between hours of exercise per week and BMI.

(C) There is a moderate positive correlation between hours of exercise per week and BMI.

(D) There is a moderate negative correlation between hours of exercise per week and BMI.

(E) There is no correlation between hours of exercise per week and BMI.

(2) TRUE or FALSE

  • A scatterplot is a graphical representation of the relationship between two variables.
  • A correlation coefficient of 1 indicates a strong positive correlation between two variables.
  • A correlation coefficient of -1 indicates a strong positive correlation between two variables.
  • A correlation coefficient of 0 indicates no correlation between two variables.
  • The correlation coefficient only measures linear relationships between two variables.
  • The correlation coefficient indicates the strength and direction of the linear relationship between two variables.
  • The correlation coefficient indicates the cause and effect relationship between two variables.
  • Correlation implies causation, meaning that if two variables are correlated, one variable must cause the other.
  • A scatterplot can show nonlinear relationships between two variables.
  • A scatterplot can be used to predict the value of one variable based on the value of the other variable.

Answers

(1) The correct answer is (D) There is a moderate negative correlation between hours of exercise per week and BMI. The scatterplot shows that as hours of exercise per week increase, BMI tends to decrease. The relationship is not perfectly linear, but there is a clear downward trend. You could calculate r to quantify the strength of this relationship.

(2) T, T, F, T, T, T, F, F, T, T

Common Misconceptions

  • "r = 0 means no relationship." It only means no linear relationship. The variables could still have a strong curved pattern.
  • "A correlation near 1 or -1 proves a line fits." A high magnitude does not guarantee a linear model is appropriate. Check the scatterplot.
  • "Correlation proves causation." A strong r never proves that one variable causes the other. A lurking variable may explain both.
  • "r changes if I change units." r is unit-free, so switching between inches and centimeters or pounds and kilograms leaves it unchanged.
  • "Outliers do not affect r much." r is not resistant, so a single extreme point can shift it noticeably.
  • "You can use correlation for any two variables." r is only for two quantitative variables, not categorical ones.

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

  • causation: A relationship where changes in one variable directly cause changes in another variable.
  • correlation: A numerical measure (r) that describes the strength and direction of a linear relationship between two quantitative variables, ranging from -1 to 1.
  • linear model: A mathematical representation of the linear relationship between two variables.
  • linear relationship: A relationship between two variables that can be described by a straight line.
  • quantitative variable: A variable that is measured numerically and can take on a range of values, allowing for mathematical operations and statistical analysis.

Frequently Asked Questions

What is correlation in AP Stats?

Correlation is the value r that gives the direction and strength of the linear association between two quantitative variables. Positive r means the variables tend to increase together, while negative r means one tends to decrease as the other increases.

What does the correlation coefficient r tell you?

The correlation coefficient r tells you direction and strength for a linear relationship. Values close to 1 or -1 show a stronger linear association, while r = 0 means no linear association.

What is the formula for correlation?

The AP Stats formula is r = (1/(n-1)) times the sum of the products of the x and y z-scores. In practice, you usually determine r with technology rather than calculating the formula by hand.

Can r be greater than 1 or less than -1?

No. The correlation coefficient is always between -1 and 1, inclusive. r = 1 and r = -1 are perfect linear associations, and r = 0 means no linear association.

Does correlation imply causation?

No. A perceived or real relationship between two variables does not prove that changes in one cause changes in the other. A lurking variable may explain the association.

Does a high correlation mean a linear model is appropriate?

Not necessarily. A correlation close to 1 or -1 does not guarantee that a linear model fits well. Always check the scatterplot or residual plot for curvature and unusual points.
