The correlation coefficient, r, is a unit-free number between -1 and 1 that measures the direction and strength of the linear association between two quantitative variables, where r = 1 or r = -1 means a perfect linear relationship and r = 0 means no linear association (AP Stats Topic 2.5).
The correlation coefficient, r, puts a single number on what a scatterplot shows you visually. It tells you two things about the linear relationship between two quantitative variables. First, direction. A positive r means the variables tend to rise together, and a negative r means one tends to fall as the other rises. Second, strength. The closer r is to 1 or -1, the more tightly the points hug a straight line. An r of exactly 1 or -1 means every point sits perfectly on a line, and r = 0 means there is no linear association at all.
The formula is r = (1/(n-1)) × Σ[(x_i - x̄)/s_x][(y_i - ȳ)/s_y], which is essentially an average (dividing by n - 1 rather than n) of the products of the z-scores of x and y. In practice you'll almost always get r from your calculator. Because r is built from z-scores, it has no units, and it doesn't change if you rescale the data: multiply every value by 3, or add a constant to every value, and r stays exactly the same. (Multiplying just one variable by a negative number flips the sign of r but not its size.) The CED bakes in two warnings directly. A value of r close to 1 or -1 does not guarantee a linear model is appropriate (always check the scatterplot and residuals), and correlation does not imply causation.
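To make the z-score formula concrete, here is a minimal Python sketch that computes r exactly as written above. The function name `correlation` and the `hours`/`score` data are made up for illustration; on the exam you would use your calculator instead.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical data, invented for this example only
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = correlation(hours, score)
print(round(r, 3))
```

Each product is positive when a point is above (or below) average in both variables and negative when it is above in one and below in the other, which is why the sign of r tracks the direction of the association.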
Correlation lives in Topic 2.5 of Unit 2 (Exploring Two-Variable Data) and supports learning objectives 2.5.A (determine the correlation for a linear relationship) and 2.5.B (interpret it). But r doesn't stay in its own topic. It's wired directly into least-squares regression in Topic 2.8, where the slope formula b = r(s_y/s_x) and the coefficient of determination r² both depend on it (2.8.A). It also feeds the big-picture question from Topic 2.1 about whether an apparent pattern in data is real or random, a question that comes back with full inference machinery in Unit 9 when you test whether a population slope is actually different from zero. If you can compute, interpret, and correctly limit the conclusions you draw from r, you've got one of the most-tested skills in the course.
Scatterplot (Unit 2)
A scatterplot is the picture and r is the number. Topic 2.4 has you describe form, direction, strength, and unusual features visually, and r quantifies the direction and strength parts. But r only measures LINEAR strength, so a curved pattern can have a high r and still be a bad fit for a line. Always look at the plot before trusting the number.
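A quick way to see why the plot matters is to feed a perfectly curved pattern into the correlation calculation. The sketch below uses invented data (y = x² for x from 1 to 10) and a hand-written `correlation` helper; it is an illustration, not a standard routine.

```python
from statistics import mean

def correlation(xs, ys):
    """Sample correlation r from sums of deviations (equivalent to the z-score form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5

# Invented data: a perfectly curved (quadratic) relationship
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)

# Fit the least-squares line and look at the residuals (actual minus predicted)
x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(round(r, 3))
print(["+" if e > 0 else "-" for e in residuals])
```

Here r comes out above 0.97, yet the residuals are positive at both ends and negative in the middle. That U-shaped residual pattern is the sign that a line is the wrong model even though r looks impressive.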
Least Squares Method (Unit 2)
The slope of the least-squares regression line is b = r(s_y/s_x), so the correlation literally lives inside the regression equation. Notice what this means. The sign of r and the sign of the slope always match, and a stronger correlation (holding the standard deviations fixed) means a steeper line.
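As a check on the relationship b = r(s_y/s_x), the sketch below computes the slope two ways: from r and the standard deviations, and directly from the least-squares formula. The helper names and the `hours`/`score` data are hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from sums of deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5

def least_squares_slope(xs, ys):
    """Slope that minimizes the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical data, invented for this example only
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = correlation(hours, score)
slope_from_r = r * stdev(score) / stdev(hours)   # b = r * (s_y / s_x)
slope_direct = least_squares_slope(hours, score)

print(round(slope_from_r, 4), round(slope_direct, 4))
```

Both calculations give the same slope, and because s_y and s_x are never negative, the slope always inherits its sign from r.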
Coefficient of Determination (Unit 2)
Square the correlation and you get r², the coefficient of determination, which gives the percent of variation in y explained by the linear relationship with x. Going from r to r² is easy, but going backwards requires care because r² = 0.64 could come from r = 0.8 or r = -0.8. You need the scatterplot or slope to know the sign.
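Going backwards from r² to r can be sketched in a few lines. The function name `r_from_r_squared` is made up for this example; the idea is simply to take the square root and borrow the sign from the regression slope.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r from r^2, taking the sign from the least-squares slope."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r_squared must be between 0 and 1")
    # copysign returns the magnitude of the first argument
    # with the sign of the second
    return math.copysign(math.sqrt(r_squared), slope)

# Same r^2, opposite slopes (slope values are invented for illustration)
r_negative = r_from_r_squared(0.64, slope=-2.5)
r_positive = r_from_r_squared(0.64, slope=1.7)

print(round(r_negative, 2), round(r_positive, 2))
```

On computer output, the slope is the coefficient next to the explanatory variable, so that is the place to look for the sign.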
Inference for Slopes (Unit 9)
Unit 2's correlation describes a sample. Unit 9 asks whether the linear relationship is real in the population by testing the slope. A sample r far from 0 is the descriptive hint, and the t-test for slope is the formal answer to Topic 2.1's question of whether the pattern could just be random noise.
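For simple linear regression, the t statistic for testing whether the population slope is zero can be written in terms of r and n alone, as t = r√(n - 2)/√(1 - r²), with n - 2 degrees of freedom. The sketch below only computes that statistic; the function name is hypothetical, and finding the p-value would still take a calculator or a t table.

```python
import math

def slope_t_statistic(r, n):
    """t statistic for H0: population slope = 0, written in terms of r and n."""
    if n < 3:
        raise ValueError("need at least 3 points")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Invented values: the same r with two different sample sizes
t_small_sample = slope_t_statistic(0.6, 6)
t_large_sample = slope_t_statistic(0.6, 18)

print(round(t_small_sample, 2), round(t_large_sample, 2))
```

The same sample r gives a larger t statistic with more data, which is why r alone can't settle whether a pattern is more than random noise.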
Multiple-choice questions love r's properties. Expect stems asking what happens to r when you rescale the data (answer: nothing, since r is unit-free), what r equals when all points lie exactly on a line with positive slope (r = 1), or how to interpret a value like r = -0.85 (a strong negative linear association). You may also be asked to compute r from summary statistics such as n, Σx, Σy, Σxy, Σx², and Σy². On FRQs, correlation usually appears inside a regression question. The 2017 FRQ on gray wolf length and weight and the 2018 FRQ on checkout times both started from a scatterplot and computer regression output. Common tasks include interpreting r or r² in context, connecting r to the slope, and explaining why a strong correlation doesn't prove causation. The graders want context (the actual variables named) and the word "linear" in your interpretation. "Strong negative association" loses credit; "strong negative linear association between length and weight" earns it.
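If a question hands you sums instead of raw data, r can be computed with the shortcut form r = (nΣxy - ΣxΣy) / √[(nΣx² - (Σx)²)(nΣy² - (Σy)²)], which needs n, Σx², and Σy² in addition to Σx, Σy, and Σxy. Here is a minimal sketch; the function name and the numbers are made up for illustration.

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Sample correlation r from summary statistics (shortcut formula)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Invented summary statistics for x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 5
r = correlation_from_sums(n=5, sum_x=15, sum_y=20, sum_xy=66,
                          sum_x2=55, sum_y2=86)

print(round(r, 3))
```

This is algebraically the same as the z-score formula, so it gives the same r as a calculator's linear regression output would for the raw data.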
The correlation r measures the direction and strength of a linear relationship and runs from -1 to 1. The coefficient of determination r² is its square, runs from 0 to 1, and answers a different question: what percent of the variation in the response variable is explained by the linear model? They have different interpretations and you can't swap them. An r of -0.9 means a strong negative linear association, while the matching r² of 0.81 means 81% of the variation in y is explained by the line. Also, r² alone can't tell you the direction, since squaring erases the sign.
The correlation coefficient r is always between -1 and 1, where the sign gives the direction of the linear association and the distance from 0 gives its strength.
r is unit-free and doesn't change when you shift the data or multiply it by a positive constant, because it's calculated from z-scores.
An r close to 1 or -1 does not prove a linear model is appropriate; you still need to check the scatterplot and residual plot.
Correlation does not imply causation, so a strong r between two variables does not, by itself, let you conclude that one variable causes the other.
The correlation connects directly to regression through the slope formula b = r(s_y/s_x) and through r², the coefficient of determination.
When interpreting r on the exam, always state direction, strength, the word 'linear,' and the variables in context.
The correlation coefficient is the number r, between -1 and 1, that measures the direction and strength of the linear relationship between two quantitative variables. It's covered in Topic 2.5 of Unit 2 and is usually computed with technology rather than by hand.
No. The CED states explicitly that an r close to 1 or -1 does not necessarily mean a linear model fits. A strongly curved pattern can still produce a high r, so you have to check the scatterplot and residual plot before fitting a line.
No, and AP Stats tests this constantly. Correlation does not imply causation, because lurking variables or coincidence can create an association. Only a well-designed randomized experiment supports a causal conclusion.
r is the correlation, running from -1 to 1, and describes the direction and strength of the linear association. r² is the coefficient of determination, running from 0 to 1, and gives the proportion of variation in y explained by the linear model. An r of -0.8 gives an r² of 0.64, but r² alone can't tell you the relationship is negative.
Nothing happens to r when you rescale the data. Because r is built from standardized values, changing units (multiplying by a positive constant) or shifting the data (adding a constant) leaves it unchanged. If r = 0.4 and you multiply every value by 3, r is still 0.4, which is a classic multiple-choice question.
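The invariance is easy to check numerically. The sketch below uses invented data and a hand-written `correlation` helper, then rescales and shifts the values to compare the results.

```python
from statistics import mean

def correlation(xs, ys):
    """Sample correlation r from sums of deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5

# Hypothetical data, invented for this example only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_original = correlation(xs, ys)
# Multiply every value of both variables by 3
r_times_three = correlation([3 * x for x in xs], [3 * y for y in ys])
# Change the units of x (multiply by 2.54) and shift y by a constant
r_new_units = correlation([2.54 * x for x in xs], [y + 100 for y in ys])
# Multiply only x by a negative number
r_flipped = correlation([-1 * x for x in xs], ys)

print(round(r_original, 3), round(r_times_three, 3),
      round(r_new_units, 3), round(r_flipped, 3))
```

The first three values agree, while multiplying a single variable by a negative number flips the sign of r without changing its size.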