Correlation Coefficient

The correlation coefficient, r, is a unit-free number between -1 and 1 that measures the direction and strength of the linear association between two quantitative variables, where r = 1 or r = -1 means a perfect linear relationship and r = 0 means no linear association (AP Stats Topic 2.5).

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is the Correlation Coefficient?

The correlation coefficient, r, puts a single number on what a scatterplot shows you visually. It tells you two things about the linear relationship between two quantitative variables. First, direction. A positive r means the variables tend to rise together, and a negative r means one tends to fall as the other rises. Second, strength. The closer r is to 1 or -1, the more tightly the points hug a straight line. An r of exactly 1 or -1 means every point sits perfectly on a line, and r = 0 means there is no linear association at all.

The formula is r = (1/(n-1)) × Σ[(x_i - x̄)/s_x][(y_i - ȳ)/s_y]. It multiplies each point's x z-score by its y z-score and then nearly averages the products (dividing by n - 1 rather than n). In practice you'll almost always get r from your calculator. Because r is built from z-scores, it has no units, and it doesn't change if you shift the data by adding a constant or rescale it by a positive constant (multiply every value by 3 and r stays exactly the same). Multiplying one variable by a negative constant keeps the strength but flips the sign of r. The CED builds in two warnings directly. A value of r close to 1 or -1 does not guarantee a linear model is appropriate (always check the scatterplot and residuals), and correlation does not imply causation.
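To make the z-score formula concrete, here is a minimal Python sketch that computes r exactly as written: standardize each x and y with the sample mean and sample standard deviation, multiply the pairs, sum, and divide by n - 1. The function name `correlation` and the `hours`/`scores` data are hypothetical, made up for illustration only.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 75]
print(round(correlation(hours, scores), 3))  # prints 0.982
```

A value of 0.982 would be read as a strong positive linear association. If either list has zero spread (all values equal), its standard deviation is 0 and r is undefined, which is why this sketch would fail with a division error in that case.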

Why the Correlation Coefficient matters in AP Statistics

Correlation lives in Topic 2.5 of Unit 2 (Exploring Two-Variable Data) and supports learning objectives 2.5.A (determine the correlation for a linear relationship) and 2.5.B (interpret it). But r doesn't stay in its own topic. It's wired directly into least-squares regression in Topic 2.8, where the slope formula b = r(s_y/s_x) and the coefficient of determination r² both depend on it (2.8.A). It also feeds the big-picture question from Topic 2.1 about whether an apparent pattern in data is real or random, a question that comes back with full inference machinery in Unit 9 when you test whether a population slope is actually different from zero. If you can compute, interpret, and correctly limit the conclusions you draw from r, you've got one of the most-tested skills in the course.

How the Correlation Coefficient connects across the course

Scatterplot (Unit 2)

A scatterplot is the picture and r is the number. Topic 2.4 has you describe form, direction, strength, and unusual features visually, and r quantifies the direction and strength parts. But r only measures LINEAR strength, so a curved pattern can have a high r and still be a bad fit for a line. Always look at the plot before trusting the number.
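A quick way to see why the plot matters: the sketch below uses a hypothetical, perfectly curved pattern (y = x² for x = 1 to 10), computes r and the least-squares line from sums of squared deviations, and then looks at the residuals. The variable names are illustrative; r is computed as Sxy / sqrt(Sxx · Syy), which is algebraically the same as the z-score formula because the n - 1 factors cancel.

```python
from math import sqrt
from statistics import mean

xs = list(range(1, 11))
ys = [x ** 2 for x in xs]  # an exactly curved (quadratic) pattern, not a line

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = s_xy / sqrt(s_xx * s_yy)  # correlation
b = s_xy / s_xx               # least-squares slope
a = y_bar - b * x_bar         # least-squares intercept
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"r = {r:.3f}")
print("residuals:", [round(e, 1) for e in residuals])
```

Here r comes out above 0.97, yet the residuals run positive, then negative, then positive again (a U shape). That curved residual pattern says a line is the wrong model even though r looks "strong."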

Least Squares Method (Unit 2)

The slope of the least-squares regression line is b = r(s_y/s_x), so the correlation literally lives inside the regression equation. Notice what this means. The sign of r and the sign of the slope always match, and an r farther from 0 (holding the standard deviations fixed) means a steeper line, whether it rises or falls.
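The sketch below checks b = r(s_y/s_x) numerically: it computes the slope once from r and the two sample standard deviations, and once directly as Sxy/Sxx (the usual least-squares slope), on the same hypothetical `hours`/`scores` data. All function names are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def correlation(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), equivalent to the z-score formula."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def slope_from_r(xs, ys):
    """Least-squares slope via b = r * (s_y / s_x)."""
    return correlation(xs, ys) * stdev(ys) / stdev(xs)

def slope_direct(xs, ys):
    """Least-squares slope via b = Sxy / Sxx."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 75]
b1 = slope_from_r(hours, scores)
b2 = slope_direct(hours, scores)
print(round(b1, 4), round(b2, 4))  # prints 5.6 5.6
```

Flipping the direction of the data (for example, negating every score) would make both r and the slope negative together, which is the "signs always match" point in action.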

Coefficient of Determination (Unit 2)

Square the correlation and you get r², the coefficient of determination, which gives the percent of variation in y explained by the linear relationship with x. Going from r to r² is easy, but going backwards requires care because r² = 0.64 could come from r = 0.8 or r = -0.8. You need the scatterplot or slope to know the sign.
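Here is a small sketch of the r to r² round trip, assuming you know the sign of the slope from regression output or the scatterplot. The function names are hypothetical; the point is that squaring loses the sign, so recovering r needs one extra piece of information.

```python
from math import sqrt

def coefficient_of_determination(r):
    """r-squared: the square of the correlation, always between 0 and 1."""
    return r ** 2

def r_from_r_squared(r2, slope):
    """Recover r from r-squared; the sign has to come from the slope."""
    if not 0 <= r2 <= 1:
        raise ValueError("r-squared must be between 0 and 1")
    magnitude = sqrt(r2)
    return magnitude if slope >= 0 else -magnitude

# r = 0.8 and r = -0.8 give the same r-squared, so r-squared alone hides the sign.
print(round(coefficient_of_determination(0.8), 2),
      round(coefficient_of_determination(-0.8), 2))  # prints 0.64 0.64
# With a hypothetical negative slope, the sign is restored.
print(round(r_from_r_squared(0.64, slope=-2.5), 2))  # prints -0.8
```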

Inference for Slopes (Unit 9)

Unit 2's correlation describes a sample. Unit 9 asks whether the linear relationship is real in the population by testing the slope. A sample r far from 0 is the descriptive hint, and the t-test for slope is the formal answer to Topic 2.1's question of whether the pattern could just be random noise.
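How far from 0 is "far" depends on the sample size. For simple linear regression, the t statistic for testing a zero population slope can be written in terms of r as t = r·sqrt(n - 2)/sqrt(1 - r²), with n - 2 degrees of freedom; this equals the slope divided by its standard error. The sketch below is a hedged illustration with a hypothetical r = 0.5 at two sample sizes. Python's standard library has no t distribution, so turning t into a p-value is left to a t table or statistical software.

```python
from math import sqrt

def slope_t_statistic(r, n):
    """t for H0: population slope = 0, written in terms of the sample correlation r.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least 3 points")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# The same hypothetical r = 0.5 from a small and a larger sample.
for n in (6, 27):
    t = slope_t_statistic(0.5, n)
    print(f"n = {n:2d}, df = {n - 2:2d}, t = {t:.3f}")
```

With n = 6 the statistic is only about 1.15, while with n = 27 it is about 2.89: the same descriptive r gives much stronger evidence of a real linear relationship when it comes from more data.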

Is the Correlation Coefficient on the AP Statistics exam?

Multiple-choice questions love r's properties. Expect stems asking what happens to r when you change the units of the data (answer: nothing, since r is unit-free), what r equals when all points lie exactly on a line with positive slope (r = 1), or how to interpret a value like r = -0.85 (a strong negative linear association). You may also compute r from summary statistics, which takes n, Σx, Σy, Σx², Σy², and Σxy.

On FRQs, correlation usually appears inside a regression question. The 2017 FRQ on gray wolf length and weight and the 2018 FRQ on checkout times both started from a scatterplot and computer regression output. Common tasks include interpreting r or r² in context, connecting r to the slope, and explaining why a strong correlation doesn't prove causation. The graders want context (the actual variables named) and the word "linear" in your interpretation. "Strong positive association" loses credit; "strong positive linear association between length and weight" earns it.
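Computing r from summary statistics uses the standard computational formula r = (nΣxy - ΣxΣy) / sqrt[(nΣx² - (Σx)²)(nΣy² - (Σy)²)], which needs Σx² and Σy² in addition to n, Σx, Σy, and Σxy. The sketch below builds those sums from the hypothetical `hours`/`scores` data and plugs them in; the function name is illustrative.

```python
from math import sqrt

def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Computational formula for r from summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 75]
r = correlation_from_sums(
    len(hours),
    sum(hours),                                  # Σx = 15
    sum(scores),                                 # Σy = 318
    sum(x * x for x in hours),                   # Σx² = 55
    sum(y * y for y in scores),                  # Σy² = 20550
    sum(x * y for x, y in zip(hours, scores)),   # Σxy = 1010
)
print(round(r, 3))  # prints 0.982
```

The result matches the z-score definition on the same data, since the two formulas are algebraically identical.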

The Correlation Coefficient vs Coefficient of Determination (r²)

The correlation r measures the direction and strength of a linear relationship and runs from -1 to 1. The coefficient of determination r² is its square, runs from 0 to 1, and answers a different question: what percent of the variation in the response variable is explained by the linear model? They have different interpretations and you can't swap them. An r of -0.9 means a strong negative linear association, while the matching r² of 0.81 means 81% of the variation in y is explained by the line. Also, r² alone can't tell you the direction, since squaring erases the sign.

Key things to remember about the Correlation Coefficient

  • The correlation coefficient r is always between -1 and 1, where the sign gives the direction of the linear association and the distance from 0 gives its strength.

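  • r is unit-free and doesn't change when you shift either variable by a constant or rescale it by a positive constant, because it's calculated from z-scores. Multiplying one variable by a negative constant flips the sign of r.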

  • An r close to 1 or -1 does not prove a linear model is appropriate; you still need to check the scatterplot and residual plot.

  • Correlation does not imply causation, so a strong r between two variables never lets you conclude that one variable causes the other.

  • The correlation connects directly to regression through the slope formula b = r(s_y/s_x) and through r², the coefficient of determination.

  • When interpreting r on the exam, always state direction, strength, the word 'linear,' and the variables in context.

Frequently asked questions about the Correlation Coefficient

What is the correlation coefficient in AP Stats?

It's the number r, between -1 and 1, that measures the direction and strength of the linear relationship between two quantitative variables. It's covered in Topic 2.5 of Unit 2 and is usually computed with technology rather than by hand.

Does a correlation of 0.95 mean a linear model is appropriate?

No. The CED states explicitly that an r close to 1 or -1 does not necessarily mean a linear model fits. A strongly curved pattern can still produce a high r, so you have to check the scatterplot and residual plot before fitting a line.

Does a strong correlation mean one variable causes the other?

No, and AP Stats tests this constantly. Correlation does not imply causation, because lurking variables or coincidence can create an association. Only a well-designed randomized experiment supports a causal conclusion.

What's the difference between r and r squared?

r is the correlation, running from -1 to 1, and describes the direction and strength of the linear association. r² is the coefficient of determination, running from 0 to 1, and gives the proportion of variation in y explained by the linear model. An r of -0.8 gives an r² of 0.64, but r² alone can't tell you the relationship is negative.

What happens to r if you multiply all the data values by a constant?

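Nothing. Because r is built from standardized values, shifting the data or rescaling it by a positive constant leaves it unchanged. If r = 0.4 and you multiply every value by 3, r is still 0.4, which is a classic multiple-choice question. One catch: multiplying only one of the two variables by a negative constant keeps the strength but flips the sign, so r = 0.4 would become -0.4.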
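A short sketch can confirm these transformation rules. It uses hypothetical data chosen so that r = 0.8, then multiplies x by 3, applies a shift-and-rescale like a Celsius-to-Fahrenheit conversion (1.8x + 32), and finally negates y. The `correlation` helper is illustrative, not a library function.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), equivalent to the z-score formula."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]  # hypothetical data with r = 0.8

print(round(correlation(xs, ys), 6))                          # original: 0.8
print(round(correlation([3 * x for x in xs], ys), 6))         # x times 3: 0.8
print(round(correlation([1.8 * x + 32 for x in xs], ys), 6))  # shift and rescale x: 0.8
print(round(correlation(xs, [-y for y in ys]), 6))            # y times -1: -0.8
```

Only the last line changes, and only in sign, because negating y reverses every y z-score while leaving their sizes alone.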