Coefficient of Determination in AP Statistics

The coefficient of determination, r², is the proportion of the variation in the response variable that is explained by the least-squares regression line using the explanatory variable. In simple linear regression, r² is literally the square of the correlation r, and it ranges from 0 to 1.

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is the Coefficient of Determination?

The coefficient of determination, written r², answers one question. Out of all the variation you see in the y-values, how much of it does your regression line actually account for? If r² = 0.81 in a model predicting weight from height, then 81% of the variation in weight is explained by the linear relationship with height. The other 19% comes from everything else (other variables, natural scatter, measurement noise).

In simple linear regression, r² is exactly what it looks like: the correlation coefficient r, squared. That's why it lives in Topic 2.8 right alongside the slope formula b = r(s_y/s_x). Because r is always between -1 and 1, r² is always between 0 and 1. An r² near 1 means the points hug the line tightly and the model explains most of the variation; an r² near 0 means the line explains almost nothing. Think of it as a report card for your regression line, graded on a 0-to-100% scale.
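To make the link between r, the slope, and r² concrete, here is a minimal Python sketch. The height and weight values are invented for illustration (they are not from any real data set), and the helper name `summary` is hypothetical.

```python
import math

# Hypothetical data, invented for illustration only.
heights = [60, 62, 65, 68, 70, 72]        # inches (explanatory variable, x)
weights = [115, 120, 140, 155, 160, 180]  # pounds (response variable, y)

def summary(xs, ys):
    """Return the correlation r and the sample standard deviations s_x, s_y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    s_x = math.sqrt(sxx / (n - 1))
    s_y = math.sqrt(syy / (n - 1))
    return r, s_x, s_y

r, s_x, s_y = summary(heights, weights)
slope = r * (s_y / s_x)   # b = r(s_y/s_x)
r_squared = r ** 2        # coefficient of determination

print(f"r = {r:.3f}, slope = {slope:.3f}, r^2 = {r_squared:.3f}")
```

Whatever data you feed in, r stays between -1 and 1, so the squared value always lands between 0 and 1.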

Why the Coefficient of Determination matters in AP Statistics

This term lives in Unit 2: Exploring Two-Variable Data, specifically Topic 2.8: Least Squares Regression, and supports learning objective 2.8.A (estimate parameters for the least-squares regression line model). The CED's essential knowledge states directly that in simple linear regression, r² is the square of the correlation r and is called the coefficient of determination. It's one of the three numbers AP Stats expects you to interpret in context from regression output, along with the slope and y-intercept (that interpretation skill is 2.8.B). Computer output on the exam almost always reports "R-Sq," so being able to read it, interpret it, and convert between r and r² is a bread-and-butter Unit 2 skill that resurfaces when regression returns in inference for slopes later in the course.

How the Coefficient of Determination connects across the course

Correlation Coefficient (Unit 2)

These two are mathematically joined at the hip. Square r and you get r²; take the square root of r² and you get r, but you have to recover the sign from the slope or the scatterplot. If r² = 0.64, then r is either 0.8 or -0.8, and the direction of the relationship tells you which.
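The conversion above can be sketched in a few lines of Python. The function name `r_from_r_squared` is hypothetical; the only inputs are r² and the sign of the slope, which you would read from the regression output or the scatterplot.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r from r^2 by taking the square root and copying the slope's sign."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must be between 0 and 1")
    magnitude = math.sqrt(r_squared)
    # r always has the same sign as the slope of the least-squares line.
    return -magnitude if slope < 0 else magnitude

r_negative = r_from_r_squared(0.64, slope=-2.5)  # about -0.8
r_positive = r_from_r_squared(0.64, slope=2.5)   # about 0.8
print(r_negative, r_positive)
```

Without the slope argument there is no way to choose between the two candidates, which is exactly the trap the exam likes to set.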

Residuals (Unit 2)

Residuals are the leftover variation that r² doesn't explain. A high r² means the residuals are small relative to the overall spread in y, since the line is capturing most of what's going on in y. The 'unexplained' part of the variation is exactly the scatter you see in a residual plot.

Explained Variance (Unit 2)

This is the concept r² measures. Total variation in y splits into a part the regression line explains and a part it doesn't. r² is the explained slice as a fraction of the whole, which is why you interpret it as a percent.
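A short Python sketch of that split, using a small made-up data set (the numbers and the function names `fit_line` and `variation_breakdown` are assumptions for illustration). It fits the least-squares line by hand and breaks the total squared variation in y into an explained piece and an unexplained piece.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def variation_breakdown(xs, ys):
    """Return (total, explained, unexplained) sums of squares for y."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    predicted = [intercept + slope * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)
    explained = sum((p - mean_y) ** 2 for p in predicted)
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # squared residuals
    return total, explained, unexplained

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

total, explained, unexplained = variation_breakdown(xs, ys)
r_squared = explained / total
print(f"r^2 = {r_squared:.4f}")
```

For a least-squares line, the explained and unexplained pieces add up to the total, so `explained / total` and `1 - unexplained / total` give the same r².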

Regression Line (Unit 2)

r² is the quality score for the least-squares line. The line minimizes the sum of squared residuals, and r² tells you how well that minimizing actually worked for your data. A line can always be fit; r² tells you whether it was worth fitting.

Is the Coefficient of Determination on the AP Statistics exam?

On multiple choice, the classic stem gives you an r² value and asks for the correct interpretation, like a question where r² = 0.81 for height and weight and you must pick the wording about 'percent of variation in weight explained by the linear relationship with height.' Another favorite is the conversion trap. Given r² = 0.64, you're asked about r, and the right answer acknowledges r could be 0.8 or -0.8 without more information (or, given r = -0.5, you compute r² = 0.25). On FRQs, regression questions like 2018 Q1 (checkout times) and 2022 Q1 (bullfrog length and mass) hand you computer output that includes R-Sq, and you may need to interpret it in context or use it to discuss how well the model fits. The grader-safe template is worth memorizing. Say '___% of the variation in [response variable, in context] is explained by the linear relationship with [explanatory variable, in context].' Vague answers like 'the model is 81% accurate' earn no credit.
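If it helps to drill the template, here is a tiny Python sketch that fills in the blanks. The function name `interpret_r_squared` is hypothetical, and the wording simply mirrors the template sentence above.

```python
def interpret_r_squared(r_squared, response, explanatory):
    """Build the in-context interpretation sentence for a given r^2."""
    percent = f"{r_squared:.0%}"  # e.g. 0.81 becomes "81%"
    return (
        f"About {percent} of the variation in {response} "
        f"is explained by the linear relationship with {explanatory}."
    )

sentence = interpret_r_squared(0.81, "weight", "height")
print(sentence)
```

The point of the exercise is that both variable names appear in context and the phrase "variation in" is never dropped.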

The Coefficient of Determination vs Correlation Coefficient (r)

r measures the strength AND direction of a linear relationship and runs from -1 to 1. r² measures the proportion of variation in y explained by the model and runs from 0 to 1, with no direction at all. Squaring destroys the sign, so r² = 0.49 could come from r = 0.7 or r = -0.7. Also watch the interpretations. You describe r with words like 'strong, negative, linear,' but you describe r² as a percent of variation explained. Swapping those interpretations is one of the most common point-losers on regression FRQs.

Key things to remember about the Coefficient of Determination

  • r² is the proportion of variation in the response variable (y) that is explained by the linear relationship with the explanatory variable (x).

  • In simple linear regression, r² is exactly the square of the correlation coefficient r, so it always falls between 0 and 1.

  • Going backward from r² to r requires a square root AND the sign of the relationship, which you get from the slope or the scatterplot.

  • The exam-safe interpretation sentence is: 'About ___% of the variation in [y, in context] is explained by the linear relationship with [x, in context].'

  • A high r² means the regression line accounts for most of the variation in y, but it does not prove causation and does not guarantee a linear model is appropriate (check the residual plot for that).

  • Computer output on AP exam questions reports r² as 'R-Sq,' so practice reading it straight off the printout.

Frequently asked questions about the Coefficient of Determination

What is the coefficient of determination in AP Stats?

It's r², the proportion of variation in the response variable explained by the least-squares regression line. If r² = 0.81, then 81% of the variation in y is explained by the linear relationship with x, and the remaining 19% is unexplained.

Does a high r² mean x causes y?

No. r² only measures how well a line fits the data; it says nothing about causation. Even with r² = 0.95, you can't claim x causes y unless the data came from a well-designed experiment (Unit 3).

What's the difference between r and r²?

r is the correlation coefficient, which runs from -1 to 1 and tells you the strength and direction of a linear relationship. r² runs from 0 to 1 and tells you the percent of variation in y the model explains. If r = -0.5, then r² = 0.25, meaning 25% of the variation in y is explained.

Can you find r from r²?

Almost. Take the square root of r², but then you need the sign of the relationship, which comes from the slope or the scatterplot. If r² = 0.64, r is either 0.8 or -0.8, and a negative slope means r = -0.8.

Does r² = 1 mean the regression line is perfect?

It means every point falls exactly on the line, so 100% of the variation in y is explained and all residuals are zero. In real data that almost never happens, and even a high r² doesn't confirm a linear model is appropriate (a curved residual plot can hide behind a big r²).
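To see how a curve can hide behind a big r², here is a minimal Python sketch using a deliberately curved, made-up data set where y = x² for x = 1 through 10. The data and variable names are assumptions chosen for illustration.

```python
# Deliberately curved, made-up data: y = x^2 for x = 1..10.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least-squares line and r^2 for a straight-line fit to curved data.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)

# Residual = actual y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"r^2 = {r_squared:.3f}")
print("residuals:", [round(e, 1) for e in residuals])
```

Here r² comes out at roughly 0.95, yet the residuals are positive at both ends and negative in the middle. That U-shaped pattern in the residual plot is the signal that a straight line is the wrong model, no matter how large r² looks.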