The coefficient of determination, r², is the proportion of the variation in the response variable that is explained by the least-squares regression line using the explanatory variable. In simple linear regression, r² is literally the square of the correlation r, and it ranges from 0 to 1.
The coefficient of determination, written r², answers one question. Out of all the variation you see in the y-values, how much of it does your regression line actually account for? If r² = 0.81 in a model predicting weight from height, then 81% of the variation in weight is explained by the linear relationship with height. The other 19% comes from everything else (other variables, natural scatter, measurement noise).
In simple linear regression, r² is exactly what it looks like, the correlation coefficient r squared. That's why it lives in Topic 2.8 right alongside the slope formula b = r(s_y/s_x). Because r is always between -1 and 1, r² is always between 0 and 1. An r² near 1 means the points hug the line tightly and the model explains most of the variation; an r² near 0 means the line explains almost nothing. Think of it as a report card for your regression line, graded on a 0-to-100% scale.
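To make the link between r, r², and the slope formula concrete, here is a minimal sketch in plain Python. The height and weight values are hypothetical, invented only for illustration, and the helper names (`mean`, `sample_sd`, `correlation`) are assumptions of this sketch rather than anything from the course materials.

```python
from math import sqrt

# Hypothetical data, invented for illustration: heights (in) and weights (lb).
heights = [60, 62, 65, 68, 70, 72]
weights = [115, 120, 140, 155, 165, 180]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    # r as the sum of products of z-scores, divided by n - 1.
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    n = len(xs)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


r = correlation(heights, weights)
r_squared = r ** 2                                    # coefficient of determination
slope = r * sample_sd(weights) / sample_sd(heights)   # b = r(s_y / s_x)

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, slope = {slope:.4f}")
```

Whatever data you plug in, r stays between -1 and 1, so r² stays between 0 and 1.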
This term lives in Unit 2: Exploring Two-Variable Data, specifically Topic 2.8: Least Squares Regression, and supports learning objective 2.8.A (estimate parameters for the least-squares regression line model). The CED's essential knowledge states directly that in simple linear regression, r² is the square of the correlation r and is called the coefficient of determination. It's one of the three numbers AP Stats expects you to interpret in context from regression output, along with the slope and y-intercept (that interpretation skill is 2.8.B). Computer output on the exam almost always reports "R-Sq," so being able to read it, interpret it, and convert between r and r² is a bread-and-butter Unit 2 skill that resurfaces when regression returns in inference for slopes later in the course.
Keep studying AP Statistics Unit 2
Correlation Coefficient (Unit 2)
These two are mathematically joined at the hip. Square r and you get r²; take the square root of r² and you get r, but you have to recover the sign from the slope or the scatterplot. If r² = 0.64, then r is either 0.8 or -0.8, and the direction of the relationship tells you which.
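The sign-recovery step can be sketched as a small Python function. The function name `r_from_r_squared` and the example slope values are hypothetical, chosen only to illustrate the idea.

```python
from math import sqrt


def r_from_r_squared(r_squared, slope):
    """Recover r from r^2, taking the sign from the regression slope."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must be between 0 and 1")
    magnitude = sqrt(r_squared)
    # r always has the same sign as the slope of the least-squares line.
    return -magnitude if slope < 0 else magnitude


# Same r^2, opposite directions (the slope values are made up).
print(f"{r_from_r_squared(0.64, slope=-2.5):.1f}")  # prints -0.8
print(f"{r_from_r_squared(0.64, slope=1.3):.1f}")   # prints 0.8
```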
Residuals (Unit 2)
Residuals are the leftover variation that r² doesn't explain. A high r² means the residuals are small relative to the overall spread in y, since the line is capturing most of what's going on in y. The 'unexplained' part of the variation is exactly the scatter you see in a residual plot.
Explained Variance (Unit 2)
This is the concept r² measures. Total variation in y splits into a part the regression line explains and a part it doesn't. r² is the explained slice as a fraction of the whole, which is why you interpret it as a percent.
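The split into explained and unexplained variation can be checked numerically. The sketch below uses a small made-up data set, fits the least-squares line by hand, and computes the three sums of squares. The variable names (`sst`, `sse`, `ssr`) are conventional labels chosen for this sketch, not notation from the exam.

```python
# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]


def mean(values):
    return sum(values) / len(values)


x_bar, y_bar = mean(xs), mean(ys)

# Least-squares slope and intercept.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                 # total variation in y
sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # unexplained (squared residuals)
ssr = sum((p - y_bar) ** 2 for p in predicted)          # explained by the line

r_squared = ssr / sst  # explained slice as a fraction of the whole

print(f"SST = {sst:.4f}, SSR + SSE = {ssr + sse:.4f}, r^2 = {r_squared:.4f}")
```

For a least-squares line, the total equals the explained plus the unexplained part, so `ssr / sst` and `1 - sse / sst` give the same r².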
Regression Line (Unit 2)
r² is the quality score for the least-squares line. The line minimizes the sum of squared residuals, and r² tells you how well that minimizing actually worked for your data. A line can always be fit; r² tells you whether it was worth fitting.
On multiple choice, the classic stem gives you an r² value and asks for the correct interpretation. For example, with r² = 0.81 for height and weight, you must pick the wording about 'percent of variation in weight explained by the linear relationship with height.' Another favorite is the conversion trap: given r² = 0.64, you're asked about r, and the right answer acknowledges that r could be 0.8 or -0.8 without more information (or, given r = -0.5, you compute r² = 0.25). On FRQs, regression questions like 2018 Q1 (checkout times) and 2022 Q1 (bullfrog length and mass) hand you computer output that includes R-Sq, and you may need to interpret it in context or use it to discuss how well the model fits. The grader-safe template is worth memorizing: '___% of the variation in [response variable, in context] is explained by the linear relationship with [explanatory variable, in context].' Vague answers like 'the model is 81% accurate' earn no credit.
r measures the strength AND direction of a linear relationship and runs from -1 to 1. r² measures the proportion of variation in y explained by the model and runs from 0 to 1, with no direction at all. Squaring destroys the sign, so r² = 0.49 could come from r = 0.7 or r = -0.7. Also watch the interpretations. You describe r with words like 'strong, negative, linear,' but you describe r² as a percent of variation explained. Swapping those interpretations is one of the most common point-losers on regression FRQs.
r² is the proportion of variation in the response variable (y) that is explained by the linear relationship with the explanatory variable (x).
In simple linear regression, r² is exactly the square of the correlation coefficient r, so it always falls between 0 and 1.
Going backward from r² to r requires a square root AND the sign of the relationship, which you get from the slope or the scatterplot.
The exam-safe interpretation sentence is: 'About ___% of the variation in [y, in context] is explained by the linear relationship with [x, in context].'
A high r² means the regression line accounts for most of the variation in y, but it does not prove causation and does not guarantee a linear model is appropriate (check the residual plot for that).
Computer output on AP exam questions reports r² as 'R-Sq,' so practice reading it straight off the printout.
It's r², the proportion of variation in the response variable explained by the least-squares regression line. If r² = 0.81, then 81% of the variation in y is explained by the linear relationship with x, and the remaining 19% is unexplained.
No. r² only measures how well a line fits the data; it says nothing about causation. Even with r² = 0.95, you can't claim x causes y unless the data came from a well-designed experiment (Unit 3).
r is the correlation coefficient, which runs from -1 to 1 and tells you the strength and direction of a linear relationship. r² runs from 0 to 1 and tells you the percent of variation in y the model explains. If r = -0.5, then r² = 0.25, meaning 25% of the variation in y is explained.
Almost. Take the square root of r², but then you need the sign of the relationship, which comes from the slope or the scatterplot. If r² = 0.64, r is either 0.8 or -0.8, and a negative slope means r = -0.8.
It means every point falls exactly on the line, so 100% of the variation in y is explained and all residuals are zero. In real data that almost never happens, and even a high r² doesn't confirm a linear model is appropriate (a curved residual plot can hide behind a big r²).
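To see how a curved pattern can hide behind a big r², here is a minimal sketch that fits a line to deliberately curved, made-up data (y = x² for x from 1 to 10). The data and variable names are assumptions chosen for illustration.

```python
# Deliberately curved, made-up data: y = x^2 for x = 1..10.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]


def mean(values):
    return sum(values) / len(values)


x_bar, y_bar = mean(xs), mean(ys)

# Least-squares line fitted to the curved data.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum(e ** 2 for e in residuals)
r_squared = 1 - sse / sst

print(f"r^2 = {r_squared:.3f}")
print("residuals:", [round(e, 1) for e in residuals])
```

Here r² comes out near 0.95, yet the residuals run positive, then negative, then positive again. That U-shape in the residual plot is the signal that a line is the wrong model, no matter how large r² is.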