R² in AP Statistics

In AP Stats, r² (the coefficient of determination) is the square of the correlation r, and it measures the proportion of variation in the response variable (y) that is explained by the linear relationship with the explanatory variable (x).

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is r²?

r², also called the coefficient of determination, is exactly what the notation suggests in simple linear regression: you take the correlation coefficient r and square it. The result tells you what fraction of the variation in the response variable is explained by the least-squares regression line using the explanatory variable. If r² = 0.81, then 81% of the variation in y is explained by the linear relationship with x. The other 19% is variation the line doesn't account for.

Here's the intuition. The y-values in your data bounce around their mean. Some of that bouncing is predictable (taller people tend to weigh more), and some is just scatter. r² tells you how much of the total bouncing your line actually captures. An r² close to 1 means the line explains most of the variation; an r² close to 0 means knowing x barely helps you predict y. What r² does NOT tell you is how many points sit exactly on the line. That misreading is one of the most common wrong answers on AP multiple choice.
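To see the "fraction of the bouncing" idea in numbers, here is a minimal sketch using made-up height and weight data (the values are hypothetical, chosen only for illustration). It compares r² with 1 minus the ratio of leftover variation (squared residuals) to total variation in y, and then counts how many points actually sit on the line.

```python
import math

# Hypothetical data: heights (in) and weights (lb) for seven people.
heights = [60, 62, 64, 66, 68, 70, 72]
weights = [115, 125, 128, 140, 150, 152, 165]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in heights)
syy = sum((y - y_bar) ** 2 for y in weights)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

# Correlation and least-squares regression line.
r = sxy / math.sqrt(sxx * syy)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Variation in y that the line leaves unexplained (sum of squared residuals).
residuals = [y - (intercept + slope * x) for x, y in zip(heights, weights)]
sse = sum(e ** 2 for e in residuals)

explained_fraction = 1 - sse / syy
points_on_line = sum(1 for e in residuals if abs(e) < 1e-9)

print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
print(f"fraction of variation explained = {explained_fraction:.3f}")
print(f"points exactly on the line: {points_on_line} of {n}")
```

With this data set, r² is well above 0.9 even though none of the seven points lands exactly on the line, which is why the "percent of points on the line" reading fails.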

Why r² matters in AP® Statistics

r² lives in Unit 2: Exploring Two-Variable Data, specifically Topic 2.8 (Least Squares Regression) and Topic 2.9 (Analyzing Departures from Linearity). It supports learning objective AP Stats 2.8.A, where the essential knowledge states directly that r² is the square of the correlation r and is called the coefficient of determination. It also shows up under AP Stats 2.9.B, where r² moving closer to 1 after transforming data is evidence that the transformed model fits better. The exam loves r² because interpreting it correctly requires precise statistical language. One sloppy word ('36% of the points' instead of '36% of the variation') turns a right answer into a wrong one. Mastering the interpretation sentence here pays off again when you assess model fit throughout regression problems.

How r² connects across the course

Correlation Coefficient (Unit 2)

r² is literally r squared, but squaring loses information. The sign disappears, so r² = 0.36 could come from r = 0.6 or r = -0.6. You need the scatterplot or slope to recover the direction of the relationship.
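As a quick sketch of recovering the direction, the helper below (the name `r_from_r2` is hypothetical) takes the square root of r² and attaches the sign of the regression slope. It relies on the fact that r and the slope of the least-squares line always share the same sign. The slope values 3.8 and -3.8 are illustrative.

```python
import math

def r_from_r2(r_squared: float, slope: float) -> float:
    """Recover r from r^2 using the sign of the regression slope."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must be between 0 and 1")
    # r has the same sign as the slope of the least-squares line.
    return math.copysign(math.sqrt(r_squared), slope)

r_positive = r_from_r2(0.36, slope=3.8)   # positive slope, so r is about 0.6
r_negative = r_from_r2(0.36, slope=-3.8)  # negative slope, so r is about -0.6
print(r_positive, r_negative)
```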

Transformations and Departures from Linearity (Unit 2)

In Topic 2.9, r² becomes a model-comparison tool. If you log-transform y and r² jumps closer to 1 (along with a more random residual plot), that's evidence the transformed model predicts better than the original linear one. The 2022 FRQ on bullfrog length and mass tested exactly this kind of model choice.
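Here is a rough sketch of that comparison on made-up, roughly exponential data (not the FRQ data; the `wiggle` factors are invented to add a little scatter). It computes r² for y versus x and again for ln(y) versus x.

```python
import math

def r_squared(xs, ys):
    """r^2 for the least-squares line of ys on xs (simple linear regression)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Hypothetical data: exponential growth with small multiplicative scatter.
wiggle = [1.05, 0.96, 1.02, 0.97, 1.04, 0.98, 1.01, 0.99]
xs = list(range(1, 9))
ys = [2 * 1.5 ** x * w for x, w in zip(xs, wiggle)]

r2_linear = r_squared(xs, ys)                      # y vs x
r2_log = r_squared(xs, [math.log(y) for y in ys])  # ln(y) vs x

print(f"r^2, y vs x:     {r2_linear:.3f}")
print(f"r^2, ln(y) vs x: {r2_log:.3f}")
```

On this data the log-transformed model has the higher r². On the exam you would still pair that with a residual plot before claiming the transformed model fits better.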

Influential Point (Unit 2)

A single influential point can drag r, and therefore r², up or down dramatically. A high r² doesn't guarantee a good model if one high-leverage point is doing all the work, which is why you always check the scatterplot and residual plot too.
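A small sketch with invented numbers shows how much one point can matter. Five points with almost no linear pattern get one extra high-leverage point at (20, 20), and r² is computed before and after.

```python
def r_squared(xs, ys):
    """r^2 for the least-squares line of ys on xs (simple linear regression)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Hypothetical cluster with essentially no linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]

r2_cluster = r_squared(xs, ys)
# Same cluster plus one point far from the rest in the x-direction.
r2_with_point = r_squared(xs + [20], ys + [20])

print(f"r^2 without the extra point: {r2_cluster:.3f}")
print(f"r^2 with the extra point:    {r2_with_point:.3f}")
```

Here r² jumps from roughly 0.02 to roughly 0.95, even though the original five points show almost no pattern. A scatterplot would reveal this immediately.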

Response Variable (Unit 2)

Every correct r² interpretation names the response variable in context. It's the variation in y, the thing you're predicting, that gets explained. 'Explains 81% of the variation in weight' scores; 'explains 81% of the data' doesn't.

Is r² on the AP® Statistics exam?

On multiple choice, r² questions are interpretation traps. A typical stem gives you something like ŷ = 62 + 3.8x with r² = 0.36 and asks which interpretation is correct. The wrong answers say things like '36% of the points fall on the line' or '36% of test scores are predicted by study time.' The correct answer always follows the template: '36% of the variation in [response variable, in context] is explained by the linear relationship with [explanatory variable].'

On FRQs, r² appears in regression output you have to read and use. The 2018 FRQ gave computer output for checkout time versus number of customers, and the 2022 FRQ used r² as evidence when comparing a linear model to a transformed model for bullfrog data. Know the template sentence cold, always include context, and remember that comparing r² values between models (especially after a transformation) is a legitimate way to argue one model fits better.
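If it helps to drill the template, here is a tiny sketch that fills it in. The function name `interpret_r2` is hypothetical, and the variable names "test score" and "study time" are borrowed from the trap answers above for illustration.

```python
def interpret_r2(r_squared: float, response: str, explanatory: str) -> str:
    """Fill in the template interpretation sentence for r^2."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must be between 0 and 1")
    # Convert to a percent, dropping a trailing ".0" (0.36 becomes "36%").
    percent = f"{round(r_squared * 100, 1):g}%"
    return (f"{percent} of the variation in {response} is explained by "
            f"the linear relationship with {explanatory}.")

sentence = interpret_r2(0.36, "test score", "study time")
print(sentence)
```

The sentence names the response variable first and the explanatory variable last, and it never mentions points or predictions being "correct."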

r² vs Correlation coefficient (r)

r measures the direction and strength of a linear relationship and runs from -1 to 1. r² measures the proportion of variation in y explained by the model and runs from 0 to 1. Squaring kills the sign, so r² alone can't tell you whether the relationship is positive or negative. Also watch the interpretations: r is about strength and direction, while r² is about explained variation. Don't say r² is 'strong' and don't say r 'explains' anything.

Key things to remember about r²

  • r², the coefficient of determination, is the square of the correlation r and measures the proportion of variation in the response variable explained by the linear relationship with the explanatory variable.

  • The full-credit interpretation sentence is: '[r² as a percent] of the variation in [response variable, in context] is explained by the linear relationship with [explanatory variable].'

  • r² does NOT mean that percentage of points fall on the regression line, and it does not measure how many predictions are correct.

  • Because squaring erases the sign, r² = 0.36 could come from r = 0.6 or r = -0.6, so you can't determine direction from r² alone.

  • In Topic 2.9, an r² closer to 1 after transforming data (paired with a more random residual plot) is evidence the transformed model is a better fit.

  • A high r² can be inflated by a single influential point, so always check the scatterplot and residual plot before trusting the model.

Frequently asked questions about r²

What is r² in AP Stats?

r² is the coefficient of determination, the square of the correlation coefficient r. It gives the proportion of variation in the response variable that is explained by the linear relationship with the explanatory variable, per AP Stats 2.8.A.

Does r² = 0.36 mean 36% of the points fall on the regression line?

No, and this is the classic trap answer. r² = 0.36 means 36% of the variation in the response variable is explained by the linear relationship with the explanatory variable. It says nothing about how many points sit on the line.

How is r² different from r?

r measures direction and strength of a linear relationship (from -1 to 1), while r² measures the proportion of variation in y explained by the model (from 0 to 1). Squaring removes the sign, so r² = 0.81 could come from r = 0.9 or r = -0.9.

How do I interpret r² on an AP Stats FRQ?

Use the template: '[percent] of the variation in [response variable, in context] is explained by the linear relationship with [explanatory variable].' For r² = 0.81 relating height to weight, say 81% of the variation in weight is explained by the linear relationship with height.

Does a high r² prove the linear model is a good fit?

Not by itself. A nonlinear pattern or a single influential point can still produce a high r², so you also need a residual plot with no pattern. In Topic 2.9, you compare r² values alongside residual plots to decide whether a transformed model fits better, as the 2022 FRQ on bullfrog data required.