Prediction

In AP Statistics, prediction means using a linear regression model ŷ = a + bx to estimate the value of the response variable, written ŷ, for a given value of the explanatory variable x, where a is the y-intercept and b is the slope. Predictions are only reliable within the range of x-values used to build the model.

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is Prediction?

Prediction is the whole point of building a regression model. Once you have the least-squares regression line ŷ = a + bx, you can plug in any value of the explanatory variable x and get a predicted response value, written ŷ ("y-hat"). The hat matters. It signals that this is the model's estimate, not an actual observed value of y.
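
To make the substitution concrete, here is a minimal Python sketch of plugging x into ŷ = a + bx. The function name predict and the coefficients (an intercept of 20000 dollars and a slope of -1500 dollars per year of age) are hypothetical values chosen for illustration, not numbers from any real data set or exam question.

```python
def predict(a: float, b: float, x: float) -> float:
    """Return the predicted response y-hat = a + b*x."""
    return a + b * x


# Hypothetical coefficients for illustration only (not from real data):
# intercept a = 20000 dollars, slope b = -1500 dollars per year of age.
a, b = 20000.0, -1500.0

y_hat = predict(a, b, 4)
print(f"Predicted price for a 4-year-old car: ${y_hat:,.0f}")
```

The result is a predicted price, so in a written answer it would be reported as ŷ, not y.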

There's a catch, and the CED calls it out directly (DAT-1.D.3). If you plug in an x-value outside the range of data used to build the line, you're extrapolating, and the farther you go beyond that range, the less you can trust the prediction. A model built on cars aged 1 to 10 years has no business predicting the price of a 50-year-old car. The line keeps going forever; the real-world pattern usually doesn't.
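
The range check described above can be sketched in a few lines of Python. The list of car ages is a hypothetical stand-in for the data behind a model built on cars aged 1 to 10 years, and is_extrapolation is an illustrative helper name, not a standard function.

```python
def is_extrapolation(x_new: float, x_observed: list[float]) -> bool:
    """True when x_new lies outside the range of x-values used to fit the line."""
    return x_new < min(x_observed) or x_new > max(x_observed)


# Hypothetical car ages (years) standing in for the data behind the model.
ages = [1, 2, 3, 5, 6, 8, 10]

for age in (4, 10, 50):
    label = "extrapolation" if is_extrapolation(age, ages) else "within the data range"
    print(f"age {age}: {label}")
```

The check only says whether an x-value is outside the observed range. It does not say how wrong the prediction will be, only that the data give no support for it.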

Why Prediction matters in AP Statistics

Prediction lives in Topic 2.6 (Linear Regression Models) in Unit 2: Exploring Two-Variable Data, and it's exactly what learning objective 2.6.A asks you to do: calculate a predicted response value using a linear regression model. It's also the foundation for almost everything else in regression. Residuals are defined as actual minus predicted (y − ŷ), so you can't interpret a residual without first making a prediction. And when you reach inference for slopes in Unit 9, you're asking how much the sample slope b could vary from sample to sample, which is really a question about how far you can trust the line you're predicting from. Unit 2 regression questions show up constantly on the exam, and "predict, then evaluate the prediction" is one of the most repeated tasks across released FRQs.

How Prediction connects across the course

Predicted value, ŷ (Unit 2)

The predicted value is the output of a prediction. You substitute x into ŷ = a + bx and the number that comes out is ŷ. On the exam, always write ŷ with the hat, because using plain y implies you observed the value rather than predicted it.

Residuals (Unit 2)

A residual is actual minus predicted, y − ŷ. Every residual question secretly starts with a prediction. A positive residual means the model under-predicted, and a negative residual means it over-predicted.
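
As a rough sketch of that two-step process (predict first, then subtract), the Python below computes a residual and reads its sign. The coefficients and the observed prices are hypothetical values made up for illustration.

```python
def residual(y_actual: float, a: float, b: float, x: float) -> float:
    """Residual = actual minus predicted, where predicted is y-hat = a + b*x."""
    y_hat = a + b * x  # the prediction always comes first
    return y_actual - y_hat


# Hypothetical model: y-hat = 20000 - 1500x (price in dollars, x = age in years).
a, b = 20000.0, -1500.0

# Two hypothetical 4-year-old cars with different observed prices.
for observed_price in (15200.0, 13000.0):
    r = residual(observed_price, a, b, 4)
    verdict = "under-predicted" if r > 0 else "over-predicted"
    print(f"observed {observed_price:,.0f}: residual {r:+,.0f}, model {verdict}")
```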

Regression Analysis (Unit 2)

Prediction is the payoff of regression analysis. The slope and intercept describe the relationship, but prediction is what you actually do with the equation once you have it.

Confidence Interval (Units 6-9)

A single predicted value ŷ is a point estimate with no sense of uncertainty attached. Interval thinking from the inference units is how statistics puts a margin of error around estimates, which is why a prediction by itself never tells the whole story.

Is Prediction on the AP Statistics exam?

Prediction shows up in two main ways. First, the straightforward calculation, where you're given a regression equation and asked for the predicted value at a specific x. The 2018 FRQ (checkout times) and 2022 FRQ (bullfrog length and mass) both hinge on using a fitted line this way. Second, the judgment call, where the question asks whether a prediction is appropriate. Multiple-choice stems love this setup. For example, a model built on fertilizer amounts from 5 to 15 grams gets used to predict growth at 30 grams, and the answer is extrapolation, no matter how high the r² is (even 0.92 doesn't save you). The 2023 tule elk FRQ leans on this kind of model-appropriateness reasoning too. Two habits earn points: always use ŷ notation and "predicted" language in your interpretations, and always check whether the x-value falls inside the range of the original data before trusting the prediction.

Prediction vs Extrapolation

Prediction and extrapolation aren't opposites; extrapolation is a specific kind of prediction. Prediction means plugging any x into ŷ = a + bx. Extrapolation means doing that with an x-value outside the interval of x-values used to build the line. Interpolation (predicting within the data range) is generally trustworthy. Extrapolation gets less reliable the farther you stray, because you have no data showing that the linear pattern continues out there. On the exam, "this is extrapolation, so the prediction is unreliable" is a frequent correct answer.

Key things to remember about Prediction

  • To make a prediction, substitute the explanatory value x into the regression equation ŷ = a + bx and compute ŷ.

  • Always write predicted values as ŷ (y-hat), because plain y refers to an actual observed value, and the exam cares about the difference.

  • Extrapolation is predicting with an x-value outside the range of data used to build the model, and the prediction gets less reliable the farther you extrapolate.

  • A high r² does not make extrapolation safe; the model only describes the pattern within the observed x-values.

  • Residuals are computed as actual minus predicted (y − ŷ), so every residual problem requires making a prediction first.

  • When interpreting a prediction in context, use language like "the predicted price" or "we predict," never deterministic phrasing like "the price will be."

Frequently asked questions about Prediction

What is prediction in AP Stats?

Prediction is using a linear regression model ŷ = a + bx to estimate the response variable for a given value of the explanatory variable x. It's the core skill of learning objective 2.6.A in Unit 2.

Is every prediction from a regression line reliable?

No. Predictions are only trustworthy when the x-value falls inside the range of data used to build the model. Predicting beyond that range is extrapolation, and the CED says the prediction becomes less reliable the farther you go.

What's the difference between prediction and extrapolation?

Extrapolation is a type of prediction, specifically one made with an x-value outside the interval of data used to fit the line. A model built on x-values from 5 to 15 grams predicting at 30 grams is extrapolating, even if r² is 0.92.

What's the difference between ŷ and y?

ŷ (y-hat) is the predicted value from the regression model, while y is the actual observed value. Their difference, y − ŷ, is the residual. Mixing them up in an FRQ interpretation can cost you points.

Does a high r² mean my predictions will be accurate?

Not necessarily. A high r² means the line fits the observed data well, but it says nothing about x-values outside that data range. Extrapolation is unreliable regardless of how strong r² is, and even within the range, individual predictions still have scatter around the line.
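
One way to see this is with a small sketch on made-up data. The Python below assumes, purely for illustration, that the true pattern is curved (y = x²) but was observed only for x from 1 to 10. It fits a least-squares line with the standard formulas, reports r², and then compares the line's prediction far outside the range with what the curved pattern gives. The fit_line helper and the data are hypothetical, not taken from any exam question.

```python
def fit_line(xs, ys):
    """Least-squares intercept a, slope b, and r^2 for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope
    a = y_bar - b * x_bar               # intercept
    r_squared = sxy ** 2 / (sxx * syy)  # square of the correlation r
    return a, b, r_squared


# Made-up data: the true pattern is curved (y = x^2), observed only for x = 1..10.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

a, b, r_squared = fit_line(xs, ys)
print(f"fitted line: y-hat = {a:.1f} + {b:.1f}x, r^2 = {r_squared:.3f}")

x_far = 30  # well outside the observed range of 1 to 10
print(f"predicted at x = {x_far}: {a + b * x_far:.1f}")
print(f"value the curved pattern actually gives: {x_far ** 2}")
```

With these made-up numbers the line fits the observed points with an r² of about 0.95, yet its prediction at x = 30 is 308 while the curved pattern gives 900. The high r² described the fit inside the data range and nothing beyond it.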