In AP Statistics, a predicted value (written ŷ, "y-hat") is the response value a linear regression model estimates for a given explanatory value x, calculated as ŷ = a + bx, where a is the y-intercept and b is the slope of the least-squares regression line.
A predicted value is what the regression line says y should be for a particular x. The model uses the explanatory variable x to predict the response variable y, and the prediction is written ŷ (pronounced "y-hat") to signal that it came from the model, not from real data. The calculation is just plug-and-chug: take the regression equation ŷ = a + bx and substitute your x-value.
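The plug-and-chug step can be sketched in a few lines of Python. This is only an illustration: the function name `predict` is made up for this example, and the coefficients come from the equation ŷ = 8.2 + 1.7x that this guide uses for its fertilizer example.

```python
def predict(a: float, b: float, x: float) -> float:
    """Return the predicted value y-hat = a + b*x for one x-value."""
    return a + b * x


# Coefficients from the guide's example equation y-hat = 8.2 + 1.7x
y_hat = predict(a=8.2, b=1.7, x=5)

# Round for display, since floating-point arithmetic can leave tiny errors
print(round(y_hat, 1))  # 16.7
```

On the exam you would do this by hand or on a calculator. The code only makes the substitution explicit: the intercept a is added once, and the slope b is multiplied by x.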
The hat matters more than it looks. A plain y is an actual observed data point. A ŷ is the line's best guess. The gap between them (y - ŷ) is the residual, which tells you how far off the model was for that point. One more catch from the CED: predictions are only trustworthy inside the range of x-values used to build the line. Predicting outside that range is called extrapolation, and the further you stray, the less reliable ŷ becomes. A line built on wolves 1 to 1.5 meters long has no business predicting the weight of a 4-meter wolf.
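One way to build the extrapolation habit is to check the x-value against the data range before trusting ŷ. The sketch below assumes you know the smallest and largest x used to fit the line. The function name and the wolf coefficients are hypothetical, invented only for illustration; only the 1 to 1.5 meter range comes from the example above.

```python
def predict_with_range_check(a, b, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y-hat = a + b*x.

    is_extrapolation is True when x falls outside [x_min, x_max],
    the range of x-values used to build the line.
    """
    y_hat = a + b * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation


# Hypothetical coefficients (NOT from a real data set); range is 1 to 1.5 m
a, b = -16.0, 35.0

_, flag_inside = predict_with_range_check(a, b, 1.2, x_min=1.0, x_max=1.5)
_, flag_outside = predict_with_range_check(a, b, 4.0, x_min=1.0, x_max=1.5)

print(flag_inside)   # False: 1.2 m is within the data range
print(flag_outside)  # True: 4 m is far outside it, so y-hat is unreliable
```

The equation still returns a number for the 4-meter wolf. The flag is a reminder that the number should not be trusted.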
Predicted values live in Topic 2.6 (Linear Regression Models) in Unit 2: Exploring Two-Variable Data, under learning objective 2.6.A, which asks you to calculate a predicted response value using a linear regression model. The supporting essential knowledge (DAT-1.D.1 through DAT-1.D.3) covers the model itself, the ŷ = a + bx formula, and the danger of extrapolation. This is one of the most reused skills in the whole course. You can't find a residual, interpret a residual plot, or judge whether a linear model fits without first knowing what the model predicted. Bivariate data questions show up every year, and computing or interpreting ŷ is usually step one.
Keep studying AP Statistics Unit 2
Residuals (Unit 2)
A residual is actual minus predicted, y - ŷ. So every residual question secretly starts with a predicted-value calculation. If you can't find ŷ, you can't find the residual, and you can't read a residual plot to check whether a linear model is appropriate.
Dependent variable (Unit 2)
The predicted value is an estimate of the dependent (response) variable. ŷ and y measure the same quantity in the same units. One comes from the line, the other from the data.
Independent variable (Unit 2)
The independent (explanatory) variable x is the input you plug into ŷ = a + bx. Direction matters here. A line built to predict weight from length cannot be flipped around to predict length from weight without refitting the regression.
Extrapolation (Unit 2)
Extrapolation is using the equation to predict ŷ for an x outside the range of the original data. The math still spits out a number, but the CED is blunt about it. The further you extrapolate, the less reliable the predicted value is, and exam answers should say so.
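The point about direction in the independent-variable entry can be checked numerically. The sketch below fits a least-squares line by hand with the standard slope and intercept formulas, once to predict weight from length and once to predict length from weight. The data values are made up for illustration, and `least_squares` is a helper written for this example, not a library function.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Made-up data for illustration only
lengths = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]        # explanatory, meters
weights = [20.0, 26.0, 27.0, 33.0, 34.0, 40.0]  # response, kilograms

a_wl, b_wl = least_squares(lengths, weights)  # weight predicted from length
a_lw, b_lw = least_squares(weights, lengths)  # length predicted from weight

# Solving the first line for x would give a slope of 1 / b_wl.
# The refitted line has a different slope unless the points are perfectly linear.
flipped_slope = 1 / b_wl
print(b_lw, flipped_slope)
```

The two slopes printed at the end do not match, which is why the line has to be refit when the roles of x and y are swapped.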
Predicted values get tested two ways. Multiple-choice stems hand you an equation like ŷ = 8.2 + 1.7x and ask you to compute a prediction, interpret the slope or intercept correctly, or find a residual given an actual value (for example, a plant given 5 grams of fertilizer that actually grew 18.3 cm). The trap answers usually confuse ŷ with y or misread the slope's direction. On FRQs, this skill anchors the classic bivariate-data question. The 2017 wolf question (predicting weight from length) and the 2022 bullfrog question (mass from length) both required using a regression model to generate predictions and work with residuals. Two scoring habits matter: always write ŷ, not y, when you state the equation or a prediction, and use the word "predicted" in slope and intercept interpretations ("for each additional meter of length, the predicted weight increases by..."). Leaving out "predicted" can cost you credit.
The actual value y is the real data point you measured. The predicted value ŷ is what the regression line estimates for that same x. They almost never match exactly, and the difference y - ŷ is the residual. Mixing them up gives the wrong residual: subtracting in the reverse order (ŷ - y) flips the sign of every residual you calculate. So if a problem says "the actual growth was 18.3 cm," that's y, and you still need to plug x into the equation to get ŷ before subtracting.
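The order of subtraction can be sketched with the fertilizer numbers used elsewhere in this guide (ŷ = 8.2 + 1.7x, x = 5 grams, actual growth 18.3 cm). The function name `residual` is just a label chosen for this example.

```python
def residual(y_actual: float, y_hat: float) -> float:
    """Return the residual: actual minus predicted (y - y-hat)."""
    return y_actual - y_hat


# Step 1: get the prediction from the equation y-hat = 8.2 + 1.7x
y_hat = 8.2 + 1.7 * 5

# Step 2: subtract the prediction from the actual value
res = residual(y_actual=18.3, y_hat=y_hat)
print(round(res, 1))  # 1.6

# Reversing the order (predicted minus actual) flips the sign
wrong_order = y_hat - 18.3
print(round(wrong_order, 1))  # -1.6
```

A positive residual means the actual point sits above the line, so the model underpredicted. The reversed calculation would wrongly suggest the opposite.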
A predicted value ŷ is found by plugging an x-value into the regression equation ŷ = a + bx.
The hat on ŷ distinguishes the model's estimate from y, the actual observed value in the data.
Residual = actual minus predicted (y - ŷ), so computing ŷ is the first step of every residual problem.
Predicting with an x-value outside the range of the original data is extrapolation, and those predictions become less reliable the further out you go.
On FRQs, interpretations of slope and intercept must use the word "predicted" (e.g., "the predicted weight increases by b kilograms per meter") to earn full credit.
A predicted value is the response value a regression model estimates for a given explanatory value, written ŷ and calculated as ŷ = a + bx. For example, with ŷ = 8.2 + 1.7x and x = 5, the predicted value is 8.2 + 1.7(5) = 16.7.
The predicted value is not the same as the actual value. The actual value y is the measured data point, while ŷ is the line's estimate for that x. The difference between them is the residual, and a nonzero residual is the normal situation, not an error.
The predicted value ŷ is the model's output for a given x; the residual is y - ŷ, how far the actual data point falls from that prediction. In the fertilizer example, ŷ = 16.7 cm and the actual growth is 18.3 cm, so the residual is 18.3 - 16.7 = 1.6 cm.
The hat on ŷ marks it as an estimate from the model rather than an observed value. On free-response questions, writing y when you mean ŷ in a regression equation or prediction can cost you points, so the notation isn't optional.
You can always compute a number, but predictions for x-values outside the range of the original data are extrapolation, and the CED (DAT-1.D.3) says they get less reliable the further you go. A strong AP answer flags extrapolation whenever the x-value is out of range.