A simple linear regression model uses an explanatory variable x to predict a response variable y with the equation ŷ = a + bx, where a is the y-intercept and b is the slope. You plug a given x-value into the equation to get the predicted response ŷ, and you stay inside the range of your data to keep predictions trustworthy. Topic 2.6, Linear Regression Models, is part of Unit 2 (Exploring Two-Variable Data) in AP Statistics.
AP Stats Linear Regression
In AP Stats, a linear regression model predicts a response variable y from an explanatory variable x using ŷ = a + bx. The predicted response ŷ is the value on the regression line, not the actual observed y value, so you should always describe it as a prediction in context.
The main calculation skill is substituting an x-value into the model to find ŷ. The main interpretation skill is deciding whether that prediction is reliable, especially whether the x-value is inside the observed range of data or is an extrapolation.

Why This Matters for the AP Statistics Exam
This topic is the core of how AP Statistics turns a relationship into a usable prediction. On the exam you will be asked to calculate a predicted response by substituting an x value into a regression equation, and you will need to explain your result in context with correct units. You will also need to recognize when a prediction relies on extrapolation and explain why that lowers reliability.
Regression equations show up in both multiple-choice and free-response settings, often built from technology output. Being able to read the slope and intercept, predict ŷ, and comment on whether a prediction is reasonable sets you up for the deeper regression work later in Unit 2 and the regression inference in Unit 9.
Key Takeaways
- A simple linear regression model predicts the response variable y from one explanatory variable x using ŷ = a + bx.
- ŷ is always the predicted value, not the actual observed value. The x you plug in is a given value, but the ŷ you get back is an estimate of y.
- To predict a response, substitute the given x into the equation and compute ŷ.
- Interpolation means predicting inside the observed range of x. It is generally more reliable.
- Extrapolation means predicting outside the observed range of x. The farther out you go, the less trustworthy the prediction.
- Always interpret your predicted value in context and check whether it makes sense for the situation.
What is a Linear Regression Model?
A simple linear regression model is an equation that uses an explanatory variable, x, to predict a response variable, y. The predicted value is written as ŷ and calculated with:
ŷ = a + bx
where a is the y-intercept (the predicted value of y when x = 0) and b is the slope (the change in ŷ for each one-unit increase in x).
Once you have the slope and intercept, you can substitute any x value into the equation to find the predicted response.
Keep the notation straight: ŷ is the predicted response, not the observed y value, and the two usually differ. The x-value is given, but the value you report from the line is always a prediction, not a measured result.
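The roles of a and b can be checked with a minimal sketch. The function name `predict` and the coefficients below are hypothetical, chosen only for illustration; they do not come from any data set in this guide.

```python
def predict(a, b, x):
    """Return the predicted response y-hat = a + b*x."""
    return a + b * x


# Hypothetical coefficients, chosen only for illustration.
a, b = 2.0, 0.5

# The intercept is the prediction at x = 0.
print(predict(a, b, 0))                      # 2.0

# The slope is the change in y-hat for a one-unit increase in x.
print(predict(a, b, 5) - predict(a, b, 4))   # 0.5
```

The second line of output equals b no matter which pair of x-values one unit apart you pick, which is what "slope" means in a linear model.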
Extrapolation
When you have a regression equation, you can use it to predict the response variable for a given value of the explanatory variable. Predicting inside the range of x-values in your data set is called interpolation, and it is generally more accurate because the prediction is supported by actual data.
Extrapolation is predicting a response value using an x-value that is beyond the interval of x-values used to build the regression line. Extrapolation is generally less reliable because it assumes the linear pattern keeps holding outside the observed data, which may not be true. The farther outside the range of the data you go, the less reliable the predictions are likely to be.
Be cautious with extrapolation. Predictions far outside your data can produce unrealistic or misleading results, so stay aware of your model's limits.
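One way to make the range check routine is a small helper like the sketch below. The function names and the 10-to-50 range are assumptions made up for illustration; the only idea taken from the text is comparing x with the smallest and largest x-values in the data.

```python
def prediction_type(x, x_min, x_max):
    """Label a prediction by where x sits relative to the observed x-range."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"


def distance_outside(x, x_min, x_max):
    """Return how far x lies beyond the observed range (0 if inside)."""
    if x < x_min:
        return x_min - x
    if x > x_max:
        return x - x_max
    return 0


# Hypothetical observed range of x-values: 10 to 50.
for x in (30, 55, 120):
    print(x, prediction_type(x, 10, 50), distance_outside(x, 10, 50))
```

The distance is only a rough warning signal: the larger it is, the more the prediction leans on the assumption that the linear pattern continues.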
Example
In a model built using data for 19 to 24 year olds, a regression line predicts an individual's comfort level with technology (on a scale of 1 to 10) from their age: ŷ = 0.32x + 0.67, where ŷ is the predicted comfort level and x represents age.
Predict the comfort level of a 45 year old and explain why this prediction does not make sense.
ŷ = 0.32(45) + 0.67
ŷ = 15.07
This prediction does not make sense because comfort levels were measured on a 1 to 10 scale, and 15.07 falls outside that range. The problem is that the model was built using data for 19 to 24 year olds, so predicting for a 45 year old is extrapolation. The model was never designed to describe ages that far outside the original data.
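The same example can be re-run in a short sketch that compares predictions at the edges of the data (ages 19 and 24) with the prediction for age 45. The equation, age range, and 1 to 10 scale come from the example; the function and constant names are hypothetical.

```python
AGE_MIN, AGE_MAX = 19, 24       # ages in the data used to build the model
SCALE_MIN, SCALE_MAX = 1, 10    # comfort level scale


def predicted_comfort(age):
    """Predicted comfort level from the example's line: y-hat = 0.32x + 0.67."""
    return 0.32 * age + 0.67


for age in (19, 24, 45):
    y_hat = round(predicted_comfort(age), 2)
    kind = "interpolation" if AGE_MIN <= age <= AGE_MAX else "extrapolation"
    fits = "on scale" if SCALE_MIN <= y_hat <= SCALE_MAX else "off scale"
    print(age, y_hat, kind, fits)
```

Inside the data range the predictions (6.75 and 8.35) stay on the scale, while the extrapolated value for age 45 does not.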
How to Use This on the AP Statistics Exam
Problem Solving
To predict a response value, substitute the given x into ŷ = a + bx and compute. Show the substitution, not just the final number, so your work is clear.
A study examined the relationship between hours of study per week and final exam scores for a sample of 25 students. The least squares regression line was ŷ = 42.3 - 0.5x, where ŷ is the predicted final exam score and x is the number of hours of study per week.
Predict the final exam score for a student who studies 15 hours per week.
ŷ = 42.3 - 0.5(15)
ŷ = 42.3 - 7.5
ŷ = 34.8
Interpretation in context: The predicted final exam score for a student who studies 15 hours per week is 34.8 points.
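If you want to check your written work, the substitution steps can be generated with a sketch like the one below. The helper name `show_prediction` is hypothetical, and the formatting assumes x is not negative.

```python
def show_prediction(a, b, x):
    """Return the substitution steps for y-hat = a + b*x as text lines.

    Assumes x >= 0 so that the sign shown matches the sign of b.
    """
    sign = "-" if b < 0 else "+"
    product = abs(b) * x
    y_hat = a + b * x
    return [
        f"ŷ = {a:g} {sign} {abs(b):g}({x:g})",
        f"ŷ = {a:g} {sign} {product:g}",
        f"ŷ = {y_hat:g}",
    ]


# The study-hours example: y-hat = 42.3 - 0.5x with x = 15.
for line in show_prediction(42.3, -0.5, 15):
    print(line)
```

The printed lines should match the three steps of the worked solution, ending at 34.8.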
Common Trap
A common trap is reporting a predicted value without checking whether the x you used falls inside the range of the data. If it does not, label your work as extrapolation and note that the prediction is less reliable. Also confirm your answer is realistic for the context, like a comfort score staying within its 1 to 10 scale.
Common Misconceptions
- Misconception: ŷ is the actual value. Correction: ŷ is the predicted response from the line, not a measured data point. The real value usually differs from the prediction.
- Misconception: A regression line works for any x. Correction: Predictions are most trustworthy inside the observed range of x. Outside that range, you are extrapolating, and reliability drops the farther you go.
- Misconception: A prediction is valid just because the math works. Correction: You can always compute ŷ, but the result can be meaningless if it falls outside a realistic range or relies on extrapolation. Always check context.
- Misconception: Slope and intercept are just numbers. Correction: The slope tells you how much ŷ changes per one-unit increase in x, and the intercept is the predicted ŷ when x = 0. Read them in context, and remember the intercept does not always make real-world sense.
- Misconception: "Correlation" and "regression" mean the same thing. Correction: Correlation measures the strength and direction of a linear relationship, while a regression model gives you an equation to predict the response from the explanatory variable.
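The difference between correlation and regression shows up clearly when both are computed from the same data. The sketch below uses a small data set invented for illustration and the standard textbook least-squares formulas (slope = Sxy / Sxx, intercept = ȳ - b·x̄), which this guide does not derive.

```python
from math import sqrt

# Hypothetical data, invented only to illustrate the distinction.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def mean(values):
    return sum(values) / len(values)


x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)   # correlation: one unitless number
b = sxy / sxx               # least-squares slope
a = y_bar - b * x_bar       # least-squares intercept

print(f"r = {r:.3f}")               # strength and direction only
print(f"ŷ = {a:.1f} + {b:.1f}x")    # an equation you can predict with
```

Correlation gives a single number describing the linear relationship, while the regression output is an equation that turns an x-value into a predicted response.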
Vocabulary
The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

| Term | Definition |
|---|---|
| explanatory variable | A variable whose values are used to explain or predict corresponding values for the response variable. |
| extrapolation | Predicting a response value using a value for the explanatory variable that is beyond the range of x-values used to create the regression model, resulting in less reliable predictions. |
| least-squares regression line | A linear model that minimizes the sum of squared residuals to find the best-fitting line through a set of data points. |
| linear regression model | An equation that uses an explanatory variable to predict a response variable in a linear relationship. |
| predicted value | The estimated response value obtained from a regression model, denoted as ŷ. |
| response variable | A variable whose values are being explained or predicted based on the explanatory variable. |
| slope | The value b in the regression equation ŷ = a + bx, representing the rate of change in the predicted response for each unit increase in the explanatory variable. |
| y-intercept | The value a in the regression equation ŷ = a + bx, representing the predicted response value when the explanatory variable equals zero. |
Frequently Asked Questions
What is linear regression in AP Stats?
Linear regression uses an explanatory variable x to predict a response variable y with a model such as ŷ = a + bx. The result ŷ is a predicted response.
How do you calculate a predicted response value?
Substitute the given x-value into the regression equation ŷ = a + bx, then compute ŷ and interpret the predicted response in context.
What does ŷ mean in AP Statistics?
ŷ means the predicted value of the response variable from the regression line. It is not the actual observed y-value.
How do you interpret slope in a regression model?
The slope b tells you the predicted change in ŷ for each one-unit increase in x, using the units and context of the variables.
What is extrapolation in AP Stats?
Extrapolation is using a regression model to predict for an x-value outside the range of x-values used to create the model. The farther out you go, the less reliable the prediction is.
Why can a regression prediction be unreliable?
A prediction can be unreliable if it extrapolates beyond the data, ignores context, or gives a value that does not make sense for the response variable.