Simple linear regression is an equation, ŷ = a + bx, that uses one explanatory variable (x) to predict a response variable (y); a is the y-intercept, b is the slope, and ŷ is the predicted value of y for a given x (AP Stats Topic 2.6, DAT-1.D).
Simple linear regression is what you get when you turn a scatterplot's pattern into an actual equation you can use. Instead of just eyeballing that two variables seem related, you fit a line of the form ŷ = a + bx, where x is the explanatory variable, ŷ is the predicted response, a is the y-intercept, and b is the slope. "Simple" just means there's exactly one explanatory variable. Plug in an x-value, and the equation hands you a prediction for y.
The Course and Exam Description (CED) stresses two details. First, that hat on ŷ is not decoration. It signals a predicted value, not an actual observed data point, and AP graders look for it. Second, the model is only trustworthy inside the range of x-values used to build it. Predicting beyond that range is called extrapolation, and the further you stray from your data, the less reliable the prediction gets. A line built from data on 10-to-18-year-olds tells you nothing dependable about 40-year-olds.
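Here is a minimal Python sketch of plugging an x-value into ŷ = a + bx and flagging extrapolation. The coefficients (a = 80, b = 5), the height-from-age setup, and the function name predict are made up for illustration; they do not come from any real data set.

```python
def predict(a, b, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y_hat = a + b*x.

    x_min and x_max are the smallest and largest x-values used to build the line.
    """
    y_hat = a + b * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation


# Hypothetical model: predicted height (cm) from age (years), built on ages 10 to 18.
a, b = 80.0, 5.0

y_hat_14, extrap_14 = predict(a, b, 14, 10, 18)
y_hat_40, extrap_40 = predict(a, b, 40, 10, 18)

print(y_hat_14, extrap_14)  # inside the data range
print(y_hat_40, extrap_40)  # far outside the data range
```

With these made-up numbers, the prediction at age 40 is 280 cm. The equation will happily produce that number, which is exactly why extrapolation is dangerous.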
This term lives in Unit 2 (Exploring Two-Variable Data), Topic 2.6, and directly supports learning objective 2.6.A: calculate a predicted response value using a linear regression model. The essential knowledge (DAT-1.D.1 through DAT-1.D.3) spells out the model, the ŷ = a + bx formula, and the extrapolation warning. But this is also the gateway concept for the back half of Unit 2. Interpreting slope and intercept, computing residuals, reading residual plots, and interpreting r² all assume you understand what the regression equation is doing in the first place. Get this one solid and the rest of Unit 2 clicks into place.
Keep studying AP Statistics Unit 2
Least Squares Method (Unit 2)
This is HOW the line gets chosen. Out of every possible line through the scatterplot, least squares picks the one that minimizes the sum of squared residuals. So when a problem says "least-squares regression line," it's talking about the standard simple linear regression line you've been using.
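To see what "minimizes the sum of squared residuals" means, here is a small Python sketch. It assumes the standard least-squares formulas (slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − b·x̄), which are not spelled out above, and the five data points are invented for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_residuals(a, b, xs, ys):
    """Sum of (actual - predicted)^2 for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)

# Any other line, such as this hand-picked one, has a larger sum of squared residuals.
other = sum_squared_residuals(2.0, 0.7, xs, ys)

print(a, b)
print(best, other)
```

For this data the fitted line is roughly ŷ = 2.2 + 0.6x, and the hand-picked competitor ŷ = 2.0 + 0.7x does slightly worse on the sum of squared residuals.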
Scatterplot (Unit 2)
Always look at the scatterplot before trusting a regression line. The equation will happily fit a line through curved data, but that line is meaningless if the pattern isn't roughly linear. The scatterplot is your sanity check; the regression is the math that follows.
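A quick Python sketch of that warning: the data below are made up and follow y = x² exactly, so the true pattern is curved. The fit uses the standard least-squares formulas, which are an assumption here rather than something stated above.

```python
# Made-up curved data: y = x^2.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Residual = actual - predicted.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# r^2 as 1 - (sum of squared residuals) / (total variation in y).
sse = sum(e ** 2 for e in residuals)
sst = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - sse / sst

print(a, b)
print(residuals)
print(round(r_squared, 3))
```

The residuals come out positive at both ends and negative in the middle, a U-shape, even though r² is above 0.9. A high r² does not rescue a line fitted to curved data.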
Residuals and r² (Unit 2)
A residual is actual minus predicted (y − ŷ), so residuals only exist because the regression model produces a ŷ to compare against. A residual plot with a clear curve or pattern is the exam's way of saying a linear model is the wrong choice, and r² tells you what percent of the variation in y the linear model explains.
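As a rough illustration, the Python sketch below computes residuals and r² for a small invented data set. The line ŷ = 2.2 + 0.6x is the least-squares line for these made-up points, and r² is computed as 1 − (sum of squared residuals) / (total variation in y), a standard identity for least-squares lines that is assumed here rather than taken from the text above.

```python
# Made-up data and its least-squares line y_hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

predicted = [a + b * x for x in xs]

# Residual = actual - predicted (y - y_hat).
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

y_bar = sum(ys) / len(ys)
sse = sum(e ** 2 for e in residuals)      # variation the line leaves unexplained
sst = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
r_squared = 1 - sse / sst

print([round(e, 2) for e in residuals])
print(round(r_squared, 2))
```

Here r² works out to 0.6, so about 60% of the variation in y is explained by the linear relationship with x. The residuals also sum to zero (up to rounding), which is always true for a least-squares line.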
Prediction and prediction intervals (Units 2 and 9)
In Unit 2, you make a single point prediction with ŷ = a + bx. Later in the course, inference takes over and asks how confident you can be in the slope itself and in predictions from it. Simple linear regression is the foundation that all of that inference is built on.
Multiple-choice questions hand you a regression equation and ask you to calculate ŷ for a given x, interpret the slope or intercept in context, or interpret r². For example, an r² of 0.64 means 64% of the variation in the response variable is explained by the linear relationship with the explanatory variable, and the exam loves testing whether you can say that precisely. Residual plot questions are also common, where a curved or fanning pattern in the residuals signals that a linear model isn't appropriate.
On FRQs, regression shows up constantly as part of two-variable data analysis. You'll be expected to use computer output to write the equation, make a prediction, and always answer in context with the hat on ŷ. Dropping the hat or describing the slope without units and context costs points.
Correlation and regression travel together but answer different questions. Correlation (r) is a single number measuring the strength and direction of a linear relationship; it has no units and doesn't predict anything. Regression gives you an actual equation, ŷ = a + bx, that produces predictions. You can square r to get r², the proportion of the variation in y that the regression line explains, but r alone never tells you what the predicted value is.
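The link between the two can be sketched in Python. This example assumes the standard formulas b = r·(s_y / s_x) and a = ȳ − b·x̄, where s_x and s_y are the sample standard deviations; those formulas are not stated above, and the data are invented for illustration.

```python
import math

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Correlation: one unitless number for strength and direction.
r = sxy / math.sqrt(sxx * syy)

# Sample standard deviations of x and y.
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))

# Regression: r alone is not enough; the slope also needs s_x and s_y,
# and the intercept needs the means.
b = r * s_y / s_x
a = y_bar - b * x_bar

print(round(r, 3), round(r ** 2, 3))
print(round(a, 3), round(b, 3))
```

Notice that r by itself (about 0.775 here) produces no prediction. Only after combining it with the standard deviations and means do you get the line ŷ = 2.2 + 0.6x.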
Simple linear regression predicts a response variable y from one explanatory variable x using the equation ŷ = a + bx.
In the equation, a is the y-intercept (predicted y when x = 0) and b is the slope (predicted change in y for each one-unit increase in x).
The hat on ŷ matters because it marks a predicted value, not an observed one, and AP graders check for it.
Extrapolation means predicting with an x-value outside the range of the original data, and predictions get less reliable the further you extrapolate.
An r² of 0.64 means 64% of the variation in the response variable is explained by the linear relationship with the explanatory variable.
A residual plot with a clear pattern, like a curve, means a linear model is not appropriate, no matter how nice the line looks on the scatterplot.
Simple linear regression is a model from Topic 2.6 that uses one explanatory variable x to predict a response variable y with the equation ŷ = a + bx, where a is the y-intercept and b is the slope. "Simple" means there's only one explanatory variable.
No, a strong regression line does not prove that x causes y. Regression and correlation describe association, not causation. Even when r² is near 1, a lurking variable may be driving both x and y; only a well-designed randomized experiment lets you conclude cause and effect.
Correlation (r) is one number describing the strength and direction of a linear relationship, while regression is an equation that actually makes predictions. They're linked, since r² is the proportion of variation in y the regression line explains, but r by itself can't predict anything.
The hat means "predicted." ŷ is the value the regression line predicts for a given x, not an actual observed data point, and the difference between them (y − ŷ) is the residual. Writing y instead of ŷ in an FRQ interpretation can cost you points.
Extrapolation is predicting y using an x-value outside the interval of x-values that built the regression line (DAT-1.D.3). The model has no data out there, so the further you extrapolate, the less reliable the prediction becomes.