---
title: "Simple Linear Regression — AP Stats Definition & Guide"
description: "Simple linear regression uses ŷ = a + bx to predict a response variable from one explanatory variable. Learn slope, intercept, extrapolation, and how AP Stats tests it."
canonical: "https://fiveable.me/ap-stats/key-terms/simple-linear-regression"
type: "key-term"
subject: "AP Statistics"
unit: "Unit 5"
---

# Simple Linear Regression — AP Stats Definition & Guide

## Definition

Simple linear regression is an equation, ŷ = a + bx, that uses one explanatory variable (x) to predict a response variable (y); a is the y-intercept, b is the slope, and ŷ is the predicted value of y for a given x (AP Stats Topic 2.6, DAT-1.D).

## What It Is

Simple linear regression is what you get when you turn a [scatterplot](/ap-stats/key-terms/scatterplot "fv-autolink")'s pattern into an actual equation you can use. Instead of just eyeballing that two [variables](/ap-stats/unit-1/language-variation-variables/study-guide/nKpeaxi1H3Ht9aFhTHKt "fv-autolink") seem related, you fit a line of the form **ŷ = a + bx**, where x is the explanatory variable, ŷ is the *predicted* response, a is the y-intercept, and b is the slope. "Simple" just means there's exactly one explanatory variable. Plug in an x-value, and the equation hands you a prediction for y.
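To see the "plug in an x-value" step in action, here's a minimal Python sketch. The model (predicted exam score = 52 + 4.5 × hours studied) and the function name `predict` are made up for illustration; they don't come from any AP problem.

```python
def predict(a, b, x):
    """Return y-hat = a + b*x for a line with intercept a and slope b."""
    return a + b * x

# Hypothetical model: predicted exam score = 52 + 4.5 * (hours studied)
a, b = 52.0, 4.5

y_hat = predict(a, b, 6)
print(f"Predicted score for 6 hours of studying: {y_hat}")  # 52 + 4.5*6 = 79.0
```

Notice the result is a *predicted* score. A real student who studied 6 hours could score higher or lower than 79.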

The CED stresses two details. First, the hat on ŷ is not decoration: it signals a [predicted value](/ap-stats/key-terms/predicted-value "fv-autolink"), not an actual observed data point, and AP graders look for it. Second, the model is only trustworthy inside the range of x-values used to build it. Predicting beyond that range is called **extrapolation**, and the further you stray from your data, the less reliable the prediction gets. A line built from data on 10-to-18-year-olds tells you nothing dependable about 40-year-olds.
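One way to picture the extrapolation warning is as a range check you run before predicting. This is only a sketch: the ages below are hypothetical data, and `is_extrapolation` is an illustrative helper, not a standard function.

```python
def is_extrapolation(x_new, x_data):
    """True when x_new lies outside the range of x-values used to fit the line."""
    return x_new < min(x_data) or x_new > max(x_data)

# Hypothetical ages (in years) that were used to build a regression line
ages = [10, 11, 12, 13, 14, 15, 16, 17, 18]

for age in (14, 40):
    if is_extrapolation(age, ages):
        print(f"x = {age}: extrapolation, treat the prediction with suspicion")
    else:
        print(f"x = {age}: inside the data range")
```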

## Why It Matters

This term lives in [Unit 2](/ap-stats/unit-2 "fv-autolink") (Exploring Two-Variable Data), Topic 2.6, and directly supports learning objective 2.6.A: calculate a predicted response value using a linear regression model. The essential knowledge (DAT-1.D.1 through DAT-1.D.3) spells out the model, the ŷ = a + bx formula, and the extrapolation warning. But this is also the gateway concept for the back half of Unit 2. Interpreting slope and [intercept](/ap-stats/key-terms/intercept "fv-autolink"), computing residuals, reading residual plots, and interpreting r² all assume you understand what the regression equation is doing in the first place. Get this one solid and the rest of Unit 2 clicks into place.

## Connections

### Least Squares Method (Unit 2)

This is HOW the line gets chosen. Out of every possible line through the scatterplot, least squares picks the one that minimizes the sum of squared residuals. So when a problem says "[least-squares regression line](/ap-stats/key-terms/least-squares-regression-line "fv-autolink")," it's talking about the standard simple linear regression line you've been using.
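Here's a small sketch of what "minimizes the sum of squared residuals" means in practice. It uses the standard least-squares formulas for slope and intercept on a tiny made-up data set, then compares the fitted line's sum of squared residuals to that of a nearby line. The data and the function names `least_squares` and `sse` are assumptions for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")                          # y-hat = 2.20 + 0.60x
print(f"SSE of fitted line:  {sse(a, b, xs, ys):.2f}")        # 2.40
print(f"SSE of a nearby line: {sse(2.0, 0.7, xs, ys):.2f}")   # 2.55, which is larger
```

Try any other intercept and slope in the last line: the sum of squared residuals never drops below the fitted line's value.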

### [Scatterplot (Unit 2)](/ap-stats/key-terms/scatterplot)

Always look at the scatterplot before trusting a [regression line](/ap-stats/key-terms/regression-line "fv-autolink"). The equation will happily fit a line through curved data, but that line is meaningless if the pattern isn't roughly linear. The scatterplot is your sanity check; the regression is the math that follows.
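To see how a line can "happily fit" curved data, here's a quick sketch using hypothetical points that follow y = x² exactly. The least-squares formulas still return a line, but the residuals show a clear U-shape, which is the giveaway that a line is the wrong model.

```python
# Hypothetical curved data: y = x squared
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

# Least-squares slope and intercept (standard formulas)
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual = actual minus predicted
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"y-hat = {a} + {b}x")   # y-hat = -2.0 + 4.0x
print(residuals)               # [2.0, -1.0, -2.0, -1.0, 2.0]
```

Positive at both ends and negative in the middle: that's a pattern, not random scatter.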

### Residuals and r² (Unit 2)

A residual is actual minus predicted (y − ŷ), so residuals only exist because the regression model produces a ŷ to compare against. A residual plot with a clear curve or pattern is the exam's way of saying a linear model is the wrong choice, and r² tells you what percent of the [variation](/ap-stats/unit-5 "fv-autolink") in y the linear model explains.
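A short sketch makes both ideas concrete. The data are hypothetical, and the line ŷ = 2.2 + 0.6x is the least-squares line for these five points. The code computes each residual as actual minus predicted, then gets r² as 1 minus (leftover variation ÷ total variation in y), a relationship that holds for the least-squares line.

```python
# Hypothetical data and its least-squares line y-hat = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # actual minus predicted

y_bar = sum(ys) / len(ys)
sse = sum(e ** 2 for e in residuals)       # variation left over after fitting the line
sst = sum((y - y_bar) ** 2 for y in ys)    # total variation in y
r_squared = 1 - sse / sst

print([round(e, 1) for e in residuals])    # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(r_squared, 2))                 # 0.6
```

In context, that 0.6 reads as "60% of the variation in y is explained by the linear relationship with x."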

### Prediction and Inference for Slope (Units 2 and 9)

In Unit 2, you make a single point prediction with ŷ = a + bx. In Unit 9, inference takes over and asks how confident you can be in the slope itself, using confidence intervals and significance tests for the slope. Simple linear regression is the foundation that all of that inference is built on.

## On the AP Exam

Multiple-choice questions hand you a regression equation and ask you to calculate ŷ for a given x, interpret the slope or intercept in context, or interpret r². For example, an r² of 0.64 means 64% of the variation in the response variable is explained by the linear relationship with the explanatory variable, and the exam loves testing whether you can say that precisely. Residual plot questions are also common, where a clear pattern in the residuals, such as a curve, signals that a linear model isn't appropriate.

On FRQs, regression shows up constantly as part of two-variable data analysis. You'll be expected to use computer output to write the equation, make a prediction, and always answer in context with the hat on ŷ. Dropping the hat or describing the slope without units and context can cost points.

## Simple Linear Regression vs. Correlation (r)

Correlation and regression travel together but answer different questions. Correlation (r) is a single number measuring the strength and direction of a linear relationship; it has no units and doesn't predict anything. Regression gives you an actual equation, ŷ = a + bx, that produces predictions. You can square r to get r², which tells you how well the regression line explains the variation in y, but r alone never tells you what the predicted value is.
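The sketch below illustrates the difference with hypothetical data (hours studied and quiz scores, both invented). It computes r, then recovers the slope from the relationship b = r · (s_y / s_x). Re-expressing x in minutes instead of hours leaves r untouched, because r has no units, while the slope changes because it's measured in y-units per x-unit. The function name `correlation` is just an illustrative helper.

```python
from math import sqrt
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson correlation r, computed from deviations about the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: hours studied (x) and quiz score (y)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
minutes = [60 * h for h in hours]   # same x-values, different units

for label, xs in (("hours", hours), ("minutes", minutes)):
    r = correlation(xs, score)
    b = r * stdev(score) / stdev(xs)   # slope from r and the two standard deviations
    print(f"x in {label}: r = {r:.3f}, r^2 = {r * r:.2f}, slope = {b:.3f}")

# x in hours: r = 0.775, r^2 = 0.60, slope = 0.600
# x in minutes: r = 0.775, r^2 = 0.60, slope = 0.010
```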

## Key Takeaways

- Simple linear regression predicts a response variable y from one explanatory variable x using the equation ŷ = a + bx.
- In the equation, a is the y-intercept (predicted y when x = 0) and b is the slope (predicted change in y for each one-unit increase in x).
- The hat on ŷ matters because it marks a predicted value, not an observed one, and AP graders check for it.
- Extrapolation means predicting with an x-value outside the range of the original data, and predictions get less reliable the further you extrapolate.
- An r² of 0.64 means 64% of the variation in the response variable is explained by the linear relationship with the explanatory variable.
- A residual plot with a clear pattern, like a curve, means a linear model is not appropriate, no matter how nice the line looks on the scatterplot.

## FAQs

### What is simple linear regression in AP Stats?

It's a model from Topic 2.6 that uses one explanatory variable x to predict a response variable y with the equation ŷ = a + bx, where a is the y-intercept and b is the slope. "Simple" means there's only one explanatory variable.

### Does a strong regression line prove that x causes y?

No. Regression and correlation describe association, not causation. Even a line with r² near 1 can come from a lurking variable; only a well-designed randomized experiment lets you conclude cause and effect.

### What's the difference between simple linear regression and correlation?

Correlation (r) is one number describing the strength and direction of a linear relationship, while regression is an equation that actually makes predictions. They're linked, since r² is the proportion of variation in y the regression line explains, but r by itself can't predict anything.

### Why does ŷ have a hat on it?

The hat means "predicted." ŷ is the value the regression line predicts for a given x, not an actual observed data point, and the difference between them (y − ŷ) is the residual. Writing y instead of ŷ in an FRQ interpretation can cost you points.

### What is extrapolation and why is it bad?

Extrapolation is predicting y using an x-value outside the interval of x-values that built the regression line (DAT-1.D.3). The model has no data out there, so the further you extrapolate, the less reliable the prediction becomes.

## Related Study Guides

- [5.3 Linear Regression Models](/ap-stats/unit-5/linear-regression-models/study-guide/PSt5cfDuvB5nu60DHulR)

## Structured Data

```json
{"@context":"https://schema.org","@graph":[{"@type":"LearningResource","@id":"https://fiveable.me/ap-stats/key-terms/simple-linear-regression#resource","name":"Simple Linear Regression — AP Stats Definition & Guide","url":"https://fiveable.me/ap-stats/key-terms/simple-linear-regression","learningResourceType":"Concept explainer","educationalLevel":"AP® / High School","about":{"@id":"https://fiveable.me/ap-stats/key-terms/simple-linear-regression#term"},"audience":{"@type":"EducationalAudience","educationalRole":"student"},"dateModified":"2026-06-12T23:22:03.834Z","isPartOf":{"@type":"Collection","name":"AP Statistics Key Terms","url":"https://fiveable.me/ap-stats/key-terms"},"publisher":{"@type":"Organization","name":"Fiveable","url":"https://fiveable.me"}},{"@type":"DefinedTerm","@id":"https://fiveable.me/ap-stats/key-terms/simple-linear-regression#term","name":"Simple linear regression","description":"Simple linear regression is an equation, ŷ = a + bx, that uses one explanatory variable (x) to predict a response variable (y); a is the y-intercept, b is the slope, and ŷ is the predicted value of y for a given x (AP Stats Topic 2.6, DAT-1.D).","url":"https://fiveable.me/ap-stats/key-terms/simple-linear-regression","inDefinedTermSet":{"@type":"DefinedTermSet","name":"AP Statistics Key Terms","url":"https://fiveable.me/ap-stats/key-terms"}},{"@type":"FAQPage","mainEntity":[{"@type":"Question","name":"What is simple linear regression in AP Stats?","acceptedAnswer":{"@type":"Answer","text":"It's a model from Topic 2.6 that uses one explanatory variable x to predict a response variable y with the equation ŷ = a + bx, where a is the y-intercept and b is the slope. \"Simple\" means there's only one explanatory variable."}},{"@type":"Question","name":"Does a strong regression line prove that x causes y?","acceptedAnswer":{"@type":"Answer","text":"No. Regression and correlation describe association, not causation. Even a line with r² near 1 can come from a lurking variable; only a well-designed randomized experiment lets you conclude cause and effect."}},{"@type":"Question","name":"What's the difference between simple linear regression and correlation?","acceptedAnswer":{"@type":"Answer","text":"Correlation (r) is one number describing the strength and direction of a linear relationship, while regression is an equation that actually makes predictions. They're linked, since r² is the proportion of variation in y the regression line explains, but r by itself can't predict anything."}},{"@type":"Question","name":"Why does ŷ have a hat on it?","acceptedAnswer":{"@type":"Answer","text":"The hat means \"predicted.\" ŷ is the value the regression line predicts for a given x, not an actual observed data point, and the difference between them (y − ŷ) is the residual. Writing y instead of ŷ in an FRQ interpretation can cost you points."}},{"@type":"Question","name":"What is extrapolation and why is it bad?","acceptedAnswer":{"@type":"Answer","text":"Extrapolation is predicting y using an x-value outside the interval of x-values that built the regression line (DAT-1.D.3). The model has no data out there, so the further you extrapolate, the less reliable the prediction becomes."}}]},{"@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"name":"AP Statistics","item":"https://fiveable.me/ap-stats"},{"@type":"ListItem","position":2,"name":"Key Terms","item":"https://fiveable.me/ap-stats/key-terms"},{"@type":"ListItem","position":3,"name":"Unit 5","item":"https://fiveable.me/ap-stats/unit-5"},{"@type":"ListItem","position":4,"name":"Simple linear regression"}]}]}
```
