Y-intercept

In AP Statistics, the y-intercept of a least-squares regression line is the predicted value of the response variable when the explanatory variable equals 0, calculated as a = ȳ - bx̄. It doesn't always have a logical interpretation in context, and the exam loves testing exactly that.

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is the y-intercept?

The y-intercept is one of the two coefficients in the least-squares regression equation ŷ = a + bx (the other is the slope, b). It's the predicted value of the response variable when the explanatory variable is 0. You can calculate it with the formula a = ȳ - bx̄, which works because every least-squares regression line passes through the point (x̄, ȳ).
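The formula can be sketched in a few lines of Python. This is a minimal illustration, not an official AP procedure: the helper name least_squares_line and the hours/scores data are made up for the example. It computes b, then a = ȳ - bx̄, and checks that the line's prediction at x̄ equals ȳ.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept: forces the line through (x_bar, y_bar)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data, invented for illustration only
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = least_squares_line(hours, scores)
predicted_at_mean = a + b * mean(hours)

print("intercept a:", round(a, 4))
print("slope b:", round(b, 4))
print("prediction at x-bar:", round(predicted_at_mean, 4), "vs y-bar:", mean(scores))
```

Because a is defined as ȳ - bx̄, plugging x̄ back into a + bx always returns ȳ, which is the "passes through (x̄, ȳ)" fact in code form.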

Here's the part the AP exam cares about most. The Course and Exam Description (CED) says it directly: sometimes the y-intercept does not have a logical interpretation in context. If your explanatory variable is fertilizer concentration and your data run from 10 to 100 ppm, the y-intercept predicts plant growth at 0 ppm, which is outside the data. That's extrapolation, and the prediction may be nonsense. Sometimes x = 0 is flat-out impossible (a person with 0 cm height, a house with 0 square feet). Knowing when the intercept means something and when it's just a mathematical anchor for the line is the real skill.
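One quick sanity check is to ask whether x = 0 even falls inside the observed x-values. The sketch below assumes a hypothetical helper, intercept_is_extrapolation, and invented concentration values in the 10 to 100 ppm range from the fertilizer example. It only flags extrapolation; whether x = 0 is possible in context is still a judgment call.

```python
def intercept_is_extrapolation(xs):
    """Return True when x = 0 lies outside the observed range of x."""
    return not (min(xs) <= 0 <= max(xs))

# Hypothetical fertilizer concentrations (ppm), spanning 10 to 100
concentrations_ppm = [10, 25, 40, 55, 70, 85, 100]

# x = 0 is below the smallest observed value, so the intercept extrapolates
flag_fertilizer = intercept_is_extrapolation(concentrations_ppm)

# Hypothetical temperatures (deg C) that straddle zero, so x = 0 is in range
temperatures_c = [-5, 0, 5, 10, 15]
flag_temperature = intercept_is_extrapolation(temperatures_c)

print(flag_fertilizer, flag_temperature)
```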

Why the y-intercept matters in AP Statistics

The y-intercept lives in Unit 2 (Exploring Two-Variable Data), specifically Topics 2.8 and 2.9. It supports learning objectives 2.8.A (estimating the parameters of the LSRL, where a = ȳ - bx̄) and 2.8.B (interpreting the coefficients, where the y-intercept is the predicted response when x = 0). It also shows up in 2.9.A, because an influential point can substantially change the y-intercept when removed. Unit 2 is heavily tested, and coefficient interpretation is one of the most reliable point-grabbing skills on the exam because the scoring follows a predictable template.

How the y-intercept connects across the course

Slope (Unit 2)

Slope and y-intercept are the two coefficients of the LSRL, but they answer different questions. The slope tells you how much the predicted y changes per unit increase in x; the y-intercept tells you where the line crosses the y-axis, that is, the predicted y when x = 0. They're also linked by formula, since a = ȳ - bx̄ means you can't find the intercept without the slope.

Influential Point (Unit 2)

An influential point is one that, if removed, substantially changes the regression. A changed y-intercept is one of the CED's named ways to spot influence. A high-leverage point far from x̄ can swing the line like a seesaw, dragging the intercept with it.
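The seesaw effect is easy to see numerically. The sketch below uses invented data: five points lying exactly on y = 2x, plus one hypothetical high-leverage point at (20, 10). The helper name intercept is an assumption for the example. It fits the line with and without the last point and compares the intercepts.

```python
from statistics import mean

def intercept(xs, ys):
    """Return the least-squares y-intercept a = y_bar - b * x_bar."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar

# Hypothetical data: first five points follow y = 2x exactly,
# the last point sits far from the other x-values (high leverage)
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 10]

a_all = intercept(xs, ys)                # fit using every point
a_without = intercept(xs[:-1], ys[:-1])  # fit after removing (20, 10)

print("intercept with the point:", round(a_all, 3))
print("intercept without the point:", round(a_without, 3))
```

With the far-right point included, the line flattens and the intercept rises by several units; without it, the intercept is essentially 0. That size of shift is what "substantially changes" looks like.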

Power Regression Models (Unit 2)

When you transform data (like taking the natural log of y, or of both x and y for a power model) to straighten a curved pattern, the new regression line still has a y-intercept, but it's the intercept of the transformed relationship. Interpreting it in original units means undoing the transformation, so be careful what your 'a' actually predicts.
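As a rough illustration of "undoing the transformation," the sketch below fits a line to ln(y) versus x and then back-transforms the intercept with the exponential function. The data are invented so that y = 3 · 2^x exactly; nothing here comes from a real data set.

```python
import math
from statistics import mean

# Hypothetical data following y = 3 * 2**x exactly
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

# Transform the response: work with ln(y) instead of y
log_ys = [math.log(y) for y in ys]

# Least-squares fit of ln(y) on x
x_bar, ly_bar = mean(xs), mean(log_ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, log_ys))
b = sxy / sxx
a = ly_bar - b * x_bar  # intercept on the ln(y) scale, NOT in y's units

# Undo the log to express the x = 0 prediction in the original units
predicted_y_at_zero = math.exp(a)

print("intercept on ln(y) scale:", round(a, 4))
print("predicted y at x = 0 (original units):", round(predicted_y_at_zero, 4))
```

The intercept a is the predicted ln(y) when x = 0; only exp(a) is a prediction for y itself. For a power model fit to ln(y) versus ln(x), the intercept corresponds to ln(x) = 0, which is x = 1 rather than x = 0.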

Residuals (Unit 2)

The y-intercept is set by the least-squares criterion, which minimizes the sum of squared residuals. The intercept isn't chosen to be meaningful; it's chosen to make the line fit best. That's exactly why it sometimes has no sensible real-world interpretation.
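To see the least-squares criterion at work, the sketch below computes the sum of squared residuals (SSE) for the fitted line and for two lines with the same slope but the intercept nudged up or down by 1. The data and the helper name sse are invented for illustration.

```python
from statistics import mean

def sse(a, b, xs, ys):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 68]

# Least-squares slope and intercept
x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

sse_fitted = sse(a, b, xs, ys)       # intercept chosen by least squares
sse_higher = sse(a + 1, b, xs, ys)   # same slope, intercept raised by 1
sse_lower = sse(a - 1, b, xs, ys)    # same slope, intercept lowered by 1

print(round(sse_fitted, 4), round(sse_higher, 4), round(sse_lower, 4))
```

Any other intercept makes the squared residuals larger, whether or not that other value would be easier to interpret in context.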

Is the y-intercept on the AP Statistics exam?

Yes, and it has two main jobs. First, calculation: given summary statistics (r, sx, sy, x̄, ȳ, or raw sums), find b = r(sy/sx), then a = ȳ - bx̄. Multiple-choice questions hand you means and standard deviations and expect you to chain both formulas. Second, interpretation: given a regression equation like ŷ = 45 - 1.8x for temperature and energy use, pick or write the correct interpretation of the intercept ('the predicted energy consumption when temperature is 0°C is 45 kWh'). The trap answers either drop the word 'predicted' or treat an extrapolated intercept as guaranteed truth. On FRQs, regression questions like 2018 Q1 routinely ask you to interpret coefficients in context; full credit requires (1) the word 'predicted' or 'estimated,' (2) the value, and (3) context (variable names and units). If x = 0 falls outside the data range, the best answer often flags that the intercept lacks a meaningful interpretation.

The y-intercept vs slope

Both are coefficients in ŷ = a + bx, and interpretation questions deliberately swap their wordings. The slope interpretation involves change ('for each 1-unit increase in x, predicted y changes by b'). The y-intercept interpretation involves a single point ('when x = 0, the predicted y is a'). If your sentence about the intercept contains the phrase 'for each additional,' you've accidentally written a slope interpretation.

Key things to remember about the y-intercept

  • The y-intercept is the predicted value of the response variable when the explanatory variable equals 0.

  • Calculate it with a = ȳ - bx̄, which works because the least-squares line always passes through (x̄, ȳ).

  • Always say 'predicted' or 'estimated' when interpreting the y-intercept; the line gives predictions, not guaranteed values.

  • Sometimes the y-intercept has no logical interpretation, especially when x = 0 is impossible or far outside the range of the data.

  • If removing a point substantially changes the y-intercept, slope, or correlation, that point is influential.

  • After transforming data to straighten a curved pattern, the y-intercept belongs to the transformed equation, not the original units.

Frequently asked questions about the y-intercept

What is the y-intercept in AP Stats?

It's the coefficient a in the regression equation ŷ = a + bx, representing the predicted value of the response variable when the explanatory variable is 0. You can compute it as a = ȳ - bx̄.

Does the y-intercept always have a meaningful interpretation?

No, and the CED says so explicitly. If x = 0 is impossible (like a person weighing 0 pounds) or falls outside the range of the data, the y-intercept is just the mathematical anchor of the line, not a sensible prediction.

How is the y-intercept different from the slope?

The slope describes change (how much predicted y shifts per 1-unit increase in x), while the y-intercept is a single predicted value (predicted y when x = 0). On interpretation questions, slope answers say 'for each additional unit' and intercept answers say 'when x is 0.'

How do you interpret the y-intercept on an AP Stats FRQ?

Use the template: 'When [explanatory variable] is 0 [units], the predicted [response variable] is [a] [units].' For ŷ = 45 - 1.8x relating temperature to energy use, that's 'when the temperature is 0°C, the predicted energy consumption is 45 kWh.' Leaving out 'predicted' or the context costs you the point.

How do you find the y-intercept from summary statistics?

First find the slope with b = r(sy/sx), then plug into a = ȳ - bx̄. For example, with r = 0.8, sx = 12, sy = 3, x̄ = 45, and ȳ = 20, the slope is 0.8(3/12) = 0.2 and the intercept is 20 - 0.2(45) = 11.
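The same two-step chain can be written as a short Python sketch. The function name lsrl_from_summary is made up for the example; the inputs are the numbers from the worked example above, and the result should match 0.2 and 11 up to floating-point rounding.

```python
def lsrl_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for y-hat = a + b*x from summary statistics."""
    b = r * (s_y / s_x)    # slope: b = r(sy/sx)
    a = y_bar - b * x_bar  # intercept: a = y_bar - b * x_bar
    return a, b

# Values from the worked example: r = 0.8, sx = 12, sy = 3, x_bar = 45, y_bar = 20
a, b = lsrl_from_summary(r=0.8, s_x=12, s_y=3, x_bar=45, y_bar=20)

print("slope b:", round(b, 4))
print("intercept a:", round(a, 4))
```

Note the order: the slope has to come first, because the intercept formula uses it.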