Fiveable

🎲Intro to Statistics Unit 12 Review

12.3 The Regression Equation

Written by the Fiveable Content Team • Last updated August 2025

The Regression Equation

Linear regression gives you a formula for drawing the best possible straight line through a scatterplot. That line lets you predict the value of one variable based on another. This section covers how the regression equation is built, what its parts mean, and how to measure how well it fits your data.

Least-Squares Regression Line Calculation

The goal of a least-squares regression line is to find the line that fits your data as closely as possible. "Best fit" has a specific meaning here: the line minimizes the sum of the squared vertical distances between each data point and the line itself. Squaring those distances prevents positive and negative errors from canceling each other out, and it penalizes large misses more heavily than small ones.
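
Written out in symbols, the quantity being minimized can be sketched as follows. Here $\hat{y}_i$ denotes the height of a candidate line at $x_i$, and SSE (sum of squared errors) is a common name for the total that this section does not itself use:

$$\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Each term $y_i - \hat{y}_i$ is one vertical distance. As a made-up illustration, suppose a candidate line misses three points by $+2$, $-2$, and $0$. The raw misses sum to $0$, but $\text{SSE} = 4 + 4 + 0 = 8$, so the errors no longer cancel.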

The equation of the least-squares regression line is:

$$\hat{y} = b_0 + b_1 x$$

  • $\hat{y}$: the predicted value of the response (dependent) variable
  • $b_0$: the y-intercept of the regression line
  • $b_1$: the slope of the regression line
  • $x$: the value of the explanatory (independent) variable

The hat symbol on $\hat{y}$ is important. It tells you this is a predicted value from the equation, not an actual observed value.

How to calculate the slope and intercept:

  1. Find the mean of your explanatory variable ($\bar{x}$) and the mean of your response variable ($\bar{y}$).
  2. Calculate the slope using: $b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

     The numerator measures how $x$ and $y$ vary together. The denominator measures how much $x$ varies on its own.
  3. Calculate the y-intercept using: $b_0 = \bar{y} - b_1 \bar{x}$

     This formula guarantees that the regression line passes through the point $(\bar{x}, \bar{y})$.
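
The three steps can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the function name `least_squares_line` and the `hours`/`scores` data are made up for this example.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    # Step 1: means of the explanatory and response variables.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Step 2: slope = (how x and y vary together) / (how x varies on its own).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx

    # Step 3: intercept chosen so the line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(hours, scores)
print(f"y_hat = {b0:.2f} + {b1:.2f} * x")
```

For this data the means are $\bar{x} = 3$ and $\bar{y} = 4$, the numerator is 6, and the denominator is 10, so the slope is 0.6 and the intercept is about 2.2.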

Interpretation of Regression Slope

The slope ($b_1$) tells you how much the predicted value of the response variable ($y$) changes for each one-unit increase in the explanatory variable ($x$).

Context matters for interpretation. For example, if you're modeling salary based on years of experience and $b_1 = 1500$, you'd say: "For each additional year of experience, salary increases by $1,500 on average."

The y-intercept ($b_0$) is the predicted value of $y$ when $x = 0$. Sometimes this makes practical sense (e.g., a starting salary with zero experience). Other times it doesn't (e.g., predicting weight when height is zero). Always check whether $x = 0$ falls within a reasonable range of your data before interpreting the intercept literally.
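
As a worked illustration of both interpretations, take the salary example's slope of 1500 together with a hypothetical intercept of 40000. The intercept is an assumed value chosen for this sketch, not one given in the text:

$$\hat{y} = 40000 + 1500x$$

$$x = 4:\quad \hat{y} = 40000 + 1500(4) = 46000$$

$$x = 5:\quad \hat{y} = 40000 + 1500(5) = 47500$$

The two predictions differ by $47500 - 46000 = 1500$, which is exactly the slope. Setting $x = 0$ returns the intercept, 40000, which under this assumption would be the predicted starting salary with no experience.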

Correlation and Determination Coefficients

Correlation coefficient ($r$) measures the strength and direction of the linear relationship between two variables.

  • $r$ ranges from $-1$ to $1$
  • Values close to $-1$ or $1$ indicate a strong linear relationship
  • Values close to $0$ indicate a weak linear relationship
  • Positive $r$: as $x$ increases, $y$ tends to increase
  • Negative $r$: as $x$ increases, $y$ tends to decrease

A common mistake is thinking that $r$ close to zero means "no relationship." It means no linear relationship. The variables could still be related in a curved pattern.

Coefficient of determination ($r^2$) tells you the proportion of variation in the response variable that is explained by the explanatory variable.

  • $r^2$ ranges from $0$ to $1$
  • If $r^2 = 0.75$, you'd say: "75% of the variation in the response variable is explained by the explanatory variable." The remaining 25% is due to other factors or random variation.

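Since $r^2$ is just $r$ squared, you can always compute $r^2$ from $r$. Going the other way, $\sqrt{r^2}$ gives only the magnitude of $r$, because $r^2$ doesn't tell you the direction of the relationship, only how much variation is accounted for. The sign of $r$ matches the sign of the slope $b_1$.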
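
The sketch below computes $r$ and $r^2$ in Python. It uses the standard formula $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, which this section does not state, and the function name `correlation` and both datasets are made up for illustration.

```python
import math


def correlation(xs, ys):
    """Return the correlation coefficient r for paired samples xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Made-up data with a roughly linear, increasing pattern.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = correlation(hours, scores)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")

# Made-up data with a perfect curved pattern (y = x squared).
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = correlation(curve_x, curve_y)
print(f"curved data: r = {r_curve:.3f}")
```

The second dataset follows an exact curve, yet its correlation comes out as zero. That is the "no linear relationship" case: $r$ near zero does not rule out a strong curved pattern.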

Linear Regression Analysis

Before fitting a regression line, always start with a scatterplot. Plotting your data lets you check whether a linear model is even appropriate. If the points follow a curve or show no clear pattern, a straight line won't capture the relationship well.

Once you confirm the relationship looks roughly linear, you perform regression analysis to find the equation of the best-fit line. From there, you can use the equation to predict values of $y$ for given values of $x$. Just be careful not to extrapolate far beyond the range of your original data. The linear pattern you observed may not hold outside that range.
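
One way to keep the extrapolation warning in mind is to have the prediction step report whether an $x$ value falls outside the observed range. The sketch below assumes a hypothetical fitted line (intercept 2.2, slope 0.6) and a hypothetical observed range of 1 to 5. The function name `predict` and all the numbers are illustrative only.

```python
def predict(x, b0, b1, x_min, x_max):
    """Return (y_hat, extrapolated) for the line y_hat = b0 + b1 * x.

    extrapolated is True when x lies outside the observed range [x_min, x_max].
    """
    y_hat = b0 + b1 * x
    extrapolated = not (x_min <= x <= x_max)
    return y_hat, extrapolated


# Hypothetical fitted line and observed x-range, chosen only for illustration.
b0, b1 = 2.2, 0.6
x_min, x_max = 1, 5

inside = predict(3, b0, b1, x_min, x_max)    # x within the observed range
outside = predict(20, b0, b1, x_min, x_max)  # x far beyond the observed range

print(inside)
print(outside)
```

The equation still produces a number for $x = 20$, but the flag marks it as an extrapolation. Whether the linear pattern holds that far out is something the data cannot confirm.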