
12.9 Regression (Fuel Efficiency)

Written by the Fiveable Content Team • Last updated August 2025

Regression Analysis for Fuel Efficiency

Regression analysis helps you describe how one variable (like vehicle weight) is related to another (like fuel efficiency). In this section, you'll learn to visualize that relationship with scatterplots, measure it with correlation coefficients, and use a regression equation to make predictions.

Scatterplots for Fuel Efficiency Relationships

A scatterplot is the first tool you should reach for when exploring the relationship between two quantitative variables. For fuel efficiency data, you'd plot vehicle weight (in lbs) on the x-axis and fuel efficiency (in mpg) on the y-axis. Each point represents a single vehicle.

Once the points are plotted, look at the overall pattern:

  • A roughly linear pattern suggests a linear relationship between the two variables.
  • A downward slope indicates a negative relationship: as weight increases, fuel efficiency decreases. This is what you'd expect here, since heavier cars generally need more fuel to move.
  • The tightness of the points around an imaginary line tells you about strength. Points clustered closely around a line suggest a strong relationship, while widely scattered points suggest a weaker one.

[Figure: example scatterplot. Source: Linear Relationships (4 of 4), Concepts in Statistics]

Correlation Coefficients in Efficiency Data

The correlation coefficient ($r$) puts a number on what the scatterplot shows you visually. It measures the strength and direction of a linear relationship between two quantitative variables.

  • $r$ ranges from $-1$ to $1$.
  • Values close to $-1$ or $1$ indicate a strong linear relationship; values close to $0$ indicate a weak one.
  • A positive $r$ means both variables tend to increase together. A negative $r$ means one tends to decrease as the other increases.

For fuel efficiency vs. vehicle weight, you'd expect a negative $r$. A typical dataset might give something like $r = -0.85$, which would indicate a strong negative linear relationship.
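The sign-and-size reading of $r$ described above can be sketched as a small Python helper. The function name `describe_correlation` and the cutoffs (0.7 for "strong", 0.3 for "moderate") are illustrative assumptions; textbooks use different rules of thumb, so treat the labels as rough guides rather than fixed definitions.

```python
def describe_correlation(r: float, strong: float = 0.7, moderate: float = 0.3) -> str:
    """Return a rough verbal label for a correlation coefficient r.

    The cutoffs `strong` and `moderate` are assumed rules of thumb,
    not standard values.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear relationship"
    # The sign of r gives the direction of the relationship.
    direction = "positive" if r > 0 else "negative"
    # The distance of r from 0 gives the strength.
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


print(describe_correlation(-0.85))  # strong negative
```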

The formula for rr is:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $x_i$ and $y_i$ are individual data values, $\bar{x}$ and $\bar{y}$ are their respective means, and $n$ is the number of data points. In practice, you'll usually compute this with a calculator or software, but understanding the formula helps: the numerator captures how $x$ and $y$ move together, while the denominator standardizes everything so the result falls between $-1$ and $1$.
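To connect each piece of the formula to a calculation, here is a minimal Python sketch that computes $r$ term by term. The function name `pearson_r` and the five-car dataset are hypothetical, made up only to illustrate the arithmetic.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Correlation coefficient r computed term by term from the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations (how x and y move together).
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the summed squared deviations.
    denominator = math.sqrt(sum((x - x_bar) ** 2 for x in xs)) * math.sqrt(
        sum((y - y_bar) ** 2 for y in ys)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable has no spread")
    return numerator / denominator


# Hypothetical weights (lbs) and fuel efficiencies (mpg) for five cars.
weights = [2000, 2500, 3000, 3500, 4000]
mpg = [36, 31, 28, 24, 20]
print(round(pearson_r(weights, mpg), 3))  # -0.998 for this made-up data
```

Because these made-up points fall almost on a downward line, $r$ comes out very close to $-1$.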

[Figure: example scatterplot. Source: Scatterplots (4 of 5), Concepts in Statistics]

Linear Regression for Efficiency Predictions

Linear regression finds the line of best fit through your scatterplot. This line minimizes the sum of the squared differences between the observed values and the predicted values, which is why the method is called least squares regression.
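To see what "least squares" means in practice, here is a minimal Python sketch. It fits the line with the standard closed-form slope $b_1 = S_{xy}/S_{xx}$, where $S_{xy}$ is the sum of products of deviations and $S_{xx}$ is the sum of squared $x$ deviations (algebraically equivalent to $b_1 = r\,s_y/s_x$), then checks that nudging the slope or intercept increases the sum of squared differences. The helper names `least_squares_fit` and `sse` and the data are hypothetical.

```python
from typing import Sequence, Tuple


def least_squares_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(xs: Sequence[float], ys: Sequence[float], b0: float, b1: float) -> float:
    """Sum of squared differences between observed y and predicted y-hat."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical weights (lbs) and fuel efficiencies (mpg).
weights = [2000, 2500, 3000, 3500, 4000]
mpg = [36, 31, 28, 24, 20]

b0, b1 = least_squares_fit(weights, mpg)
best = sse(weights, mpg, b0, b1)
# Any other line (here: a slightly different slope or intercept) does worse.
print(best < sse(weights, mpg, b0, b1 + 0.0005))  # True
print(best < sse(weights, mpg, b0 + 0.5, b1))     # True
```

The closed form is used here only because it avoids computing $r$ separately; both versions give the same line.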

The regression equation takes the form:

$$\hat{y} = b_0 + b_1 x$$

  • $\hat{y}$ = predicted value of the dependent variable (fuel efficiency in mpg)
  • $x$ = value of the independent variable (vehicle weight in lbs)
  • $b_0$ = y-intercept (the predicted mpg when weight is zero, though this often isn't meaningful in context)
  • $b_1$ = slope (the predicted change in mpg for each additional pound of weight)

Calculating the slope and intercept:

$$b_1 = r \frac{s_y}{s_x}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$

where $r$ is the correlation coefficient, $s_y$ and $s_x$ are the standard deviations of the dependent and independent variables, and $\bar{y}$ and $\bar{x}$ are their means.
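The slope and intercept formulas translate directly into code. This Python sketch assumes you already have $r$, the two standard deviations, and the two means (for example, from a calculator). The function name `slope_intercept` and the three-car dataset are hypothetical; the data are chosen to lie exactly on a downward line, so $r = -1$.

```python
import statistics
from typing import Tuple


def slope_intercept(r: float, s_x: float, s_y: float,
                    x_bar: float, y_bar: float) -> Tuple[float, float]:
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    if s_x == 0:
        raise ValueError("s_x must be nonzero")
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on a line, so r = -1.
weights = [2000, 3000, 4000]
mpg = [35.6, 27.6, 19.6]
b0, b1 = slope_intercept(
    r=-1.0,
    s_x=statistics.stdev(weights),  # sample standard deviation of x
    s_y=statistics.stdev(mpg),      # sample standard deviation of y
    x_bar=statistics.mean(weights),
    y_bar=statistics.mean(mpg),
)
print(round(b0, 1), round(b1, 3))  # 51.6 -0.008
```

`statistics.stdev` gives the sample standard deviation; because $b_1$ uses the ratio $s_y/s_x$, the result is the same as long as both standard deviations are computed the same way.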

Making a prediction:

  1. Plug the vehicle's weight into the equation for $x$.
  2. Evaluate the right-hand side to get $\hat{y}$, the predicted fuel efficiency.

For example, if your regression equation is $\hat{y} = 51.6 - 0.008x$ and you want to predict the mpg for a 3,000 lb car: $\hat{y} = 51.6 - 0.008(3000) = 51.6 - 24 = 27.6$ mpg.
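The two steps and the worked example amount to evaluating the line at a single $x$ value. A minimal Python sketch, using the example's coefficients $b_0 = 51.6$ and $b_1 = -0.008$ as defaults; the function name `predict_mpg` is hypothetical.

```python
def predict_mpg(weight_lbs: float, b0: float = 51.6, b1: float = -0.008) -> float:
    """Step 1: plug in x = weight; step 2: evaluate y-hat = b0 + b1 * x."""
    return b0 + b1 * weight_lbs


print(round(predict_mpg(3000), 1))  # 27.6
```

Rounding the output hides the tiny floating-point error that decimal coefficients like 0.008 introduce.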

Residuals are the differences between observed and predicted values: $\text{residual} = y - \hat{y}$. If a car actually gets 30 mpg but the model predicted 27.6 mpg, the residual is $30 - 27.6 = +2.4$ mpg, meaning the car did better than the line predicted. Residuals help you assess how well the model fits the data. A good model will have residuals that are small and show no obvious pattern.
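Computing residuals is one subtraction per data point. This Python sketch (the function name `residuals` is hypothetical) returns them as a list so you can scan them for size and pattern; a positive value means the car beat the line's prediction, and a negative value means it fell short.

```python
from typing import List, Sequence


def residuals(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residual for each point: observed y minus predicted y-hat."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


# The car from the text: observed 30 mpg, predicted 27.6 mpg.
print([round(e, 1) for e in residuals([30.0], [27.6])])  # [2.4]
```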

Advanced Regression Techniques

Two concepts worth knowing at this level:

  • Multiple regression extends simple linear regression by including more than one independent variable. For fuel efficiency, you might include both weight and engine size as predictors, which can improve the model's accuracy.
  • Extrapolation means using the regression equation to predict values outside the range of your original data. This is risky. If your data only includes cars weighing 2,000 to 5,000 lbs, predicting mpg for a 500 lb vehicle could give a nonsensical result. Stick to predictions within (or very close to) the range of your data.
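One simple safeguard against extrapolation is to compare the input with the range of $x$ values in the original data before predicting. This Python sketch assumes the 2,000 to 5,000 lb range mentioned above and reuses the example line $\hat{y} = 51.6 - 0.008x$ purely for illustration; the function name `predict_within_range` is hypothetical.

```python
def predict_within_range(x: float, b0: float, b1: float,
                         x_min: float, x_max: float) -> float:
    """Return b0 + b1 * x, refusing to extrapolate beyond [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "predicting here would be extrapolation"
        )
    return b0 + b1 * x


# Hypothetical: data covered 2,000 to 5,000 lbs; illustrative line from the example.
print(round(predict_within_range(3000, 51.6, -0.008, 2000, 5000), 1))  # 27.6
try:
    predict_within_range(500, 51.6, -0.008, 2000, 5000)
except ValueError as err:
    print(err)
```

The check is deliberately strict. The text allows predictions "very close to" the data range; supporting that would mean widening `x_min` and `x_max` slightly, a judgment call the sketch leaves to you.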