Regression analysis helps you describe how one variable (like vehicle weight) is related to another (like fuel efficiency). In this section, you'll learn to visualize that relationship with scatterplots, measure it with the correlation coefficient, and use a regression equation to make predictions.

Scatterplots for Fuel Efficiency Relationships

A scatterplot is the first tool you should reach for when exploring the relationship between two quantitative variables. For fuel efficiency data, you'd plot vehicle weight (in lbs) on the x-axis and fuel efficiency (in mpg) on the y-axis. Each point represents a single vehicle.

Once the points are plotted, look at the overall pattern:

A roughly linear pattern suggests a linear relationship between the two variables.
A downward slope indicates a negative relationship: as weight increases, fuel efficiency decreases. This is what you'd expect here, since heavier cars generally need more fuel to move.
The tightness of the points around an imaginary line tells you about strength. Points clustered closely around a line suggest a strong relationship, while widely scattered points suggest a weaker one.


Correlation Coefficients in Efficiency Data

The correlation coefficient ($r$) puts a number on what the scatterplot shows you visually. It measures the strength and direction of a linear relationship between two quantitative variables.

$r$ ranges from $-1$ to $1$.
Values close to $-1$ or $1$ indicate a strong linear relationship; values close to $0$ indicate a weak one.
A positive $r$ means both variables tend to increase together. A negative $r$ means one tends to decrease as the other increases.

For fuel efficiency vs. vehicle weight, you'd expect a negative $r$. A typical dataset might give something like $r = -0.85$, which would indicate a strong negative linear relationship.

The formula for $r$ is:

$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$

where $x_i$ and $y_i$ are individual data values, $\bar{x}$ and $\bar{y}$ are their respective means, and $n$ is the number of data points. In practice, you'll usually compute this with a calculator or software, but understanding the formula helps: the numerator captures how $x$ and $y$ move together, while the denominator rescales by the spread of each variable so the result always falls between $-1$ and $1$.
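
To make the formula concrete, here is a minimal Python sketch that computes $r$ term by term. The five vehicles in `weights_lbs` and `mpg` are made-up illustrative values, not real measurements, and the function name `correlation` is just a label chosen for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r, computed directly from the formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: how x and y move together around their means
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator pieces: the spread of each variable on its own
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return numerator / (sqrt(sxx) * sqrt(syy))

# Hypothetical data (assumed for illustration only)
weights_lbs = [2000, 2500, 3000, 3500, 4000]
mpg = [36, 31, 28, 23, 20]

r = correlation(weights_lbs, mpg)
print(round(r, 3))  # about -0.996: a strong negative linear relationship
```

Note that the sketch fails with a division by zero if either variable has no spread (all values identical), since $r$ is undefined in that case.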


Linear Regression for Efficiency Predictions

Linear regression finds the line of best fit through your scatterplot. This line minimizes the sum of the squared differences between the observed values and the predicted values, which is why the method is called least squares regression.
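
Written out, the quantity being minimized is the sum of squared errors. The label SSE and the subscripted $\hat{y}_i$ (the line's predicted value for the $i$-th observation) are notation introduced here for illustration:

$\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

Among all possible straight lines, the least squares line is the one with the smallest SSE.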

The regression equation takes the form:

$\hat{y} = b_0 + b_1 x$

$\hat{y}$ = predicted value of the dependent variable (fuel efficiency in mpg)
$x$ = value of the independent variable (vehicle weight in lbs)
$b_0$ = y-intercept (the predicted mpg when weight is zero, though this often isn't meaningful in context)
$b_1$ = slope (the predicted change in mpg for each additional pound of weight)

Calculating the slope and intercept:

$b_1 = r \frac{s_y}{s_x}$

$b_0 = \bar{y} - b_1 \bar{x}$

where $r$ is the correlation coefficient, $s_y$ and $s_x$ are the standard deviations of the dependent and independent variables, and $\bar{y}$ and $\bar{x}$ are their means.
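
As a rough sketch of these two formulas in Python, the function below computes $b_1$ and $b_0$ from the means, the sample standard deviations, and $r$. The name `fit_line` and the five data points are assumptions made for this example. The data are invented, and were chosen so the fitted line comes out to $\hat{y} = 51.6 - 0.008x$.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    # r expressed with sample standard deviations; equivalent to the sum formula for r
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data (assumed for illustration only)
weights_lbs = [2000, 2500, 3000, 3500, 4000]
mpg = [36, 31, 28, 23, 20]

b0, b1 = fit_line(weights_lbs, mpg)
print(round(b0, 1), round(b1, 4))  # approximately 51.6 and -0.008
```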

Making a prediction:

Plug the vehicle's weight into the equation for $x$.
Solve for $\hat{y}$ to get the predicted fuel efficiency.

For example, if your regression equation is $\hat{y} = 51.6 - 0.008x$ and you want to predict the mpg for a 3,000 lb car: $\hat{y} = 51.6 - 0.008(3000) = 51.6 - 24 = 27.6$ mpg.
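
The same calculation can be sketched as a small Python function. The name `predict_mpg` is hypothetical, and the default coefficients are simply the ones from the example equation above.

```python
def predict_mpg(weight_lbs, b0=51.6, b1=-0.008):
    """Predicted fuel efficiency (mpg) from the regression line y_hat = b0 + b1 * x."""
    return b0 + b1 * weight_lbs

prediction = predict_mpg(3000)
print(round(prediction, 1))  # 27.6
```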

Residuals are the differences between observed and predicted values: $\text{residual} = y - \hat{y}$. If a car actually gets 30 mpg but the model predicted 27.6, the residual is $+2.4$ mpg. Residuals help you assess how well the model fits the data. A good model will have residuals that are small and show no obvious pattern when plotted against $x$.
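
Here is a short sketch of computing residuals for a whole dataset. The data are made-up illustrative values (chosen so that the example line $\hat{y} = 51.6 - 0.008x$ is their least squares fit), and the function name `residuals` is an assumption for this example.

```python
def residuals(xs, ys, b0, b1):
    """Observed minus predicted value for each data point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data (assumed for illustration only)
weights_lbs = [2000, 2500, 3000, 3500, 4000]
mpg = [36, 31, 28, 23, 20]

res = residuals(weights_lbs, mpg, b0=51.6, b1=-0.008)
print([round(e, 1) for e in res])  # [0.4, -0.6, 0.4, -0.6, 0.4]

# The single car from the text: observed 30 mpg at 3,000 lbs
single = residuals([3000], [30], b0=51.6, b1=-0.008)[0]
print(round(single, 1))  # 2.4
```

In this toy dataset the residuals are small, mix positive and negative signs, and add up to approximately zero, which is what you expect from a least squares line that includes an intercept.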

Advanced Regression Techniques

Two concepts worth knowing at this level:

Multiple regression extends simple linear regression by including more than one independent variable. For fuel efficiency, you might include both weight ($x_1$) and engine size ($x_2$) as predictors, giving an equation of the form $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$. Adding a relevant predictor can improve the model's accuracy.
Extrapolation means using the regression equation to predict values outside the range of your original data. This is risky. If your data only includes cars weighing 2,000 to 5,000 lbs, predicting mpg for a 500 lb vehicle could give a nonsensical result. Stick to predictions within (or very close to) the range of your data.
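
One way to guard against accidental extrapolation is to check the input against the range of the original data before predicting. The sketch below assumes the example equation $\hat{y} = 51.6 - 0.008x$ and the 2,000 to 5,000 lb range mentioned above. The function name `predict_within_range` is hypothetical.

```python
def predict_within_range(weight_lbs, b0, b1, min_weight, max_weight):
    """Predict mpg, but refuse inputs outside the range the model was fitted on."""
    if not (min_weight <= weight_lbs <= max_weight):
        raise ValueError(
            f"{weight_lbs} lbs is outside the data range "
            f"({min_weight} to {max_weight} lbs); prediction would be extrapolation"
        )
    return b0 + b1 * weight_lbs

# Inside the range: a normal prediction
print(round(predict_within_range(3000, 51.6, -0.008, 2000, 5000), 1))  # 27.6

# What the unchecked equation gives far outside the range (a 7,000 lb vehicle)
unchecked = 51.6 + (-0.008) * 7000
print(round(unchecked, 1))  # about -4.4, a negative mpg, which is nonsensical
```

Raising an error is only one possible design. A gentler version could return the prediction along with a warning flag instead.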