📈College Algebra Unit 4 Review

4.3 Fitting Linear Models to Data

Written by the Fiveable Content Team • Last updated August 2025

Fitting Linear Models to Data

Linear models let you describe the relationship between two variables using an equation. Once you have that equation, you can make predictions: given a new x-value, what y-value should you expect? This section covers how to build those models from real data, how to interpret them, and where they break down.

Scatter Plots for Variable Relationships

A scatter plot is the starting point for any data-fitting problem. It plots pairs of values $(x, y)$ so you can see whether a pattern exists before you try to model it.

  • The independent variable (explanatory) goes on the x-axis. This is the variable you think might influence the other (e.g., hours spent studying).
  • The dependent variable (response) goes on the y-axis. This is the outcome you're measuring (e.g., exam score).

Once the points are plotted, look for a correlation pattern:

  • Positive correlation: As x increases, y tends to increase. Think height and weight: taller people generally weigh more.
  • Negative correlation: As x increases, y tends to decrease. A classic example is price and quantity demanded: raise the price, and fewer people buy.
  • No correlation: The points show no clear trend. Shoe size and IQ, for instance, have no meaningful relationship.

Also watch for outliers, which are data points that fall far from the overall pattern. A single outlier can distort your results, so it's worth investigating whether it reflects a data entry error or a genuinely unusual observation.

[Figure: Draw and interpret scatter plots (College Algebra)]

Line of Best Fit Interpretation

The line of best fit (also called the regression line) is the straight line that best summarizes the trend in your scatter plot. "Best" here has a precise meaning: the line minimizes the sum of the squared vertical distances between each data point and the line itself. This method is called least-squares regression.
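
For reference, the least-squares criterion can be written out explicitly. The notation below is an assumption added for illustration: $n$ data points $(x_i, y_i)$, with $\bar{x}$ and $\bar{y}$ denoting the means of the x- and y-values. The line $y = mx + b$ is chosen to minimize the sum of squared vertical distances $S$, and the minimizing slope and intercept have closed forms:

```latex
S(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2

m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x}
```

The second formula shows that the least-squares line always passes through the point $(\bar{x}, \bar{y})$.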

In practice, you'll use a graphing calculator or spreadsheet to compute it. The output is an equation in the form:

$y = mx + b$

Each part of that equation tells you something specific:

  • Slope ($m$): The predicted change in y for every one-unit increase in x. If $m = 3.2$ in a model relating study hours to exam points, each additional hour of studying predicts about 3.2 more points on the exam.
  • Y-intercept ($b$): The predicted y-value when $x = 0$. This is the "starting point" of the model. Sometimes it makes practical sense (a base exam score with zero study hours); sometimes it doesn't (a model predicting weight from height at height = 0 inches is meaningless).
  • Correlation coefficient ($r$): Not a term in the equation itself, but usually reported alongside it. It is a number between $-1$ and $1$ that measures the strength and direction of the linear relationship. Values near $+1$ indicate a strong positive linear trend, values near $-1$ indicate a strong negative linear trend, and values near $0$ suggest little to no linear relationship.
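
The slope, intercept, and correlation coefficient can all be computed directly from the data. The following is a minimal Python sketch, not the routine any particular calculator or spreadsheet uses; the `fit_line` helper and the study-hours data are made up for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squared and cross deviations from the means
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [62, 66, 67, 72, 75]

m, b, r = fit_line(hours, scores)
print(f"y = {m:.2f}x + {b:.2f}, r = {r:.3f}")
```

For this made-up data the slope comes out to about 3.2, so each extra hour of study predicts roughly 3.2 more points, and r is close to +1, which signals a strong positive linear trend.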

One more concept to know: residuals. A residual is the difference between an observed y-value and the y-value predicted by the line. In other words, $\text{residual} = y_{\text{observed}} - y_{\text{predicted}}$. Residuals tell you how far off the model is for each data point.
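
As a small illustration, the sketch below computes residuals for a handful of made-up points. The data and the line $y = 3.2x + 58.8$ are hypothetical; the line was chosen because it is the least-squares line for these particular points.

```python
# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [62, 66, 67, 72, 75]

# Hypothetical fitted line y = 3.2x + 58.8 (least-squares line for the data above)
slope, intercept = 3.2, 58.8

predicted = [slope * x + intercept for x in hours]
# residual = observed y - predicted y
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]

for x, y, y_hat, e in zip(hours, scores, predicted, residuals):
    print(f"x={x}: observed {y}, predicted {y_hat:.1f}, residual {e:+.1f}")
```

A positive residual means the point sits above the line and a negative residual means it sits below. For a least-squares line with an intercept, the residuals always sum to zero (up to rounding), which is a quick sanity check on a fit.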

[Figure: Line Fitting, Residuals, and Correlation (Introduction to Statistics)]

Linear vs. Nonlinear Relationships

Not every relationship between variables is linear. A linear relationship shows a roughly constant rate of change: the data points cluster around a straight line. A nonlinear relationship has a rate of change that varies across the range of x-values, so the data curves rather than following a line.

Common nonlinear patterns include:

  • Exponential: Growth that accelerates over time (bacterial population doubling)
  • Quadratic: Data that rises then falls, or vice versa (the arc of a thrown ball)
  • Logarithmic: Rapid change at first that levels off (pH as a function of hydrogen-ion concentration)

You can often tell which type you're dealing with by inspecting the scatter plot. But a more reliable check is a residual plot: graph the residuals against the x-values after fitting a linear model.

  • If the residuals scatter randomly above and below zero with no visible pattern, a linear model is a reasonable fit.
  • If the residuals form a curve or systematic pattern (e.g., negative, then positive, then negative again), the relationship is likely nonlinear, and a straight line isn't capturing the true trend.
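
The residual check can be sketched numerically without drawing the plot. In the Python example below, the data are made up to follow the arc of a thrown ball (a quadratic), and `fit_line` is a hypothetical helper that computes the least-squares slope and intercept.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Hypothetical curved data: height = 10t - t^2 at t = 0, 1, ..., 8
times = list(range(9))
heights = [10 * t - t ** 2 for t in times]

m, b = fit_line(times, heights)
residuals = [y - (m * x + b) for x, y in zip(times, heights)]

# Summarize the residual plot as a string of signs, left to right
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)
```

The signs come out negative at both ends and positive in the middle, which is the kind of systematic pattern that indicates a straight line is missing the curve. Residuals from a well-fitting linear model would instead switch sign with no obvious order.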

Regression Analysis for Predictive Modeling

Regression analysis is the formal process of fitting a mathematical equation to data so you can make predictions. In simple linear regression, you model the relationship between one independent variable and one dependent variable with a linear equation.

Here are the steps for building a predictive model:

  1. Collect and organize your data into paired $(x, y)$ values.
  2. Create a scatter plot to check whether a linear model is reasonable.
  3. Calculate the line of best fit using technology (calculator, Excel, Desmos, etc.).
  4. Assess goodness of fit using the correlation coefficient ($r$) and the coefficient of determination ($r^2$). The value $r^2$ tells you the proportion of the variation in y that's explained by the model. For example, $r^2 = 0.85$ means 85% of the variation in y is accounted for by the linear relationship with x.
  5. Use the regression equation to predict y-values for given x-values.
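
The steps above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not a replacement for a calculator or spreadsheet: the data are invented, the helper names `fit_line` and `r_squared` are hypothetical, and step 2 (the scatter plot) is skipped because it is a visual check.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def r_squared(xs, ys, slope, intercept):
    """Proportion of the variation in y explained by the line."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Step 1: paired (x, y) values (hypothetical)
xs = [2, 4, 6, 8, 10]
ys = [5, 9, 12, 18, 21]

# Step 3: line of best fit
slope, intercept = fit_line(xs, ys)

# Step 4: goodness of fit
r2 = r_squared(xs, ys, slope, intercept)

# Step 5: predict y for a new x inside the observed range
x_new = 7
y_new = slope * x_new + intercept

print(f"y = {slope:.2f}x + {intercept:.2f}, r^2 = {r2:.3f}")
print(f"predicted y at x = {x_new}: {y_new:.2f}")
```

Here `r_squared` uses the "explained variation" definition directly: one minus the ratio of leftover squared error to total squared variation in y. In simple linear regression this equals the square of the correlation coefficient.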

Three important limitations to keep in mind:

  • Extrapolation is risky. Your model is only reliable within the range of x-values you actually observed. Predicting far outside that range (e.g., using 10 years of sales data to forecast 50 years ahead) can produce wildly inaccurate results because the trend may not hold.
  • Correlation does not imply causation. A strong correlation between two variables doesn't mean one causes the other. There may be confounding variables at play. Ice cream sales and drowning rates are correlated, but that's because both increase in hot weather, not because ice cream causes drowning.
  • Outliers and influential points can pull the regression line toward them, distorting the slope and intercept. Always check your scatter plot and residuals for points that might be skewing the model.
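
To see how strongly a single point can pull the line, the Python sketch below fits the same made-up data with and without one outlier. The data and the `fit_line` helper are hypothetical and chosen to make the effect easy to see.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Hypothetical data lying exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
m_clean, b_clean = fit_line(xs, ys)

# Same data plus one outlier at (6, 40); y = 2x would give 12 there
m_out, b_out = fit_line(xs + [6], ys + [40])

print(f"without outlier: slope = {m_clean:.2f}, intercept = {b_clean:.2f}")
print(f"with outlier:    slope = {m_out:.2f}, intercept = {b_out:.2f}")
```

One extra point triples the slope in this example, from about 2 to about 6, and drags the intercept well below zero. Because the outlier also sits at the edge of the x-range, it has extra leverage, which is why influential points deserve a close look before you trust a model.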