Linear Regression and Textbook Costs
Linear regression models the relationship between a measurable factor (like page count) and textbook cost, letting you predict prices and quantify what drives them. This section applies the core regression and correlation concepts from Unit 12 to a concrete example.

Interpretation of Linear Regression Components
Each part of a regression equation tells you something specific about how textbook prices behave.
- Y-intercept (\(a\)) represents the base cost of a textbook when the independent variable equals zero. If \(a = 50\), the model estimates a textbook would cost $50 before any additional factors (like page count) are considered. This value doesn't always have a practical real-world meaning, but it anchors the regression line.
- Slope (\(b\)) represents the change in predicted textbook cost for each one-unit increase in the independent variable. If the independent variable is number of pages and \(b = 0.10\), each additional page adds $0.10 to the predicted cost.
  - A positive slope means cost increases as the variable increases (direct relationship)
  - A negative slope means cost decreases as the variable increases (inverse relationship)
- Residuals are the differences between observed and predicted textbook costs (observed minus predicted). Large residuals signal that the model's prediction was off for that data point. Examining residuals helps you assess whether a linear model is appropriate in the first place.
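
The three components above can be sketched in a few lines of Python. This is a minimal illustration, assuming an intercept of 50 and a slope of 0.10 (the illustrative dollar values used in this section); the observations and the function names `predict_cost` and `residual` are made up for the example.

```python
# Hypothetical model: intercept of $50 and slope of $0.10 per page.
INTERCEPT = 50.0  # predicted cost (dollars) when page count is zero
SLOPE = 0.10      # dollars added to the prediction per additional page


def predict_cost(pages):
    """Return the predicted cost in dollars for a given page count."""
    return INTERCEPT + SLOPE * pages


def residual(pages, observed_cost):
    """Return observed minus predicted cost for one textbook."""
    return observed_cost - predict_cost(pages)


# Made-up observations: (page count, observed price in dollars).
observations = [(200, 75.0), (400, 82.0), (600, 118.0)]

for pages, cost in observations:
    # A positive residual means the book costs more than the model predicts.
    print(pages, round(predict_cost(pages), 2), round(residual(pages, cost), 2))
```

A residual near zero means the line describes that textbook well; a pattern in the residuals (for example, all positive at high page counts) would suggest a straight line is not the right shape.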
Correlation in Textbook Cost Analysis
The correlation coefficient (\(r\)) measures the strength and direction of a linear relationship between two variables. It ranges from \(-1\) to \(+1\).
- \(r\) close to \(+1\): strong positive linear relationship (both variables increase together)
- \(r\) close to \(-1\): strong negative linear relationship (one increases while the other decreases)
- \(r\) close to \(0\): weak or no linear relationship
For textbook costs, you might expect correlations like these:
- Page count tends to have a positive correlation with cost, since more pages mean higher production expenses
- Demand factors (popular subjects, well-known authors) may correlate positively with cost because publishers can charge more
- Edition age may have a negative correlation with cost, since older editions typically sell for less than newer ones
The coefficient of determination (\(r^2\)) tells you the proportion of variance in textbook cost that the independent variable explains. An \(r^2\) of 0.72, for example, means 72% of the variation in cost is accounted for by the model. The remaining 28% comes from factors not included.
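
As a rough sketch of how these two quantities might be computed by hand, the snippet below calculates the Pearson correlation coefficient and squares it. The page counts and prices are made-up values, and `correlation` is a hypothetical helper name; squaring the correlation to get the coefficient of determination applies to the single-predictor case discussed here.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: page counts and prices in dollars.
pages = [150, 300, 450, 600, 750]
prices = [62.0, 85.0, 90.0, 115.0, 121.0]

r = correlation(pages, prices)
r_squared = r ** 2  # proportion of price variance explained by page count
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```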

Prediction Using Regression Equations
The simple linear regression equation takes the form:
\[ \hat{y} = a + bx \]
where \(\hat{y}\) is the predicted textbook cost, \(a\) is the y-intercept, \(b\) is the slope, and \(x\) is the value of the independent variable.
Steps to predict a textbook cost:
- Identify \(a\) and \(b\) from the given regression equation
- Determine the value of \(x\) you want to plug in
- Substitute \(a\), \(b\), and \(x\) into the equation
- Calculate \(\hat{y}\)
Example: Suppose the regression equation is \(\hat{y} = 50 + 0.10x\), where \(x\) is the number of pages. Predict the cost of a 400-page textbook.
\[ \hat{y} = 50 + 0.10(400) = 50 + 40 = 90 \]
The predicted cost is $90. Keep in mind this is an estimate. The actual price could differ, and that difference is the residual for that observation.
A caution worth remembering: predictions are only reliable within the range of your data. If your dataset contains textbooks from 100 to 800 pages, predicting the cost of a 2,000-page textbook (extrapolation) is risky because you have no evidence the linear trend holds that far out.
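
One way to make the extrapolation caution concrete is to have the prediction code refuse page counts outside the observed range. The sketch below assumes an intercept of 50, a slope of 0.10, and a dataset spanning 100 to 800 pages, as in the discussion above; `predict_in_range` is a hypothetical helper, not part of any library.

```python
def predict_in_range(pages, intercept, slope, min_pages, max_pages):
    """Predict cost, refusing to extrapolate beyond the observed page range."""
    if not (min_pages <= pages <= max_pages):
        raise ValueError(
            f"{pages} pages is outside the observed range "
            f"{min_pages}-{max_pages}; the prediction would be an extrapolation"
        )
    return intercept + slope * pages


# Assumed values: intercept 50, slope 0.10, data covering 100 to 800 pages.
cost = predict_in_range(400, intercept=50.0, slope=0.10, min_pages=100, max_pages=800)
print(round(cost, 2))
```

Calling the same function with 2,000 pages raises a `ValueError` instead of returning a number the data cannot support. Raising an error is only one design choice; returning the prediction with a warning would also be reasonable.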
Advanced Regression Techniques
- The least squares method finds the best-fitting line by minimizing the sum of squared residuals. This ensures the line is as close as possible to all the data points collectively, not just a few.
- Multiple regression extends the model to include several independent variables at once (e.g., page count, edition number, and subject area). The equation becomes \(\hat{y} = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k\), which can capture more of the variation in textbook cost than any single predictor alone.
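
Both ideas can be sketched in plain Python. The first function uses the standard closed-form least squares solution for a single predictor (slope equals the sum of cross-deviations divided by the sum of squared x-deviations); the last function only evaluates a multiple regression equation with given coefficients and does not fit one. All data, coefficients, and function names here are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared (observed minus predicted) values for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def predict_multiple(intercept, slopes, values):
    """Evaluate intercept plus the sum of each slope times its variable."""
    return intercept + sum(b * x for b, x in zip(slopes, values))


# Made-up data: page counts and prices in dollars.
pages = [150, 300, 450, 600, 750]
prices = [62.0, 85.0, 90.0, 115.0, 121.0]

intercept, slope = fit_line(pages, prices)
best = sum_squared_residuals(pages, prices, intercept, slope)
# Any other line, such as one with a nudged slope, has a larger sum.
nudged = sum_squared_residuals(pages, prices, intercept, slope + 0.01)
print(round(intercept, 2), round(slope, 4), best < nudged)

# Hypothetical coefficients for two predictors: page count and edition number.
multi = predict_multiple(20.0, [0.08, 3.0], [400, 5])
print(round(multi, 2))
```

Fitting a multiple regression requires solving a system of equations and is normally left to statistical software; the evaluation step shown here is the same once the coefficients are known.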