🥖Linear Modeling Theory Unit 6 Review

6.3 Polynomial Regression and Interaction Terms

Written by the Fiveable Content Team • Last updated August 2025

Polynomial regression and interaction terms expand the toolkit for modeling complex relationships in multiple linear regression. These techniques capture nonlinear patterns and joint effects between variables, which means your model can reflect how real data actually behaves rather than forcing everything into a straight line.

Nonlinear relationships in regression

Identifying nonlinear relationships

A nonlinear relationship exists when the change in the response variable is not proportional to the change in the predictor. If you increase $X$ by one unit, the effect on $Y$ isn't the same everywhere along the range of $X$.

Scatterplots are your first diagnostic tool here. Look for curves or bends in the data that a straight line can't capture. Common nonlinear patterns include:

  • Quadratic: U-shaped or inverted U-shaped curves (e.g., the relationship between stress and performance)
  • Exponential: rapidly increasing or decreasing values (e.g., population growth over time)
  • Logarithmic: rapid initial change that levels off (e.g., diminishing returns on advertising spending)

Residual plots are equally important. If you fit a linear model to nonlinear data, the residuals will show a systematic pattern (like a curve) instead of random scatter. That pattern is a signal that your model is missing something.
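Here is a minimal sketch of that diagnostic, using NumPy and made-up data (nothing here comes from a real dataset): fit a straight line to data generated from a quadratic, and the residuals come out systematically curved instead of randomly scattered.

```python
import numpy as np

# Hypothetical data: a quadratic relationship Y = 2 + 0.5*X + 0.3*X^2
# (noiseless, so the residual pattern is easy to see)
x = np.linspace(-3, 3, 50)
y = 2 + 0.5 * x + 0.3 * x**2

# Fit a straight line (degree-1 polynomial) by least squares
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# The residuals are negative near the middle of the range and positive
# at both extremes: a systematic U-shape, not random scatter
print(residuals[0] > 0, residuals[25] < 0, residuals[-1] > 0)
```

Plotting `residuals` against `x` would show the U-shape directly; the sign pattern above is the same signal in numeric form.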

Consequences of ignoring nonlinear relationships

Fitting a linear model to nonlinear data causes real problems:

  • Biased coefficient estimates that systematically over- or understate the true relationship
  • Inaccurate predictions, especially at the extremes of the predictor's range
  • Incorrect conclusions about how the predictor and response are related

These aren't minor issues. If the true relationship is curved and you model it as a line, your predictions will be off in predictable ways, and your residuals will violate the assumptions that make inference valid. Polynomial regression, variable transformations, and non-parametric methods are all ways to address this.

Polynomial regression models


Structure and purpose of polynomial regression

Polynomial regression captures nonlinear relationships by including higher-order terms (squared, cubed, etc.) of the predictor in the model. The general form is:

Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \dots + \beta_p X^p + \varepsilon

where $p$ is the degree of the polynomial.

The quadratic model ($p = 2$) is by far the most common:

Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon

This is enough to model a single bend in the data. You can add cubic ($p = 3$) or higher terms for more complex curves, but be cautious: higher-degree polynomials fit the training data more closely while becoming increasingly prone to overfitting, where the model captures noise rather than the true underlying pattern.
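Because a polynomial model is linear in its coefficients, ordinary least squares fits it directly. A quick sketch with NumPy and simulated data (the coefficients and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with one bend: Y = 1 + 2X - 0.5X^2 + noise
x = np.linspace(0, 10, 100)
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(0, 1, size=x.size)

# np.polyfit returns coefficients highest degree first: [b2, b1, b0]
b2, b1, b0 = np.polyfit(x, y, 2)

# The estimates should land near the true values (-0.5, 2, 1)
print(round(b2, 2), round(b1, 2), round(b0, 2))
```

The same fit could be done by building a design matrix with columns $1, X, X^2$ and calling any OLS routine; `polyfit` is just the convenient shortcut for a single predictor.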

Interpretation of polynomial regression coefficients

A key point that trips people up: polynomial regression models are still linear models. They're linear in the parameters ($\beta_0, \beta_1, \beta_2, \dots$), even though they model a nonlinear relationship between $X$ and $Y$. This means you can still estimate them with ordinary least squares.

In a quadratic model, the coefficients have specific meanings:

  • $\beta_0$: the expected value of $Y$ when $X = 0$
  • $\beta_1$: the instantaneous rate of change of $Y$ with respect to $X$, evaluated at $X = 0$. This is not a constant slope the way it is in simple linear regression.
  • $\beta_2$: governs the curvature. If $\beta_2 > 0$, the curve is U-shaped (concave up). If $\beta_2 < 0$, the curve is inverted-U (concave down).

Because the effect of $X$ on $Y$ changes depending on where you are along $X$, you can't interpret $\beta_1$ in isolation the way you would in a simple linear model. The marginal effect of a one-unit increase in $X$ is actually $\beta_1 + 2\beta_2 X$, which depends on the current value of $X$.
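A small numeric illustration of that varying slope, using hypothetical fitted coefficients (chosen here for round numbers, not from any real fit):

```python
# Hypothetical fitted quadratic: Y-hat = 4 + 3X - 0.5X^2
b0, b1, b2 = 4.0, 3.0, -0.5

def marginal_effect(x):
    """Instantaneous change in Y per unit of X: dY/dX = b1 + 2*b2*x."""
    return b1 + 2 * b2 * x

# The slope is not constant: positive at low X, zero at the curve's
# peak (X = -b1 / (2*b2) = 3), negative beyond it
print(marginal_effect(0))  # 3.0 -- equals b1 only at X = 0
print(marginal_effect(3))  # 0.0 -- the vertex of the parabola
print(marginal_effect(5))  # -2.0
```

This is why reporting "the effect of $X$" from a quadratic model usually means reporting the marginal effect at a few representative values of $X$, not a single number.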

You assess whether the polynomial terms are needed using hypothesis tests. If the test for $\beta_2$ is not statistically significant, the quadratic term may not be contributing meaningfully, and a simpler linear model might suffice.

Interaction terms in regression


Understanding interaction effects

Interaction terms capture the joint effect of two predictor variables on the response, beyond what their individual (main) effects account for. The model takes this form:

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + \varepsilon

The coefficient $\beta_3$ tells you how the effect of $X_1$ on $Y$ changes for each one-unit increase in $X_2$ (and vice versa). To see why, rearrange the equation by grouping terms involving $X_1$:

Y = \beta_0 + (\beta_1 + \beta_3 X_2) X_1 + \beta_2 X_2 + \varepsilon

Now the "slope" of $X_1$ is $\beta_1 + \beta_3 X_2$, which changes depending on the value of $X_2$. That's the interaction in action.
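You can verify this by fitting an interaction model to simulated data (all coefficients and noise here are made up) and computing the slope of $X_1$ at a few values of $X_2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical predictors and a response with a true interaction:
# Y = 1 + 2*X1 + 3*X2 + 1.5*X1*X2 + noise
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 1 + 2 * x1 + 3 * x2 + 1.5 * x1 * x2 + rng.normal(0, 1, n)

# OLS with an explicit interaction column in the design matrix
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# The slope of x1 depends on x2: b1 + b3 * x2
for x2_val in (-1.0, 0.0, 1.0):
    print(x2_val, round(b1 + b3 * x2_val, 2))
```

With the true values above, the slope of $X_1$ should come out near 0.5 at $X_2 = -1$, near 2 at $X_2 = 0$, and near 3.5 at $X_2 = 1$: one model, three different slopes.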

When a significant interaction is present, the main effect coefficients ($\beta_1$ and $\beta_2$) become conditional: $\beta_1$ represents the effect of $X_1$ only when $X_2 = 0$, and $\beta_2$ represents the effect of $X_2$ only when $X_1 = 0$. If zero isn't a meaningful value for your predictors, consider centering them before fitting the model so the main effects have a more interpretable meaning.
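A sketch of what centering changes, using invented variables (IQ scores and weekly study hours) where zero is never observed:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical predictors whose zero points are not meaningful
iq = rng.normal(100, 15, n)     # IQ score: nobody has IQ = 0
hours = rng.normal(20, 5, n)    # weekly study hours
score = 10 + 0.4 * iq + 2 * hours - 0.05 * iq * hours + rng.normal(0, 3, n)

def fit(a, b, y):
    """OLS for y ~ a + b + a*b; returns (b0, b1, b2, b3)."""
    X = np.column_stack([np.ones(len(a)), a, b, a * b])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Raw fit: b1 is the effect of IQ when hours = 0 (never observed)
_, b1_raw, _, _ = fit(iq, hours, score)

# Centered fit: b1 is the effect of IQ at the *average* study time
_, b1_c, _, b3_c = fit(iq - iq.mean(), hours - hours.mean(), score)

print(round(b1_raw, 2), round(b1_c, 2), round(b3_c, 3))
```

Centering leaves the interaction coefficient unchanged (still about -0.05 here); only the main effects shift, from "effect at zero" to "effect at the mean of the other predictor".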

Interpreting and visualizing interaction effects

A significant interaction means the effect of one predictor on the response depends on the level of the other predictor. The best way to understand what's happening is to visualize it.

Interaction plots show the relationship between one predictor and the response at several fixed levels of the other predictor (often low, medium, and high). If the lines in the plot are roughly parallel, there's little interaction. If they diverge or cross, the interaction is meaningful.

Simple slopes analysis quantifies this by estimating the slope of one predictor at specific values of the other. For example, in a study of study time and IQ predicting exam scores, simple slopes analysis might show that additional study time has a large positive effect for students with lower IQ scores but a smaller effect for students with higher IQ scores.
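The study-time-and-IQ example can be sketched numerically. A common convention (assumed here, not stated in the text) is to evaluate simple slopes at the moderator's mean and one standard deviation above and below it; all data below is simulated.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400

# Hypothetical: study hours and IQ predicting exam score, with a
# negative interaction (study time helps lower-IQ students more)
iq = rng.normal(100, 15, n)
hours = rng.uniform(0, 40, n)
exam = 20 + 0.3 * iq + 1.5 * hours - 0.01 * iq * hours + rng.normal(0, 4, n)

X = np.column_stack([np.ones(n), hours, iq, hours * iq])
b0, b_hours, b_iq, b_int = np.linalg.lstsq(X, exam, rcond=None)[0]

# Simple slope of study hours at low / average / high IQ
for iq_val in (85, 100, 115):  # roughly mean - 1 SD, mean, mean + 1 SD
    print(iq_val, round(b_hours + b_int * iq_val, 2))
```

With the true coefficients above, an extra hour of study is worth about 0.65 points at IQ 85 but only about 0.35 points at IQ 115, which is exactly the kind of statement a simple slopes analysis produces.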

Significance of interaction effects

Assessing statistical significance

The significance of an interaction effect is determined by the p-value for $\beta_3$ in the regression output. A p-value below your chosen threshold (typically 0.05) indicates that the joint effect is statistically significant.

A significant interaction provides evidence for a moderation effect: the relationship between one predictor and the response is moderated by the level of another predictor. This is a stronger claim than simply saying both predictors matter individually.
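Standard regression software reports this test automatically; for completeness, here is how the t statistic for $\beta_3$ is computed from the design matrix (simulated data; the threshold comparison stands in for looking up the exact p-value):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200

# Simulated data with a genuine interaction (b3 = 0.8)
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 1 + 2 * x1 + 3 * x2 + 0.8 * x1 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Residual variance and coefficient standard errors from (X'X)^{-1}
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# t statistic for the interaction coefficient; with df = 196,
# |t| greater than about 1.97 corresponds to p < 0.05
t_int = beta[3] / se[3]
print(abs(t_int) > 1.97)
```

Here the true interaction is large relative to the noise, so the t statistic comes out far past the threshold and the moderation effect would be declared significant.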

Practical implications and considerations

Statistical significance alone doesn't tell you whether the interaction matters in practice. Consider these factors:

  • Effect size: How large is $\beta_3$ relative to the main effects? A statistically significant but tiny interaction may not change your conclusions.
  • Units and scale: The raw coefficient depends on the measurement units of both predictors. Standardized coefficients (beta weights) allow you to compare the relative importance of interaction effects across predictors measured on different scales.
  • Context: A significant interaction between price and product quality on sales means the optimal pricing strategy depends on quality level. That's an actionable finding with direct business implications.

Ignoring a significant interaction can lead to misleading conclusions. If the effect of $X_1$ on $Y$ truly varies across levels of $X_2$, a model with only main effects will give you an "average" slope for $X_1$ that's correct for no particular subgroup.

When reporting interaction effects, always include:

  • The direction and magnitude of the interaction coefficient
  • Simple slopes or interaction plots so readers can see what the interaction looks like
  • A clear statement of how the effect of one predictor changes across levels of the other