
Types of Regression Models

Why This Matters

Regression models form the backbone of statistical prediction and machine learning—and you're being tested on knowing when to use each type, not just what they are. The exam will challenge you to match real-world scenarios to appropriate models, understand the tradeoffs between model complexity and interpretability, and recognize when techniques like regularization, variable selection, and nonlinear transformations become necessary. These aren't just formulas to memorize; they're tools that solve fundamentally different problems.

The key insight connecting all regression models is the bias-variance tradeoff. Simple models (like linear regression) may underfit complex data, while flexible models (like high-degree polynomials) risk overfitting. Regularization techniques exist precisely to navigate this tension. When you encounter a regression question, don't just ask "what's the formula?"—ask "what problem does this model solve, and what assumptions does it make?" That's how you'll tackle both multiple-choice questions and FRQs with confidence.


Linear Foundations: When Straight Lines Work

These models assume that relationships between predictors and outcomes can be captured with linear combinations. The core assumption is that the effect of each predictor is additive and proportional.

Linear Regression

  • Models $y = \beta_0 + \beta_1 x + \epsilon$—the simplest regression form, fitting a straight line through data points
  • Least squares estimation minimizes $\sum(y_i - \hat{y}_i)^2$, making it computationally efficient and interpretable
  • Assumes homoscedasticity and normality of residuals—violations signal you may need a different model type
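
To make the least squares idea concrete, here is a minimal sketch in Python using scikit-learn on synthetic data (the library choice, the generated data, and the coefficient values are illustrative, not part of the exam material):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# synthetic data: y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 2.0 + 3.0 * x[:, 0] + rng.normal(0, 1, size=100)

model = LinearRegression().fit(x, y)
print(model.intercept_, model.coef_)           # estimates of beta_0 and beta_1
print(((y - model.predict(x)) ** 2).sum())     # the residual sum of squares being minimized
```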

Multiple Linear Regression

  • Extends to $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon$—analyzing multiple predictors simultaneously
  • Multicollinearity occurs when predictors are highly correlated, inflating variance of coefficient estimates
  • Coefficient interpretation becomes "effect of $x_j$ holding all other predictors constant"—a key exam concept
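
A hedged sketch of fitting several predictors at once and flagging multicollinearity with variance inflation factors, using statsmodels on made-up data (a VIF above roughly 5 to 10 is a common rule of thumb, not a hard rule):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)   # deliberately correlated with x1
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + 0.5 * x3 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
fit = sm.OLS(y, X).fit()
print(fit.params)                                # partial effects: each holds the others constant

for j in range(1, X.shape[1]):                   # skip the constant column
    print(f"VIF x{j}:", variance_inflation_factor(X, j))
```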

Compare: Linear Regression vs. Multiple Linear Regression—both assume linearity and use least squares, but multiple regression introduces multicollinearity concerns and partial effect interpretation. FRQs often ask you to explain what "holding other variables constant" means practically.


Regularization: Controlling Model Complexity

When you have many predictors or correlated features, standard regression can overfit or produce unstable estimates. Regularization adds a penalty term to the loss function, shrinking coefficients toward zero.

Ridge Regression

  • Adds L2 penalty: minimizes $\sum(y_i - \hat{y}_i)^2 + \lambda\sum\beta_j^2$—shrinks all coefficients but never to exactly zero
  • Solves multicollinearity by stabilizing coefficient estimates when predictors are correlated
  • Tuning parameter $\lambda$ controls penalty strength—larger values mean more shrinkage, more bias, less variance
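
The sketch below (scikit-learn on synthetic data; sklearn names the penalty strength alpha rather than lambda) shows how increasing the penalty shrinks correlated coefficients without zeroing them out:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + rng.normal(scale=0.05, size=100)   # two nearly collinear predictors
y = X[:, 0] + rng.normal(size=100)

for alpha in [0.01, 1.0, 100.0]:                        # alpha plays the role of lambda
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(coefs, 3))                    # shrunk toward zero, never exactly zero
```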

Lasso Regression

  • Uses L1 penalty: minimizes $\sum(y_i - \hat{y}_i)^2 + \lambda\sum|\beta_j|$—can shrink coefficients exactly to zero
  • Performs automatic variable selection, making it ideal when you suspect many predictors are irrelevant
  • Produces sparse models that are easier to interpret than ridge when true signal involves few predictors
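
A minimal lasso sketch on synthetic data with 20 predictors of which only 2 matter (again scikit-learn, with alpha standing in for lambda; the specific alpha value is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)   # only 2 of 20 predictors are relevant

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.flatnonzero(lasso.coef_))   # indices of the predictors lasso kept (nonzero coefficients)
```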

Compare: Ridge vs. Lasso—both regularize, but ridge keeps all predictors (just shrunk) while lasso performs selection. If an FRQ asks about feature selection with many predictors, lasso is your go-to answer. Ridge is better when you believe all predictors contribute.


Beyond Linearity: Capturing Complex Patterns

When the true relationship between variables curves, bends, or follows a nonlinear pattern, these models introduce flexibility through polynomial terms or general nonlinear functions.

Polynomial Regression

  • Fits $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \epsilon$—still linear in parameters, nonlinear in predictors
  • Captures curvature in relationships that simple linear regression misses entirely
  • High-degree polynomials risk overfitting—the model fits training data perfectly but generalizes poorly to new data
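
One way to see the overfitting risk is to compare a low-degree and a high-degree fit on the same synthetic data; the sketch below uses scikit-learn's PolynomialFeatures, and the degrees chosen are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(-3, 3, size=(80, 1)), axis=0)
y = 1.0 - 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.5, size=80)

# degree 2: still ordinary least squares, just on the transformed features [x, x^2]
quad = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)
print("degree 2 R^2:", quad.score(x, y))

# degree 15: higher in-sample R^2, but typically generalizes worse to new data
wiggly = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(x, y)
print("degree 15 R^2:", wiggly.score(x, y))
```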

Nonlinear Regression

  • Models relationships using nonlinear functions like $y = \beta_0 e^{\beta_1 x} + \epsilon$, where parameters appear nonlinearly
  • Requires iterative optimization (e.g., gradient descent) since no closed-form solution exists
  • Demands careful model specification—you must choose the functional form based on domain knowledge
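
A sketch of fitting the exponential form above with scipy's curve_fit, assuming that functional form is justified by theory; note the starting values, which iterative optimizers generally need:

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, b0, b1):
    return b0 * np.exp(b1 * x)          # parameters enter the model nonlinearly

rng = np.random.default_rng(5)
x = np.linspace(0, 2, 60)
y = 2.0 * np.exp(1.3 * x) + rng.normal(scale=0.3, size=60)

# iterative fit; no closed-form solution, so starting values (p0) matter
params, _ = curve_fit(exp_model, x, y, p0=[1.0, 1.0])
print(params)                            # estimates of beta_0 and beta_1
```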

Compare: Polynomial vs. Nonlinear Regression—polynomial regression is technically linear in parameters (just transform $x$), while true nonlinear regression has parameters inside nonlinear functions. Polynomial is easier to fit; nonlinear requires more computational effort but can match theory-driven functional forms.


Classification and Count Data: Different Outcome Types

Not all outcomes are continuous. When your dependent variable is categorical or represents counts, you need models designed for those distributions.

Logistic Regression

  • Models probability of binary outcome: $P(y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$—output bounded between 0 and 1
  • Uses maximum likelihood estimation rather than least squares, since outcome isn't continuous
  • Coefficients represent log-odds—$e^{\beta_1}$ gives the odds ratio for a one-unit increase in $x$
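
To connect coefficients to odds ratios, here is a small statsmodels sketch on simulated binary data (the true intercept and slope are invented for the example):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=300)
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))         # true P(y = 1)
y = rng.binomial(1, p)

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)   # maximum likelihood, not least squares
print(fit.params)                                   # coefficients on the log-odds scale
print(np.exp(fit.params[1]))                        # odds ratio for a one-unit increase in x
```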

Poisson Regression

  • Models count data where $y \in \{0, 1, 2, \dots\}$—appropriate for event frequencies like accidents, arrivals, or disease cases
  • Assumes mean equals variance—when variance exceeds mean (overdispersion), consider negative binomial regression instead
  • Uses log link function: $\log(\mu) = \beta_0 + \beta_1 x$—coefficients represent multiplicative effects on the expected count
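
A statsmodels sketch of a Poisson GLM on simulated counts, including one rough overdispersion check (Pearson chi-square over residual degrees of freedom near 1 is consistent with the mean-equals-variance assumption):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=300)
mu = np.exp(0.2 + 0.7 * x)                 # log link: log(mu) = beta_0 + beta_1 * x
y = rng.poisson(mu)

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
print(fit.params)                          # coefficients on the log scale
print(np.exp(fit.params[1]))               # multiplicative effect on the expected count

# rough overdispersion check: should be near 1 if variance is close to the mean
print(fit.pearson_chi2 / fit.df_resid)
```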

Compare: Logistic vs. Poisson Regression—logistic handles binary yes/no outcomes, Poisson handles "how many" count outcomes. Both use maximum likelihood and link functions, but they model fundamentally different types of dependent variables. Know which outcome type triggers which model.


Model Selection: Building Parsimonious Models

Sometimes the goal isn't just fitting data—it's finding the simplest adequate model. These techniques help you decide which predictors to include.

Stepwise Regression

  • Iteratively adds or removes predictors based on criteria like p-values, AIC, or BIC
  • Three flavors: forward selection (start empty, add predictors), backward elimination (start full, remove predictors), or bidirectional
  • Criticized for multiple testing issues—can overfit by capitalizing on chance; cross-validation recommended alongside
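
Forward selection can be written in a few lines; the sketch below scores candidate predictors by AIC with statsmodels OLS (a simplified, illustrative version of what packaged stepwise routines do):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(size=n)    # only predictors 0 and 3 matter

selected, remaining = [], list(range(p))
current_aic = sm.OLS(y, np.ones((n, 1))).fit().aic        # intercept-only model

while remaining:
    # try adding each remaining predictor; keep the one that lowers AIC the most
    aics = {j: sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit().aic for j in remaining}
    best = min(aics, key=aics.get)
    if aics[best] >= current_aic:
        break                                             # no improvement, so stop
    selected.append(best)
    remaining.remove(best)
    current_aic = aics[best]

print(selected)                                           # forward-selected predictor indices
```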

Compare: Stepwise vs. Lasso—both perform variable selection, but stepwise uses discrete add/remove decisions while lasso uses continuous shrinkage. Lasso is generally preferred in modern practice because it's less prone to overfitting and handles correlated predictors better.


Time-Dependent Data: When Order Matters

Standard regression assumes observations are independent. When data points are ordered in time, you need models that account for temporal structure.

Time Series Regression

  • Incorporates lagged variables like $y_{t-1}$ or $x_{t-1}$ to capture how past values predict current outcomes
  • Accounts for autocorrelation—the tendency for nearby time points to be more similar than distant ones
  • Often combined with ARIMA components to model trends, seasonality, and residual correlation structure
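
A sketch of regressing on a lagged outcome with pandas and statsmodels, plus a Durbin-Watson check for leftover autocorrelation (simulated AR(1) data; a statistic near 2 suggests little remaining serial correlation):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# simulate a series with AR(1) noise so that nearby time points are correlated
rng = np.random.default_rng(9)
T = 200
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 0.01 * np.arange(T) + e             # mild trend plus autocorrelated noise

df = pd.DataFrame({"y": y})
df["y_lag1"] = df["y"].shift(1)               # lagged outcome as a predictor
df = df.dropna()

fit = sm.OLS(df["y"], sm.add_constant(df["y_lag1"])).fit()
print(fit.params)
print(durbin_watson(fit.resid))               # near 2 means little remaining autocorrelation
```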

Compare: Time Series Regression vs. Standard Multiple Regression—both can have multiple predictors, but time series regression explicitly models temporal dependencies. Ignoring autocorrelation in time data leads to incorrect standard errors and misleading inference.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Linear relationships | Linear Regression, Multiple Linear Regression |
| Regularization to prevent overfitting | Ridge Regression, Lasso Regression |
| Variable/feature selection | Lasso Regression, Stepwise Regression |
| Handling multicollinearity | Ridge Regression, Lasso Regression |
| Nonlinear patterns | Polynomial Regression, Nonlinear Regression |
| Binary/categorical outcomes | Logistic Regression |
| Count data | Poisson Regression |
| Temporal dependencies | Time Series Regression |

Self-Check Questions

  1. You have 50 predictors but suspect only 5-10 are truly relevant. Which regression technique performs automatic variable selection, and how does it differ from ridge regression?

  2. Compare and contrast polynomial regression and nonlinear regression. Why is polynomial regression considered "linear" even though it can fit curves?

  3. A researcher is modeling the number of customer complaints per day. Why would Poisson regression be more appropriate than linear regression, and what assumption should they check?

  4. If an FRQ presents a dataset with highly correlated predictors and asks you to build a stable predictive model, which two regularization approaches would you consider, and what's the key tradeoff between them?

  5. Explain why ignoring temporal autocorrelation when analyzing time-ordered data leads to problems, even if you use multiple linear regression with relevant predictors.