
Types of Regression Models

Why This Matters

Regression models form the backbone of statistical prediction and machine learning—and you're being tested on knowing when to use each type, not just what they are. The exam will challenge you to match real-world scenarios to appropriate models, understand the tradeoffs between model complexity and interpretability, and recognize when techniques like regularization, variable selection, and nonlinear transformations become necessary. These aren't just formulas to memorize; they're tools that solve fundamentally different problems.

The key insight connecting all regression models is the bias-variance tradeoff. Simple models (like linear regression) may underfit complex data, while flexible models (like high-degree polynomials) risk overfitting. Regularization techniques exist precisely to navigate this tension. When you encounter a regression question, don't just ask "what's the formula?"—ask "what problem does this model solve, and what assumptions does it make?" That's how you'll tackle both multiple-choice questions and FRQs with confidence.


Linear Foundations: When Straight Lines Work

These models assume that relationships between predictors and outcomes can be captured with linear combinations. The core assumption is that the effect of each predictor is additive and proportional.

Linear Regression

  • Models $y = \beta_0 + \beta_1 x + \epsilon$—the simplest regression form, fitting a straight line through data points
  • Least squares estimation minimizes $\sum(y_i - \hat{y}_i)^2$, making it computationally efficient and interpretable
  • Assumes homoscedasticity and normality of residuals—violations signal you may need a different model type
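
To make the least squares idea concrete, here is a minimal sketch in Python using scikit-learn on synthetic data (the library choice, the generated data, and the coefficient values are illustrative, not part of the exam material):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# synthetic data: y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 2.0 + 3.0 * x[:, 0] + rng.normal(0, 1, size=100)

model = LinearRegression().fit(x, y)
print(model.intercept_, model.coef_)           # estimates of beta_0 and beta_1
print(((y - model.predict(x)) ** 2).sum())     # the residual sum of squares being minimized
```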

Multiple Linear Regression

  • Extends to $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon$—analyzing multiple predictors simultaneously
  • Multicollinearity occurs when predictors are highly correlated, inflating variance of coefficient estimates
  • Coefficient interpretation becomes "effect of $x_j$ holding all other predictors constant"—a key exam concept
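
A hedged sketch of fitting several predictors at once and flagging multicollinearity with variance inflation factors, using statsmodels on made-up data (a VIF above roughly 5 to 10 is a common rule of thumb, not a hard rule):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)   # deliberately correlated with x1
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + 0.5 * x3 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
fit = sm.OLS(y, X).fit()
print(fit.params)                                # partial effects: each holds the others constant

for j in range(1, X.shape[1]):                   # skip the constant column
    print(f"VIF x{j}:", variance_inflation_factor(X, j))
```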

Compare: Linear Regression vs. Multiple Linear Regression—both assume linearity and use least squares, but multiple regression introduces multicollinearity concerns and partial effect interpretation. FRQs often ask you to explain what "holding other variables constant" means practically.


Regularization: Controlling Model Complexity

When you have many predictors or correlated features, standard regression can overfit or produce unstable estimates. Regularization adds a penalty term to the loss function, shrinking coefficients toward zero.

Ridge Regression

  • Adds L2 penalty: minimizes $\sum(y_i - \hat{y}_i)^2 + \lambda\sum\beta_j^2$—shrinks all coefficients but never to exactly zero
  • Solves multicollinearity by stabilizing coefficient estimates when predictors are correlated
  • Tuning parameter $\lambda$ controls penalty strength—larger values mean more shrinkage, more bias, less variance
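
The sketch below (scikit-learn on synthetic data; sklearn names the penalty strength alpha rather than lambda) shows how increasing the penalty shrinks correlated coefficients without zeroing them out:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + rng.normal(scale=0.05, size=100)   # two nearly collinear predictors
y = X[:, 0] + rng.normal(size=100)

for alpha in [0.01, 1.0, 100.0]:                        # alpha plays the role of lambda
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(coefs, 3))                    # shrunk toward zero, never exactly zero
```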

Lasso Regression

  • Uses L1 penalty: minimizes $\sum(y_i - \hat{y}_i)^2 + \lambda\sum|\beta_j|$—can shrink coefficients exactly to zero
  • Performs automatic variable selection, making it ideal when you suspect many predictors are irrelevant
  • Produces sparse models that are easier to interpret than ridge when true signal involves few predictors
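
A minimal lasso sketch on synthetic data with 20 predictors of which only 2 matter (again scikit-learn, with alpha standing in for lambda; the specific alpha value is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)   # only 2 of 20 predictors are relevant

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.flatnonzero(lasso.coef_))   # indices of the predictors lasso kept (nonzero coefficients)
```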

Compare: Ridge vs. Lasso—both regularize, but ridge keeps all predictors (just shrunk) while lasso performs selection. If an FRQ asks about feature selection with many predictors, lasso is your go-to answer. Ridge is better when you believe all predictors contribute.


Beyond Linearity: Capturing Complex Patterns

When the true relationship between variables curves, bends, or follows a nonlinear pattern, these models introduce flexibility through polynomial terms or general nonlinear functions.

Polynomial Regression

  • Fits $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \epsilon$—still linear in parameters, nonlinear in predictors
  • Captures curvature in relationships that simple linear regression misses entirely
  • High-degree polynomials risk overfitting—the model fits training data perfectly but generalizes poorly to new data
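
One way to see the overfitting risk is to compare a low-degree and a high-degree fit on the same synthetic data; the sketch below uses scikit-learn's PolynomialFeatures, and the degrees chosen are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(-3, 3, size=(80, 1)), axis=0)
y = 1.0 - 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.5, size=80)

# degree 2: still ordinary least squares, just on the transformed features [x, x^2]
quad = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)
print("degree 2 R^2:", quad.score(x, y))

# degree 15: higher in-sample R^2, but typically generalizes worse to new data
wiggly = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(x, y)
print("degree 15 R^2:", wiggly.score(x, y))
```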

Nonlinear Regression

  • Models relationships using nonlinear functions like $y = \beta_0 e^{\beta_1 x} + \epsilon$, where parameters appear nonlinearly
  • Requires iterative optimization (e.g., gradient descent) since no closed-form solution exists
  • Demands careful model specification—you must choose the functional form based on domain knowledge
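
A sketch of fitting the exponential form above with scipy's curve_fit, assuming that functional form is justified by theory; note the starting values, which iterative optimizers generally need:

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, b0, b1):
    return b0 * np.exp(b1 * x)          # parameters enter the model nonlinearly

rng = np.random.default_rng(5)
x = np.linspace(0, 2, 60)
y = 2.0 * np.exp(1.3 * x) + rng.normal(scale=0.3, size=60)

# iterative fit; no closed-form solution, so starting values (p0) matter
params, _ = curve_fit(exp_model, x, y, p0=[1.0, 1.0])
print(params)                            # estimates of beta_0 and beta_1
```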

Compare: Polynomial vs. Nonlinear Regression—polynomial regression is technically linear in parameters (just transform $x$), while true nonlinear regression has parameters inside nonlinear functions. Polynomial is easier to fit; nonlinear requires more computational effort but can match theory-driven functional forms.


Classification and Count Data: Different Outcome Types

Not all outcomes are continuous. When your dependent variable is categorical or represents counts, you need models designed for those distributions.

Logistic Regression

  • Models probability of binary outcome: $P(y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$—output bounded between 0 and 1
  • Uses maximum likelihood estimation rather than least squares, since outcome isn't continuous
  • Coefficients represent log-odds—$e^{\beta_1}$ gives the odds ratio for a one-unit increase in $x$
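
To connect coefficients to odds ratios, here is a small statsmodels sketch on simulated binary data (the true intercept and slope are invented for the example):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=300)
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))         # true P(y = 1)
y = rng.binomial(1, p)

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)   # maximum likelihood, not least squares
print(fit.params)                                   # coefficients on the log-odds scale
print(np.exp(fit.params[1]))                        # odds ratio for a one-unit increase in x
```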

Poisson Regression

  • Models count data where $y \in \{0, 1, 2, \dots\}$—appropriate for event frequencies like accidents, arrivals, or disease cases
  • Assumes mean equals variance—when variance exceeds mean (overdispersion), consider negative binomial regression instead
  • Uses log link function: $\log(\mu) = \beta_0 + \beta_1 x$—coefficients represent multiplicative effects on the expected count
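
A statsmodels sketch of a Poisson GLM on simulated counts, including one rough overdispersion check (Pearson chi-square over residual degrees of freedom near 1 is consistent with the mean-equals-variance assumption):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=300)
mu = np.exp(0.2 + 0.7 * x)                 # log link: log(mu) = beta_0 + beta_1 * x
y = rng.poisson(mu)

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
print(fit.params)                          # coefficients on the log scale
print(np.exp(fit.params[1]))               # multiplicative effect on the expected count

# rough overdispersion check: should be near 1 if variance is close to the mean
print(fit.pearson_chi2 / fit.df_resid)
```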

Compare: Logistic vs. Poisson Regression—logistic handles binary yes/no outcomes, Poisson handles "how many" count outcomes. Both use maximum likelihood and link functions, but they model fundamentally different types of dependent variables. Know which outcome type triggers which model.


Model Selection: Building Parsimonious Models

Sometimes the goal isn't just fitting data—it's finding the simplest adequate model. These techniques help you decide which predictors to include.

Stepwise Regression

  • Iteratively adds or removes predictors based on criteria like p-values, AIC, or BIC
  • Three flavors: forward selection (start empty, add predictors), backward elimination (start full, remove predictors), or bidirectional
  • Criticized for multiple testing issues—can overfit by capitalizing on chance; cross-validation recommended alongside
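
Forward selection can be written in a few lines; the sketch below scores candidate predictors by AIC with statsmodels OLS (a simplified, illustrative version of what packaged stepwise routines do):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(size=n)    # only predictors 0 and 3 matter

selected, remaining = [], list(range(p))
current_aic = sm.OLS(y, np.ones((n, 1))).fit().aic        # intercept-only model

while remaining:
    # try adding each remaining predictor; keep the one that lowers AIC the most
    aics = {j: sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit().aic for j in remaining}
    best = min(aics, key=aics.get)
    if aics[best] >= current_aic:
        break                                             # no improvement, so stop
    selected.append(best)
    remaining.remove(best)
    current_aic = aics[best]

print(selected)                                           # forward-selected predictor indices
```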

Compare: Stepwise vs. Lasso—both perform variable selection, but stepwise uses discrete add/remove decisions while lasso uses continuous shrinkage. Lasso is generally preferred in modern practice because it's less prone to overfitting and handles correlated predictors better.


Time-Dependent Data: When Order Matters

Standard regression assumes observations are independent. When data points are ordered in time, you need models that account for temporal structure.

Time Series Regression

  • Incorporates lagged variables like $y_{t-1}$ or $x_{t-1}$ to capture how past values predict current outcomes
  • Accounts for autocorrelation—the tendency for nearby time points to be more similar than distant ones
  • Often combined with ARIMA components to model trends, seasonality, and residual correlation structure
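
A sketch of regressing on a lagged outcome with pandas and statsmodels, plus a Durbin-Watson check for leftover autocorrelation (simulated AR(1) data; a statistic near 2 suggests little remaining serial correlation):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# simulate a series with AR(1) noise so that nearby time points are correlated
rng = np.random.default_rng(9)
T = 200
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 0.01 * np.arange(T) + e             # mild trend plus autocorrelated noise

df = pd.DataFrame({"y": y})
df["y_lag1"] = df["y"].shift(1)               # lagged outcome as a predictor
df = df.dropna()

fit = sm.OLS(df["y"], sm.add_constant(df["y_lag1"])).fit()
print(fit.params)
print(durbin_watson(fit.resid))               # near 2 means little remaining autocorrelation
```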

Compare: Time Series Regression vs. Standard Multiple Regression—both can have multiple predictors, but time series regression explicitly models temporal dependencies. Ignoring autocorrelation in time data leads to incorrect standard errors and misleading inference.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Linear relationships | Linear Regression, Multiple Linear Regression |
| Regularization to prevent overfitting | Ridge Regression, Lasso Regression |
| Variable/feature selection | Lasso Regression, Stepwise Regression |
| Handling multicollinearity | Ridge Regression, Lasso Regression |
| Nonlinear patterns | Polynomial Regression, Nonlinear Regression |
| Binary/categorical outcomes | Logistic Regression |
| Count data | Poisson Regression |
| Temporal dependencies | Time Series Regression |

Self-Check Questions

  1. You have 50 predictors but suspect only 5-10 are truly relevant. Which regression technique performs automatic variable selection, and how does it differ from ridge regression?

  2. Compare and contrast polynomial regression and nonlinear regression. Why is polynomial regression considered "linear" even though it can fit curves?

  3. A researcher is modeling the number of customer complaints per day. Why would Poisson regression be more appropriate than linear regression, and what assumption should they check?

  4. If an FRQ presents a dataset with highly correlated predictors and asks you to build a stable predictive model, which two regularization approaches would you consider, and what's the key tradeoff between them?

  5. Explain why ignoring temporal autocorrelation when analyzing time-ordered data leads to problems, even if you use multiple linear regression with relevant predictors.