Why This Matters
Regression analysis is the backbone of predictive modeling in data science—it's how we quantify relationships between variables and make data-driven predictions. You're being tested not just on knowing these methods exist, but on understanding when to use each one and why. The core concepts you need to master include linearity assumptions, regularization techniques, dimensionality reduction, and model selection trade-offs.
Think of regression methods as tools in a toolkit: a simple linear regression is your basic hammer, but sometimes you need the precision of regularization or the flexibility of polynomial terms. Don't just memorize formulas—know what problem each method solves, what assumptions it requires, and how it compares to alternatives. That's what separates strong exam responses from weak ones.
Linear Foundation Methods
These methods assume a linear relationship between predictors and outcomes. They're your starting point for regression analysis and the foundation for understanding more complex techniques.
Simple Linear Regression
- Models the relationship between exactly two variables—one independent (X) and one dependent (Y), fitting the best straight line through your data
- Core equation: Y = β₀ + β₁X + ε, where β₀ is the intercept, β₁ is the slope, and ε represents error
- Key assumptions: linearity, homoscedasticity (constant variance), and normally distributed residuals—violations compromise your predictions
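The closed-form fit is short enough to write out directly. A minimal NumPy sketch, using synthetic data and invented coefficients purely for illustration:

```python
import numpy as np

# Minimal sketch: fit Y = b0 + b1*X by ordinary least squares.
# The data below is synthetic and purely illustrative.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = 2.0 + 3.0 * X  # a noiseless line, so the fit is exact

b1 = np.cov(X, Y, bias=True)[0, 1] / np.var(X)  # slope = Cov(X, Y) / Var(X)
b0 = Y.mean() - b1 * X.mean()                   # intercept from the sample means
print(b0, b1)  # recovers intercept 2 and slope 3 (up to float rounding)
```

The slope formula Cov(X, Y)/Var(X) is exactly what minimizing squared error produces for one predictor.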
Multiple Linear Regression
- Extends simple regression to multiple predictors—the equation becomes Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + ε
- Controls for confounding variables by isolating each predictor's unique contribution while holding others constant
- Watch for multicollinearity—when predictors are highly correlated, coefficient estimates become unstable and interpretation suffers
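The mechanics carry over directly from the simple case: stack the predictors into a design matrix and solve one least-squares problem. A sketch on synthetic data with invented coefficients:

```python
import numpy as np

# Sketch: multiple regression with two predictors via least squares.
# The coefficients (1, 2, -0.5) are invented for illustration.
rng = np.random.default_rng(0)
X1 = rng.normal(size=100)
X2 = rng.normal(size=100)
X = np.column_stack([np.ones(100), X1, X2])  # intercept column plus predictors
Y = 1.0 + 2.0 * X1 - 0.5 * X2                # noiseless, so lstsq recovers exactly

beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta)  # approximately [1.0, 2.0, -0.5]
```

Each fitted coefficient is the predictor's contribution holding the other column constant, which is the "controlling for confounders" idea in code form.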
Compare: Simple Linear Regression vs. Multiple Linear Regression—both assume linearity, but multiple regression handles real-world complexity where outcomes depend on several factors. If an FRQ asks about controlling for confounding variables, multiple regression is your answer.
Non-Linear Relationship Methods
When your data curves, linear methods fail. These approaches capture non-linear patterns while still using regression frameworks you already understand.
Polynomial Regression
- Captures curved relationships by adding polynomial terms: Y = β₀ + β₁X + β₂X² + ... + βₙXⁿ + ε
- Higher degrees increase flexibility but dramatically increase overfitting risk—the bias-variance trade-off in action
- Still technically linear regression—it's linear in the coefficients, just non-linear in the features
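The "linear in the coefficients" point is easiest to see in code: a quadratic fit is just ordinary least squares on an expanded feature matrix. A sketch with a made-up quadratic:

```python
import numpy as np

# Sketch: degree-2 polynomial regression is still ordinary least squares,
# because the model is linear in b0, b1, b2 even though it is curved in x.
x = np.linspace(-2.0, 2.0, 50)
y = 1.0 - 2.0 * x + 0.5 * x**2                   # a known quadratic, no noise

X = np.column_stack([np.ones_like(x), x, x**2])  # polynomial feature matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [1.0, -2.0, 0.5]
```

Adding higher powers just adds columns to X; nothing about the solver changes, which is why overfitting risk grows so easily with degree.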
Generalized Linear Models (GLMs)
- Extends regression beyond normal distributions—handles count data (Poisson), binary outcomes (logistic), and other response types
- Uses a link function to connect the linear predictor to the mean of the response variable, adapting to different data structures
- Encompasses multiple regression types including logistic and Poisson regression—it's a framework, not a single method
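As one concrete instance of the framework, a Poisson GLM with a log link can be fit by Newton's method (iteratively reweighted least squares). This is a hand-rolled sketch on synthetic data with invented coefficients, not a production fitter:

```python
import numpy as np

# Sketch: Poisson GLM with log link, fit by Newton's method.
# The linear predictor eta = X @ beta maps to the mean via mu = exp(eta),
# the inverse of the log link.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
true_beta = np.array([0.5, 1.0])               # invented for illustration
y = rng.poisson(np.exp(X @ true_beta))         # count-valued response

beta = np.zeros(2)
for _ in range(25):                            # Newton/IRLS iterations
    mu = np.exp(X @ beta)                      # inverse link: mean of response
    grad = X.T @ (y - mu)                      # score of the Poisson log-likelihood
    hess = X.T @ (X * mu[:, None])             # Fisher information
    beta = beta + np.linalg.solve(hess, grad)
print(beta)  # roughly recovers [0.5, 1.0]
```

Swapping the link and the weight terms is all it takes to move between GLM family members, which is why logistic and Poisson regression share one framework.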
Compare: Polynomial Regression vs. GLMs—polynomial regression handles curved relationships with continuous outcomes, while GLMs handle different types of outcomes (binary, count, etc.). Know which problem each solves.
Classification Methods
When your outcome is categorical rather than continuous, you need methods designed for classification problems.
Logistic Regression
- Predicts probability of categorical outcomes—despite the name, it's used for classification, not continuous prediction
- Uses the logit function: logit(P) = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ, where P is the probability of the event
- Assumes linearity in log-odds—the relationship between predictors and the logarithm of the odds must be linear
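A hand-rolled Newton's-method fit on synthetic data makes the log-odds linearity concrete; the coefficients (-0.5, 2.0) are invented for illustration:

```python
import numpy as np

# Sketch: logistic regression fit by Newton's method.
# logit(P) = b0 + b1*x is linear, so P = 1 / (1 + exp(-(b0 + b1*x))).
rng = np.random.default_rng(2)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 2.0 * x)))
y = rng.binomial(1, p_true)                    # binary outcomes

beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # predicted probabilities
    grad = X.T @ (y - p)                       # gradient of the log-likelihood
    W = p * (1.0 - p)                          # per-observation weights
    hess = X.T @ (X * W[:, None])              # Fisher information
    beta = beta + np.linalg.solve(hess, grad)
print(beta)  # roughly recovers [-0.5, 2.0]
```

Note the output is a probability, which is then thresholded for classification; the linearity assumption lives entirely on the log-odds scale.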
Regularization Methods
When you have many predictors or multicollinearity issues, standard regression overfits. Regularization adds penalties to keep coefficients in check.
Ridge Regression
- Adds L2 penalty to the loss function—penalizes the square of coefficient magnitudes to shrink them toward zero
- Handles multicollinearity by distributing weight across correlated predictors rather than arbitrarily choosing one
- Never eliminates variables—coefficients shrink but don't reach exactly zero, so all predictors stay in the model
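The closed-form solution makes the shrinkage easy to demonstrate. A sketch on an invented 5-predictor problem:

```python
import numpy as np

# Sketch: ridge regression in closed form.
# Minimizing ||y - X b||^2 + lam ||b||^2 gives b = (X'X + lam I)^{-1} X'y.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.5, -1.0]) + rng.normal(scale=0.1, size=100)

def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_small = ridge(X, y, 0.01)     # nearly ordinary least squares
b_large = ridge(X, y, 1000.0)   # heavy shrinkage toward zero
print(np.abs(b_large).sum() < np.abs(b_small).sum())  # True: coefficients shrink
print(np.all(b_large != 0))                           # True: but none hit exactly zero
```

The lam * I term also guarantees the matrix being inverted is well-conditioned even when X's columns are highly correlated, which is exactly the multicollinearity fix.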
Lasso Regression
- Adds L1 penalty using absolute values of coefficients—can shrink coefficients all the way to zero
- Performs automatic variable selection—effectively removes irrelevant predictors, creating simpler, more interpretable models
- Trade-off: introduces bias to reduce variance, but the resulting model often generalizes better to new data
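The mathematical reason Lasso can produce exact zeros is its soft-thresholding operator; in the special case of orthonormal predictors, each lasso coefficient is simply the OLS coefficient pushed toward zero and clipped. A sketch:

```python
import numpy as np

# Sketch: the soft-thresholding operator behind the lasso's exact zeros.
# With orthonormal predictors, lasso_b = soft_threshold(ols_b, lam).
# Ridge, by contrast, only rescales coefficients and never zeroes them.
def soft_threshold(b, lam):
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

ols = np.array([3.0, 0.4, -2.0, 0.1])
print(soft_threshold(ols, 0.5))  # the small coefficients become exactly 0
```

Coefficients whose magnitude falls below the penalty are clipped to exactly zero, which is the variable selection; the survivors are shifted toward zero by lam, which is the bias-for-variance trade.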
Compare: Ridge vs. Lasso—both regularize, but Ridge keeps all variables (just shrinks them) while Lasso can eliminate variables entirely. Use Lasso when you suspect many predictors are irrelevant; use Ridge when you want to keep everything but reduce multicollinearity effects.
Dimensionality Reduction Methods
When you have more predictors than observations—or severe multicollinearity—these methods transform your feature space before regression.
Principal Component Regression (PCR)
- Combines PCA with regression—transforms correlated predictors into uncorrelated principal components, then regresses on those
- Reduces dimensionality while retaining most variance in the predictors—but ignores the response variable during transformation
- Addresses multicollinearity by construction, since principal components are orthogonal (uncorrelated) by definition
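The two-stage structure is visible in a short sketch: synthetic data where two predictors are nearly collinear, so two components carry essentially all the predictor variance. Note that y plays no role in choosing the components:

```python
import numpy as np

# Sketch of principal component regression: project centered predictors
# onto the top-k principal components, then run least squares on the
# component scores. The components come from X alone, never from y.
rng = np.random.default_rng(4)
z1 = rng.normal(size=200)
z2 = rng.normal(size=200)
X = np.column_stack([z1, z1 + 0.01 * rng.normal(size=200), z2])  # cols 1, 2 collinear
y = X @ np.array([1.0, 1.0, -1.0])

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # PCA via SVD
k = 2
scores = Xc @ Vt[:k].T                          # orthogonal component scores
gamma, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
beta_pcr = Vt[:k].T @ gamma                     # map back to predictor space
print(beta_pcr)
```

Because the scores are orthogonal by construction, the second-stage regression has no multicollinearity at all, regardless of how correlated the original predictors were.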
Partial Least Squares Regression (PLS)
- Maximizes covariance with the response—unlike PCR, it considers the outcome variable when creating components
- Ideal for high-dimensional data where predictors outnumber observations or are highly collinear
- Balances two goals: reducing dimensionality and explaining variance in Y—often outperforms PCR for prediction
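The covariance-maximizing idea shows up directly in the first PLS weight vector, which is proportional to X'y. A one-component sketch on synthetic data where only one of ten predictors matters:

```python
import numpy as np

# Sketch: the first component of PLS. Where PCA picks the direction of
# maximum variance in X alone, the PLS weight vector w is proportional
# to X'y, so it points toward predictors that covary with the response.
rng = np.random.default_rng(5)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only column 0 matters

Xc = X - X.mean(axis=0)
yc = y - y.mean()
w = Xc.T @ yc
w /= np.linalg.norm(w)       # first PLS weight vector
t = Xc @ w                   # first PLS component score
coef = (t @ yc) / (t @ t)    # regress y on that single component
print(w[0])  # the informative predictor dominates the weight vector
```

A PCA direction on this data would ignore which column drives y; PLS finds it immediately, which is the intuition behind PLS's edge in prediction.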
Compare: PCR vs. PLS—both reduce dimensions, but PCR ignores Y when creating components while PLS specifically optimizes for predicting Y. PLS typically performs better for prediction; PCR is simpler to interpret.
Model Selection Methods
These approaches help you choose which predictors to include, balancing model complexity against predictive power.
Stepwise Regression
- Iteratively adds or removes predictors based on statistical criteria like AIC or BIC—can be forward, backward, or bidirectional
- Builds parsimonious models by testing whether each variable improves fit enough to justify its inclusion
- Known limitations: can overfit to sample-specific patterns and may miss the globally optimal subset—use with caution
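A minimal forward-selection sketch shows the add-one-at-a-time logic, scored by AIC on synthetic data where only two of six predictors are truly relevant:

```python
import numpy as np

# Sketch: greedy forward selection by AIC using plain least squares.
# At each step, add the predictor that most improves AIC; stop when
# no remaining candidate helps.
rng = np.random.default_rng(6)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(size=n)  # only cols 1 and 4 matter

def aic(cols):
    Xs = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = np.sum((y - Xs @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (len(cols) + 1)  # fit term + complexity penalty

selected = []
while True:
    candidates = [c for c in range(p) if c not in selected]
    if not candidates:
        break
    best = min(candidates, key=lambda c: aic(selected + [c]))
    if aic(selected + [best]) >= aic(selected):
        break
    selected.append(best)
print(sorted(selected))  # includes the truly relevant predictors 1 and 4;
                         # weak noise columns occasionally sneak in too
```

That last comment is the known limitation in action: greedy discrete decisions can admit sample-specific noise, which is why Lasso's continuous shrinkage tends to be more stable.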
Compare: Stepwise Regression vs. Lasso—both perform variable selection, but stepwise uses discrete add/remove decisions while Lasso uses continuous shrinkage. Lasso is generally more stable and less prone to overfitting.
Quick Reference Table
| Problem | Method(s) |
| --- | --- |
| Linear relationships | Simple Linear Regression, Multiple Linear Regression |
| Non-linear relationships | Polynomial Regression, GLMs |
| Binary/categorical outcomes | Logistic Regression, GLMs |
| Regularization (keeps all variables) | Ridge Regression |
| Regularization (variable selection) | Lasso Regression |
| Multicollinearity solutions | Ridge, PCR, PLS |
| Dimensionality reduction | PCR, PLS |
| Model selection | Stepwise Regression, Lasso |
Self-Check Questions
1. Which two regression methods both use regularization but differ in whether they can eliminate variables entirely? What mathematical difference causes this?
2. You have a dataset with 50 observations and 200 predictors. Which methods would be appropriate, and why would simple multiple regression fail?
3. Compare and contrast PCR and PLS: What do they share, and what key difference affects their predictive performance?
4. A colleague wants to predict whether customers will churn (yes/no). They suggest using multiple linear regression. What method should they use instead, and why?
5. If an FRQ asks you to address multicollinearity in a regression model, what three distinct approaches could you discuss, and how does each solve the problem differently?