Regression analysis is the backbone of statistical inference and predictive modeling—and it's everywhere on the exam. You're being tested on your ability to choose the right type of regression for a given data scenario, interpret coefficients correctly, and recognize when model assumptions are violated. Whether you're predicting continuous outcomes, classifying binary events, or handling messy real-world data with multicollinearity, understanding regression types demonstrates your grasp of model selection, assumption checking, and bias-variance tradeoffs.
Don't just memorize the names of these regression methods. Know when each type applies, what assumptions it requires, and how it handles problems like overfitting, non-linearity, and correlated predictors. An FRQ might give you a scenario and ask you to justify your model choice—that's where conceptual understanding beats rote recall every time.
These foundational methods assume that the relationship between predictors and outcomes can be captured with a straight line (or a flat plane in higher dimensions). The key mechanism is minimizing the sum of squared residuals to find the best-fit line.
Compare: Simple Linear Regression vs. Multiple Linear Regression—both minimize squared residuals and assume linearity, but multiple regression lets you analyze several factors simultaneously. If an FRQ asks about controlling for confounding variables, multiple regression is your answer.
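A minimal sketch on made-up data (the variables `hours`, `prior_gpa`, and `score` and all coefficient values are invented) showing why the distinction matters: when a confounder is correlated with the predictor, the simple model's slope absorbs its effect, while the multiple model estimates each effect holding the other constant.

```python
# Minimal sketch (hypothetical data): simple vs. multiple linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
hours = rng.uniform(0, 10, n)                            # study hours
prior_gpa = 2.0 + 0.15 * hours + rng.normal(0, 0.3, n)   # correlated confounder
score = 50 + 4 * hours + 5 * prior_gpa + rng.normal(0, 3, n)

# Simple: one predictor. Its slope absorbs the confounding from prior_gpa,
# so it lands above the true value of 4.
simple = LinearRegression().fit(hours.reshape(-1, 1), score)

# Multiple: both predictors. Each coefficient is the effect of one variable
# holding the other constant, so the hours slope is close to 4 again.
multiple = LinearRegression().fit(np.column_stack([hours, prior_gpa]), score)

print("simple slope:   ", simple.coef_)    # [hours]
print("multiple slopes:", multiple.coef_)  # [hours, prior_gpa]
```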
When data curves, bends, or follows exponential/logarithmic patterns, linear models fail. These methods capture relationships where the rate of change itself changes across the range of predictors.
Compare: Polynomial vs. Nonlinear Regression—polynomial regression adds powers of the predictor (x², x³, …) to a linear framework, while nonlinear regression fits inherently curved functions. Choose polynomial for exploratory analysis; choose nonlinear when theory suggests a specific functional form.
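To make the distinction concrete, here is a sketch on simulated data (all values and the `decay` function are invented for illustration): the quadratic fit is still linear in its coefficients, since we only add a squared column and reuse the linear machinery, while `scipy.optimize.curve_fit` fits an exponential form that is nonlinear in its parameters.

```python
# Minimal sketch (hypothetical data): polynomial vs. nonlinear regression.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Polynomial regression: a U-shaped pattern, fit by adding an x^2 column
# to an ordinary linear model (still linear in the coefficients).
x = np.linspace(-3, 3, 80)
y = 2 + 0.5 * x**2 + rng.normal(0, 0.3, x.size)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(x.reshape(-1, 1), y)
print("polynomial R^2:", poly.score(x.reshape(-1, 1), y))

# Nonlinear regression: the model is nonlinear in its parameters,
# e.g. an exponential decay suggested by theory, fit iteratively.
def decay(t, a, b):
    return a * np.exp(-b * t)

t = np.linspace(0, 5, 60)
z = 3.0 * np.exp(-0.8 * t) + rng.normal(0, 0.05, t.size)
params, _ = curve_fit(decay, t, z, p0=(1.0, 1.0))
print("estimated a, b:", params)
```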
Not all dependent variables are continuous. When your outcome is binary (yes/no, success/failure), you need methods that predict probabilities bounded between 0 and 1.
Compare: Linear vs. Logistic Regression—linear regression predicts continuous values (potentially outside 0-1), while logistic regression predicts bounded probabilities. If the outcome is binary, logistic regression is the standard choice.
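A minimal sketch on simulated churn-style data (the `usage` variable and its coefficients are hypothetical): logistic regression passes a linear score through the sigmoid, so every prediction is a valid probability between 0 and 1.

```python
# Minimal sketch (hypothetical data): logistic regression on a binary outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
usage = rng.uniform(0, 20, 200).reshape(-1, 1)           # hours/week
p_true = 1 / (1 + np.exp(-(2.0 - 0.3 * usage.ravel()))) # low usage -> churn
churn = rng.binomial(1, p_true)                          # binary 0/1 outcome

clf = LogisticRegression().fit(usage, churn)
print(clf.predict_proba([[5.0]]))  # bounded probabilities for 5 hrs/week
print(clf.predict([[5.0]]))        # hard 0/1 classification at the 0.5 cutoff
```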
When you have many predictors or correlated variables, standard regression can overfit or produce unstable estimates. Regularization adds a penalty term to the loss function, shrinking coefficients toward zero.
Compare: Ridge vs. Lasso Regression—both prevent overfitting through regularization, but ridge keeps all variables (shrunk) while lasso performs automatic variable selection. Use lasso when you suspect many predictors are irrelevant; use ridge when all predictors likely matter but are correlated.
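The contrast is easy to see on simulated data where only a few predictors matter (the coefficient values and `alpha` penalty strengths below are illustrative, not tuned): ridge shrinks all ten coefficients but keeps them nonzero, while lasso drives the irrelevant ones exactly to zero.

```python
# Minimal sketch (hypothetical data): L2 (ridge) vs. L1 (lasso) penalties.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(3)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))  # only 3 predictors matter
y = X @ beta + rng.normal(0, 1, n)

ridge = Ridge(alpha=1.0).fit(X, y)  # all 10 coefficients shrunk, none zero
lasso = Lasso(alpha=0.5).fit(X, y)  # irrelevant coefficients driven to 0

print("ridge:", np.round(ridge.coef_, 2))
print("lasso:", np.round(lasso.coef_, 2))
```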
Sometimes the challenge isn't fitting a model—it's deciding which predictors to include. These methods systematically search for the best subset of variables.
Compare: Stepwise vs. Lasso Regression—both simplify models by reducing predictors, but stepwise uses discrete add/remove decisions while lasso uses continuous shrinkage. Lasso is generally preferred in modern practice because it's less prone to overfitting.
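As a sketch of lasso-as-selector (simulated data with 50 predictors, only 3 of them real), cross-validated lasso picks the penalty strength automatically and reports which coefficients survive:

```python
# Minimal sketch (hypothetical data): variable selection via cross-validated lasso.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, p = 200, 50                   # 50 candidate predictors, most irrelevant
X = rng.normal(size=(n, p))
y = 4 * X[:, 0] - 3 * X[:, 1] + 2 * X[:, 2] + rng.normal(0, 1, n)

model = LassoCV(cv=5).fit(X, y)  # penalty strength chosen by cross-validation
selected = np.flatnonzero(model.coef_)
print("selected predictor indices:", selected)
```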
Some data structures require tailored regression approaches. These methods address specific challenges like time dependence or distributional asymmetry.
Compare: OLS vs. Quantile Regression—OLS estimates the conditional mean and assumes homoscedasticity, while quantile regression estimates any conditional quantile and handles heterogeneous variance. Use quantile regression when you care about effects beyond the average or when outliers are a concern.
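A minimal sketch with statsmodels on invented earnings data: because the noise spread grows with education (heteroscedasticity), the estimated education slope differs across the 10th, 50th, and 90th percentiles, a pattern the single mean slope from OLS would hide.

```python
# Minimal sketch (hypothetical data): quantile regression with statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
educ = rng.uniform(8, 20, 500)                       # years of education
# Heteroscedastic noise: the spread of earnings grows with education.
earnings = 10 + 2.5 * educ + rng.normal(0, 0.5 * educ)

X = sm.add_constant(educ)
for q in (0.10, 0.50, 0.90):
    res = sm.QuantReg(earnings, X).fit(q=q)
    print(f"q={q:.2f}  education slope = {res.params[1]:.2f}")
```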
| Concept | Best Examples |
|---|---|
| Continuous outcome, linear relationship | Simple Linear Regression, Multiple Linear Regression |
| Non-linear patterns | Polynomial Regression, Nonlinear Regression |
| Binary/categorical outcome | Logistic Regression |
| Preventing overfitting | Ridge Regression, Lasso Regression |
| Variable selection | Lasso Regression, Stepwise Regression |
| Correlated predictors (multicollinearity) | Ridge Regression |
| Time-dependent data | Time Series Regression |
| Robust to outliers / distributional analysis | Quantile Regression |
1. You have a dataset with 50 predictors, many of which are likely irrelevant. Which regression method would simultaneously fit the model and identify the most important variables?
2. Compare and contrast Ridge and Lasso regression: what do they share in common, and when would you choose one over the other?
3. A researcher wants to predict whether a customer will churn (yes/no) based on usage patterns. Why would linear regression be inappropriate, and what method should they use instead?
4. Your scatter plot shows a clear U-shaped relationship between study hours and test anxiety. Which two regression types could capture this pattern, and how do they differ in approach?
5. An economist studying income inequality wants to understand how education affects earnings at the 10th, 50th, and 90th percentiles of the income distribution. Which regression method is designed for this purpose, and why is it preferable to standard OLS here?