Predictive analytics sits at the heart of modern business decision-making, and your exams will test whether you understand when to apply each model—not just what each model does. The models in this guide represent the core toolkit for forecasting sales, classifying customers, detecting fraud, and optimizing operations. You'll encounter questions that ask you to select the right model for a given business scenario, interpret model outputs, and explain trade-offs between accuracy, interpretability, and computational cost.
These models fall into distinct categories based on their underlying mechanisms: regression for continuous outcomes, classification for categorical predictions, ensemble methods for improved accuracy, and unsupervised learning for pattern discovery. Don't just memorize definitions—know what problem type each model solves, what assumptions it makes, and when it outperforms alternatives. That conceptual understanding is what separates strong exam performance from mediocre recall.
Regression models form the foundation of predictive analytics by establishing mathematical relationships between input variables and outcomes. The key distinction is whether you're predicting a number (linear regression) or a category (logistic regression).
Compare: Linear Regression vs. Logistic Regression—both model relationships between variables, but linear regression predicts continuous values while logistic regression predicts category probabilities. If an exam question involves predicting "how much," use linear; if it's "which group," use logistic.
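The "how much" vs. "which group" distinction can be made concrete with a short sketch. The data and the logistic weights below are invented for illustration: the linear part fits a one-feature least-squares line, and the logistic part squashes a linear score into a probability with the sigmoid.

```python
import math

# "How much": one-feature linear regression via closed-form least squares.
# Toy data (e.g., ad spend -> sales); all numbers are invented.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
prediction = slope * 6.0 + intercept          # continuous output

# "Which group": logistic regression turns a linear score into a
# probability, then thresholds it into a class. Weights are assumed.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

score = 1.2 * 6.0 - 5.0                        # hypothetical linear score
p_respond = sigmoid(score)                     # probability in (0, 1)
label = "yes" if p_respond >= 0.5 else "no"    # categorical output
```

The key contrast is the last step: linear regression stops at a raw number, while logistic regression adds the sigmoid and a threshold to produce a class.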
Tree-based models split data into branches based on feature values, creating rule-based predictions that mirror human decision-making. Their visual, flowchart-like structure makes them exceptionally interpretable for business stakeholders.
Compare: Decision Trees vs. Random Forests—single trees are highly interpretable but overfit easily; random forests sacrifice some interpretability for significantly better accuracy and robustness. Choose decision trees when you need to explain every rule; choose random forests when prediction accuracy matters most.
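A minimal sketch of this trade-off, with invented data: a one-split "decision stump" stands in for a single decision tree (one fully explainable rule), and a majority vote over stumps fit on bootstrap resamples stands in for a random forest (more robust, but now there are 25 rules to explain instead of one).

```python
import random

def fit_stump(X, y):
    """Find the single feature/threshold split with the fewest errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            preds = [1 if row[f] > t else 0 for row in X]
            errors = sum(p != yi for p, yi in zip(preds, y))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]                    # (feature, threshold)

def predict_stump(stump, row):
    f, t = stump
    return 1 if row[f] > t else 0

# Toy 2-feature dataset: class 1 roughly when feature 0 is large.
X = [[1, 5], [2, 3], [3, 8], [6, 1], [7, 4], [8, 2]]
y = [0, 0, 0, 1, 1, 1]

single_tree = fit_stump(X, y)                  # one interpretable rule

# "Forest": many stumps, each fit on a bootstrap resample, majority vote.
random.seed(0)
forest = []
for _ in range(25):
    idx = [random.randrange(len(X)) for _ in X]
    forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

def predict_forest(forest, row):
    votes = sum(predict_stump(s, row) for s in forest)
    return 1 if votes > len(forest) / 2 else 0
```

Real random forests also randomize the features considered at each split and grow full trees rather than stumps; the bootstrap-plus-vote structure is the part this sketch preserves.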
When simple linear boundaries won't separate your classes, these models find more sophisticated decision boundaries. They excel when relationships between features and outcomes are non-linear or when you're working in high-dimensional spaces.
Compare: SVM vs. KNN—both handle non-linear classification, but SVM learns a decision boundary during training while KNN defers all work to prediction time. Once trained, an SVM predicts quickly regardless of training-set size (though training itself can be slow on very large datasets); KNN requires no training at all, but its predictions slow down as the data grows.
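KNN is simple enough to sketch in full, which also makes the "no training" point visible: there is no fit step at all, and every prediction scans the entire stored dataset. The points below are invented.

```python
import math
from collections import Counter

# Stored "training" data: two well-separated toy clusters.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),    # class "A" cluster
           (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]    # class "B" cluster
train_y = ["A", "A", "A", "B", "B", "B"]

def knn_predict(query, k=3):
    # Distance to *every* stored point: prediction cost grows with the data.
    dists = sorted(
        (math.dist(query, p), label) for p, label in zip(train_X, train_y)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]    # majority vote of neighbors

knn_predict((2.0, 2.0))    # lands near the first cluster -> "A"
```

An SVM would instead spend its effort up front learning a boundary (possibly via a kernel for non-linear cases) and then discard most of the training data, keeping only the support vectors.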
Neural networks use layers of interconnected nodes to learn hierarchical representations of data. Their power comes from automatically discovering relevant features rather than requiring manual feature engineering.
Compare: Neural Networks vs. Random Forests—both handle complex non-linear relationships, but random forests provide feature importance and work well with smaller datasets, while neural networks require more data but can learn more sophisticated patterns. For business applications requiring explainability, lean toward random forests.
Time series models specifically handle data where observations are ordered chronologically and past values influence future outcomes. The key insight is that time introduces dependencies—today's value relates to yesterday's.
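Two baseline techniques make that dependence concrete: a moving average that smooths noise to expose the trend component, and a seasonal-naive forecast that predicts next quarter from the same quarter a year ago, adjusted for year-over-year growth. The quarterly sales figures below are invented.

```python
# Invented quarterly sales with an upward trend and a repeating seasonal shape.
sales = [100, 120, 140, 110,    # year 1
         108, 130, 152, 118,    # year 2
         115, 139, 163, 127]    # year 3

# 1) Moving average: smooth out noise to reveal the underlying trend.
def moving_average(series, window=4):
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

trend = moving_average(sales)    # steadily rising values -> upward trend

# 2) Seasonal-naive forecast with a growth adjustment:
#    next quarter = same quarter last year + average year-over-year growth.
season = 4
growth = (sum(sales[-season:]) - sum(sales[-2 * season:-season])) / season
forecast_next = sales[-season] + growth
```

Trend, seasonality, and noise (plus longer cyclical swings) are exactly the components exam questions on time series typically ask you to identify; dedicated models such as ARIMA or exponential smoothing estimate them far more carefully than this sketch.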
Unlike supervised models that predict known outcomes, unsupervised models find patterns in data without labeled examples. They're exploratory tools that reveal structure you didn't know existed.
Compare: Clustering vs. Classification—classification assigns data to known categories using labeled training data; clustering discovers unknown groupings without labels. Use classification when you know what groups exist; use clustering when you're exploring what groups might exist.
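The "no labels" point shows up clearly in a bare-bones k-means sketch: nothing in the input says which group a customer belongs to, yet the algorithm still recovers the segments. The customer points (spend, visits) and the simple deterministic initialization are invented for illustration.

```python
import math

# Unlabeled customer data: (spend, visits). No categories are given anywhere.
customers = [(1.0, 1.2), (1.3, 0.9), (0.8, 1.1),      # low spend / low visits
             (8.0, 7.5), (8.4, 8.1), (7.6, 7.9)]      # high spend / high visits

def kmeans(points, iters=10):
    # Deterministic init for the sketch: first and last point as centers.
    centers = [points[0], points[-1]]
    k = len(centers)
    for _ in range(iters):
        # Assign each point to its nearest center...
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # ...then move each center to the mean of its assigned points.
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[c]
            for c, cl in enumerate(clusters)
        ]
    return centers, clusters

centers, clusters = kmeans(customers)
# The two centers settle near the two natural customer segments.
```

A classifier solving the same business problem would need every customer pre-labeled with a segment; k-means needs only the raw coordinates, which is why it suits exploratory segmentation.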
| Concept | Best Examples |
|---|---|
| Predicting continuous values | Linear Regression, Neural Networks, Time Series |
| Binary classification | Logistic Regression, SVM, Naive Bayes |
| Multi-class classification | Decision Trees, Random Forests, KNN, Neural Networks |
| High interpretability | Linear Regression, Logistic Regression, Decision Trees |
| Handling non-linear relationships | SVM (with kernels), Neural Networks, Random Forests |
| Large dataset performance | Random Forests, Neural Networks, Naive Bayes |
| Discovering hidden patterns | Clustering Algorithms |
| Sequential/temporal data | Time Series Analysis |
A marketing team wants to predict which customers will respond to a campaign (yes/no). Which two models would be most appropriate, and what trade-off exists between them in terms of interpretability?
Compare and contrast Random Forests and Decision Trees: what problem does Random Forests solve that single Decision Trees struggle with, and what do you sacrifice by using the ensemble approach?
You have a dataset with 10,000 features but only 500 observations. Which model is specifically designed to handle this high-dimensional scenario, and why does it work well here?
A retail company needs to identify distinct customer segments for targeted marketing but has no predefined categories. Which type of model should they use, and how does this differ from classification?
An FRQ asks you to recommend a model for predicting next quarter's sales based on five years of monthly data. Which model category is designed for this problem, and what three components would you expect to identify in the data?