⛽️Business Analytics

Key Predictive Analytics Models

Why This Matters

Predictive analytics sits at the heart of modern business decision-making, and your exams will test whether you understand when to apply each model—not just what each model does. The models in this guide represent the core toolkit for forecasting sales, classifying customers, detecting fraud, and optimizing operations. You'll encounter questions that ask you to select the right model for a given business scenario, interpret model outputs, and explain trade-offs between accuracy, interpretability, and computational cost.

These models fall into distinct categories based on their underlying mechanisms: regression for continuous outcomes, classification for categorical predictions, ensemble methods for improved accuracy, and unsupervised learning for pattern discovery. Don't just memorize definitions—know what problem type each model solves, what assumptions it makes, and when it outperforms alternatives. That conceptual understanding is what separates strong exam performance from mediocre recall.


Regression Models: Predicting Continuous & Categorical Outcomes

Regression models form the foundation of predictive analytics by establishing mathematical relationships between input variables and outcomes. The key distinction is whether you're predicting a number (linear regression) or a category (logistic regression).

Linear Regression

  • Models continuous outcomes using a linear equation—predicts values like sales revenue, prices, or demand quantities based on one or more independent variables
  • Coefficient interpretation tells you the direction and size of each predictor's effect; a coefficient of 2.5 means the predicted outcome increases by 2.5 units for each one-unit increase in that variable, holding the other predictors constant
  • Assumes linearity and is sensitive to outliers—always check residual plots and consider transformations when relationships aren't truly linear
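
To make coefficient interpretation concrete, here is a minimal sketch of a one-predictor least-squares fit in plain Python. The ad-spend and sales figures are invented for illustration, and the helper name `fit_simple_linear_regression` is hypothetical rather than part of any library.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: ad spend (in $1,000s) vs. units sold
ad_spend = [1, 2, 3, 4, 5]
sales = [12.5, 15.0, 17.5, 20.0, 22.5]

intercept, slope = fit_simple_linear_regression(ad_spend, sales)

# Residuals (actual minus predicted) are what you would plot to check linearity
residuals = [y - (intercept + slope * x) for x, y in zip(ad_spend, sales)]

print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # intercept=10.00, slope=2.50
```

In this made-up data the slope of 2.5 reads as "each additional $1,000 of ad spend is associated with 2.5 more units sold." Real data will not fit this cleanly, which is why the residual check matters.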

Logistic Regression

  • Classifies binary outcomes (yes/no, churn/retain): outputs a probability between 0 and 1 using the logistic function $P(Y=1) = \frac{1}{1 + e^{-z}}$, where $z$ is the linear combination of the predictors
  • Probability threshold determines classification—typically 0.5, but you can adjust based on business costs of false positives vs. false negatives
  • Widely used for customer churn prediction and marketing response modeling—interpretable coefficients make it easy to explain which factors drive outcomes
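
The logistic function and the effect of moving the threshold can be sketched in a few lines. The intercept and coefficient below are hypothetical values chosen for illustration, not estimates from real churn data.

```python
import math


def logistic(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold=0.5):
    """Return 1 (e.g. 'will churn') when the probability meets the threshold."""
    return 1 if probability >= threshold else 0


# Hypothetical model: z = intercept + coefficient * support_tickets
intercept = -2.0
coefficient = 0.8
support_tickets = 3

z = intercept + coefficient * support_tickets  # z is 0.4 (up to rounding)
p_churn = logistic(z)                          # about 0.60

print(round(p_churn, 2))
print(classify(p_churn))                 # flagged at the default 0.5 threshold
print(classify(p_churn, threshold=0.7))  # not flagged at a stricter threshold
```

Raising the threshold trades false positives for false negatives, which is the business-cost decision described in the bullet above.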

Compare: Linear Regression vs. Logistic Regression. Both model relationships between variables, but linear regression predicts continuous values while logistic regression predicts category probabilities. If an exam question involves predicting "how much," use linear; if it's "which group," use logistic.


Tree-Based Models: Intuitive Decision Logic

Tree-based models split data into branches based on feature values, creating rule-based predictions that mirror human decision-making. Their visual, flowchart structure makes them exceptionally interpretable for business stakeholders.

Decision Trees

  • Splits data hierarchically based on feature thresholds—creates if-then rules like "if income > $50K AND age < 35, then high purchase probability"
  • Handles both categorical and continuous variables without requiring extensive preprocessing or normalization
  • Prone to overfitting but can be pruned to improve generalization; watch for trees that memorize training data rather than learning patterns
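
A single split is the building block of a tree. The sketch below searches one feature for the threshold that best separates two classes. It assumes Gini impurity as the split criterion, which is one common choice that this guide does not specify, and the income data is made up.

```python
def gini(labels):
    """Gini impurity for a list of 0/1 labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_threshold(values, labels):
    """Try each observed value as a 'value <= t' rule; keep the purest split."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send data down both branches
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical data: income in $1,000s and whether the customer purchased
incomes = [30, 40, 45, 55, 60, 80]
purchased = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_threshold(incomes, purchased)
print(f"if income <= {threshold}: predict no purchase, else predict purchase")
```

A full tree repeats this search inside each branch. Stopping early or pruning is what keeps it from memorizing the training data.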

Random Forests

  • Ensemble method combining many decision trees (often hundreds): for classification the trees vote and the majority wins; for regression their predictions are averaged
  • Reduces overfitting through averaging and introduces randomness by training each tree on a bootstrap sample with random feature subsets
  • Provides feature importance scores—critical for identifying which variables drive predictions in credit scoring, fraud detection, and customer analytics
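
The two sources of randomness (bootstrap samples and random feature choice) plus majority voting can be sketched as follows. This is a toy illustration, not a production random forest: each "tree" is a one-split stump that thresholds a single randomly chosen feature at its sample mean, and all names and data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def majority(labels, fallback):
    """Most common label, or the fallback when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else fallback


def train_stump(rows, feature):
    """Toy 'tree': split one feature at its mean, predict the majority per side."""
    threshold = sum(feats[feature] for feats, _ in rows) / len(rows)
    overall = majority([label for _, label in rows], None)
    left = majority([lab for f, lab in rows if f[feature] <= threshold], overall)
    right = majority([lab for f, lab in rows if f[feature] > threshold], overall)
    return lambda feats: left if feats[feature] <= threshold else right


def train_forest(rows, n_trees, rng):
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)   # randomness source 1
        feature = rng.randrange(n_features)    # randomness source 2
        forest.append(train_stump(sample, feature))
    return forest


def predict(forest, feats):
    """Each tree votes; the majority label wins."""
    return Counter(tree(feats) for tree in forest).most_common(1)[0][0]


# Hypothetical rows: ((income in $1,000s, age), bought premium plan?)
rows = [((20, 25), 0), ((25, 30), 0), ((30, 28), 0), ((28, 22), 0),
        ((70, 50), 1), ((80, 55), 1), ((75, 60), 1), ((85, 52), 1)]

forest = train_forest(rows, n_trees=25, rng=random.Random(0))
prediction = predict(forest, (78, 58))
print(prediction)
```

Because each stump sees a different resample and feature, their individual mistakes tend to differ, which is why the vote is more stable than any single tree.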

Compare: Decision Trees vs. Random Forests—single trees are highly interpretable but overfit easily; random forests sacrifice some interpretability for significantly better accuracy and robustness. Choose decision trees when you need to explain every rule; choose random forests when prediction accuracy matters most.


Advanced Classification Models: Handling Complex Boundaries

When simple linear boundaries won't separate your classes, these models find more sophisticated decision boundaries. They excel when relationships between features and outcomes are non-linear or when you're working in high-dimensional spaces.

Support Vector Machines (SVM)

  • Finds the optimal hyperplane that maximizes the margin between classes—the "street" between data points should be as wide as possible
  • Kernel functions handle non-linear relationships—the kernel trick maps data into higher dimensions where linear separation becomes possible
  • Robust in high-dimensional spaces—performs well even when features outnumber observations, common in text classification and genomics
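
The hyperplane, margin, and kernel ideas can be sketched numerically. The weights `w` and bias `b` below are hypothetical values standing in for a trained linear SVM. The margin formula $2 / \lVert w \rVert$ assumes the usual scaling in which support vectors satisfy $|w \cdot x + b| = 1$. The RBF kernel is shown as one common kernel choice, with an arbitrary `gamma`.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Width of the 'street' between the two classes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def rbf_kernel(x, y, gamma=0.5):
    """Similarity that decays with squared distance; equals 1.0 for identical points."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical trained parameters
w = (3.0, 4.0)
b = -10.0

print(margin_width(w))                     # 0.4
print(decision_value(w, b, (4.0, 2.0)))    # 10.0, positive side
print(decision_value(w, b, (1.0, 1.0)))    # -3.0, negative side
print(rbf_kernel((1.0, 2.0), (1.0, 2.0)))  # 1.0
```

Training an SVM means choosing `w` and `b` so that this margin is as wide as possible while still separating the classes. A kernel replaces the dot product with a similarity such as `rbf_kernel`.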

K-Nearest Neighbors (KNN)

  • Instance-based learning that classifies new points based on the majority class of their $k$ closest neighbors in feature space
  • No training phase—stores all data and computes distances at prediction time, making it computationally expensive for large datasets
  • Sensitive to the choice of $k$ and distance metric: small $k$ values overfit to noise; large $k$ values oversmooth; Euclidean distance assumes all features are equally important
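
A minimal KNN sketch shows both the "no training" property and the sensitivity to $k$. The points and labels are invented, with one deliberately mislabeled-looking "B" point placed inside the "A" cluster to act as noise.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # All the work happens here, at prediction time: sort by Euclidean distance
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: (features, label)
train = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.0), "B"), ((6.0, 7.0), "B"),
    ((2.2, 2.2), "B"),  # noisy point sitting inside the "A" cluster
]

query = (2.0, 2.0)
print(knn_predict(train, query, k=1))  # B: the single nearest point is the noisy one
print(knn_predict(train, query, k=3))  # A: two of the three nearest are "A"
```

With `k=1` the noisy neighbor decides the outcome; with `k=3` the surrounding "A" points outvote it.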

Naive Bayes

  • Probabilistic classifier using Bayes' theorem: calculates $P(\text{class} \mid \text{features})$ by assuming all features are independent given the class
  • Surprisingly effective despite the "naive" independence assumption—fast training and prediction make it ideal for real-time applications
  • Excels at text classification including spam detection, sentiment analysis, and document categorization where high-dimensional sparse data is common
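
The independence assumption means the class score is just the prior multiplied by one likelihood per feature. The sketch below uses made-up priors and word likelihoods for a two-class spam filter; none of the numbers come from real data.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(class | observed features) under the naive independence assumption."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature in observed:
            # Independence assumption: multiply per-feature likelihoods
            score *= likelihoods[cls][feature]
        scores[cls] = score
    total = sum(scores.values())  # normalize so the posteriors sum to 1
    return {cls: score / total for cls, score in scores.items()}


# Hypothetical P(class) and P(word appears | class)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

posterior = naive_bayes_posterior(priors, likelihoods, ["free"])
print({cls: round(p, 2) for cls, p in posterior.items()})  # spam about 0.77, ham about 0.23
```

Training amounts to counting how often each word appears in each class, which is why the model is fast enough for real-time use.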

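Compare: SVM vs. KNN. Both handle non-linear classification, but SVM learns a decision boundary during training while KNN defers all computation to prediction time. A trained SVM predicts using only its support vectors, so prediction is usually fast, although training a kernel SVM becomes expensive on very large datasets; KNN requires no training but slows down as data grows.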


Neural Networks: Learning Complex Patterns

Neural networks use layers of interconnected nodes to learn hierarchical representations of data. Their power comes from automatically discovering relevant features rather than requiring manual feature engineering.

Neural Networks

  • Composed of layers of neurons with weighted connections—input layer receives features, hidden layers transform them, output layer produces predictions
  • Captures highly non-linear relationships through activation functions and deep architectures; excels when you have massive datasets and complex patterns
  • Requires significant computational resources and data—prone to overfitting on small datasets; acts as a "black box" with limited interpretability
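
The layer structure can be sketched as a forward pass through a tiny network with two inputs, two hidden neurons, and one output. The weights are hypothetical fixed numbers (a real network would learn them), and ReLU and sigmoid are assumed here as typical activation choices.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One layer: weighted sum per neuron, plus bias, passed through an activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical weights for a 2 -> 2 -> 1 network
W1 = [[0.5, -0.2], [-0.3, 0.8]]  # hidden layer: one row per neuron
B1 = [0.1, 0.0]
W2 = [[1.0, -1.0]]               # output layer: a single neuron
B2 = [0.0]


def forward(features):
    hidden = dense(features, W1, B1, relu)    # hidden layer transforms the inputs
    output = dense(hidden, W2, B2, sigmoid)   # output layer produces a probability
    return output[0]


probability = forward([1.0, 2.0])
print(round(probability, 2))  # 0.25
```

Without the non-linear activations, stacking layers would collapse into a single linear model, so the activations are what let the network capture non-linear relationships.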

Compare: Neural Networks vs. Random Forests—both handle complex non-linear relationships, but random forests provide feature importance and work well with smaller datasets, while neural networks require more data but can learn more sophisticated patterns. For business applications requiring explainability, lean toward random forests.


Time-Dependent Models: Forecasting Sequential Data

Time series models specifically handle data where observations are ordered chronologically and past values influence future outcomes. The key insight is that time introduces dependencies—today's value relates to yesterday's.

Time Series Analysis

  • Analyzes data collected at regular time intervals—techniques like ARIMA, exponential smoothing, and seasonal decomposition capture trends, seasonality, and cycles
  • Decomposes patterns into components: trend (long-term direction), seasonality (repeating patterns), and residual (random noise)
  • Essential for demand forecasting, stock prediction, and inventory management—businesses use these models to anticipate future values based on historical patterns
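
Of the techniques named above, simple exponential smoothing is the easiest to sketch. Each smoothed value blends the newest observation with the previous smoothed value. The demand figures and the smoothing weight `alpha` are hypothetical, and this basic form does not model trend or seasonality.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; alpha in (0, 1] weights recent observations."""
    smoothed = [series[0]]  # start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly demand
demand = [100, 110, 105, 115]

smoothed = exponential_smoothing(demand, alpha=0.5)
forecast_next = smoothed[-1]  # the last smoothed value serves as the next-period forecast

print(smoothed)       # [100, 105.0, 105.0, 110.0]
print(forecast_next)  # 110.0
```

A higher `alpha` reacts faster to recent changes; a lower `alpha` produces a smoother, slower-moving forecast.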

Unsupervised Learning: Discovering Hidden Structure

Unlike supervised models that predict known outcomes, unsupervised models find patterns in data without labeled examples. They're exploratory tools that reveal structure you didn't know existed.

Clustering Algorithms

  • Groups similar data points without predefined labels—algorithms like K-means, hierarchical clustering, and DBSCAN identify natural groupings based on feature similarity
  • K-means partitions data into $k$ clusters by minimizing within-cluster variance; you must specify $k$ in advance, which requires domain knowledge or techniques like the elbow method
  • Drives market segmentation and customer profiling—reveals distinct customer groups that can be targeted with tailored marketing strategies
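
The K-means loop alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its points. The sketch below works on one-dimensional data with hand-picked starting centroids to keep it deterministic. Real implementations usually use random or smarter initialization, and the spend figures are invented.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Basic K-means on 1-D data; k is fixed by the number of starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer
spend = [10, 12, 14, 50, 52, 54]

centroids, clusters = kmeans_1d(spend, centroids=[10, 54])
print(centroids)  # [12.0, 52.0]
print(clusters)   # [[10, 12, 14], [50, 52, 54]]
```

Here the two segments are obvious by eye. The elbow method mentioned above is for cases where the right number of clusters is not.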

Compare: Clustering vs. Classification—classification assigns data to known categories using labeled training data; clustering discovers unknown groupings without labels. Use classification when you know what groups exist; use clustering when you're exploring what groups might exist.


Quick Reference Table

| Concept | Best Examples |
|---|---|
| Predicting continuous values | Linear Regression, Neural Networks, Time Series |
| Binary classification | Logistic Regression, SVM, Naive Bayes |
| Multi-class classification | Decision Trees, Random Forests, KNN, Neural Networks |
| High interpretability | Linear Regression, Logistic Regression, Decision Trees |
| Handling non-linear relationships | SVM (with kernels), Neural Networks, Random Forests |
| Large dataset performance | Random Forests, Neural Networks, Naive Bayes |
| Discovering hidden patterns | Clustering Algorithms |
| Sequential/temporal data | Time Series Analysis |

Self-Check Questions

  1. A marketing team wants to predict which customers will respond to a campaign (yes/no). Which two models would be most appropriate, and what trade-off exists between them in terms of interpretability?

  2. Compare and contrast Random Forests and Decision Trees: what problem does Random Forests solve that single Decision Trees struggle with, and what do you sacrifice by using the ensemble approach?

  3. You have a dataset with 10,000 features but only 500 observations. Which model is specifically designed to handle this high-dimensional scenario, and why does it work well here?

  4. A retail company needs to identify distinct customer segments for targeted marketing but has no predefined categories. Which type of model should they use, and how does this differ from classification?

  5. An FRQ asks you to recommend a model for predicting next quarter's sales based on five years of monthly data. Which model category is designed for this problem, and what three components would you expect to identify in the data?