
📊 Advanced Quantitative Methods

Key Concepts in Machine Learning Models


Why This Matters

Machine learning models form the backbone of modern data analysis, and you're being tested on more than just definitions—you need to understand when to apply each model, why certain models outperform others in specific situations, and how they handle different types of data. The AP exam will ask you to choose appropriate models for given scenarios, interpret their outputs, and recognize their limitations. Think of this as building a toolkit where each algorithm solves a particular type of problem.

The key concepts here revolve around supervised vs. unsupervised learning, classification vs. regression tasks, model complexity vs. interpretability tradeoffs, and ensemble methods. When you encounter a question about predicting categories versus continuous values, or about handling high-dimensional data, you need to immediately connect it to the right model family. Don't just memorize what each algorithm does—know what problem type each one solves and why it's suited for that task.


Regression Models: Predicting Continuous and Categorical Outcomes

These foundational models establish relationships between input features and outputs. Regression models estimate parameters that minimize prediction error, making them interpretable and widely applicable as baseline approaches.
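For linear regression, "minimizing prediction error" means ordinary least squares: choosing coefficients that minimize the sum of squared residuals $\sum_i (y_i - \hat{y}_i)^2$. Logistic regression instead chooses coefficients that maximize the likelihood of the observed class labels.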

Linear Regression

  • Models linear relationships between independent variables and a continuous dependent variable using the equation $y = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
  • Coefficients are directly interpretable—each $\beta$ represents the change in outcome for a one-unit change in that predictor
  • Highly sensitive to outliers, which can dramatically skew the fitted line and lead to poor predictions on new data
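To make the bullets above concrete, here is a minimal sketch, assuming scikit-learn and a small synthetic dataset invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends linearly on one feature, plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 1, size=100)

model = LinearRegression().fit(X, y)

# The coefficient is directly interpretable: the expected change in y
# for a one-unit increase in the predictor (roughly 2 here by construction).
print(model.intercept_, model.coef_)
```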

Logistic Regression

  • Designed for binary classification, predicting the probability of an outcome (e.g., spam vs. not spam) rather than a continuous value
  • Uses the logistic function $\frac{1}{1 + e^{-z}}$, where $z$ is the linear combination of the features and their coefficients, to constrain outputs between 0 and 1, representing probabilities
  • Threshold-based decisions—typically classifies as positive when probability exceeds 0.5, though this threshold can be adjusted for imbalanced datasets
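A minimal sketch of the same idea in code, again assuming scikit-learn and a made-up two-feature dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data: points with a positive feature sum tend to be class 1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# predict_proba returns probabilities from the logistic function;
# the default decision rule labels a point positive when p >= 0.5.
probs = clf.predict_proba(X[:5])[:, 1]
labels = (probs >= 0.5).astype(int)
print(probs, labels)
```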

Compare: Linear Regression vs. Logistic Regression—both model relationships between features and outcomes, but linear regression predicts continuous values while logistic regression predicts probabilities for categories. If an FRQ gives you a yes/no outcome, logistic is your answer.


Tree-Based Models: Splitting Data for Decisions

Tree-based approaches work by recursively partitioning data based on feature values. Each split maximizes information gain or minimizes impurity, creating interpretable decision paths that mirror human reasoning.
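For example, a common impurity measure is the Gini index $G = 1 - \sum_k p_k^2$, where $p_k$ is the fraction of samples in a node belonging to class $k$; at each step the tree picks the split that reduces impurity the most.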

Decision Trees

  • Non-parametric and highly interpretable—splits data into branches based on feature thresholds, creating visual flowcharts for predictions
  • Handles both numerical and categorical data without requiring extensive preprocessing or feature scaling
  • Prone to overfitting on training data, especially when trees grow deep; pruning and depth limits help control complexity
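As a quick sketch of these points, assuming scikit-learn and its built-in iris dataset, a depth-limited tree can be trained and printed as a readable set of threshold rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# max_depth caps how deep the tree can grow -- one simple guard against overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text shows the learned splits as an if/else flowchart of feature thresholds
print(export_text(tree))
```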

Random Forests

  • Ensemble method combining hundreds of decision trees, each trained on a random bootstrap sample of the data (and a random subset of features at each split) to reduce overfitting and improve generalization
  • Provides feature importance scores by measuring how much each variable contributes to prediction accuracy across all trees
  • Robust to noise and missing data, making it a reliable choice for both classification and regression when interpretability is less critical
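A minimal sketch, assuming scikit-learn and its breast-cancer dataset, showing both the ensemble fit and the feature-importance scores mentioned above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 200 trees, each fit to a bootstrap sample of the training data
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print(forest.score(X_test, y_test))        # accuracy on held-out data
print(forest.feature_importances_[:5])     # importance scores for the first five features
```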

Compare: Decision Trees vs. Random Forests—single trees are interpretable but overfit easily; random forests sacrifice some interpretability for significantly better accuracy and stability. When asked about the bias-variance tradeoff, this is your go-to example.


Distance and Boundary-Based Classifiers

These models classify data by measuring similarity or finding optimal separations in feature space. The geometry of your data determines which approach works best—KNN relies on local neighborhoods while SVM finds global decision boundaries.

K-Nearest Neighbors (KNN)

  • Instance-based learning that classifies new points by majority vote among the K closest training examples—no explicit model is built
  • Requires feature scaling since distance metrics like Euclidean distance are sensitive to variable magnitudes
  • Computationally expensive at prediction time because it must scan the entire dataset; works best with smaller datasets
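Here is a minimal sketch, assuming scikit-learn and its wine dataset; note the scaling step, which the distance metric depends on:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Without scaling, large-magnitude features would dominate the Euclidean distance
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)               # "training" mostly just stores the examples
print(knn.score(X_test, y_test))        # prediction is where the distance work happens
```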

Support Vector Machines (SVM)

  • Finds the optimal hyperplane that maximizes the margin between classes, making it powerful for clear separation problems
  • Kernel trick enables non-linear boundaries by transforming data into higher dimensions where linear separation becomes possible
  • Excels in high-dimensional spaces, even when features outnumber samples, but requires careful tuning of kernel and regularization parameters
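A short sketch of the kernel trick in practice, assuming scikit-learn and a synthetic two-moons dataset that is not linearly separable:

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaved half-moons: no straight line separates the classes
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# The RBF kernel implicitly maps points into a higher-dimensional space;
# C (regularization strength) and gamma (kernel width) are the knobs to tune.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X, y)
print(svm.score(X, y))
```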

Compare: KNN vs. SVM—both are classification workhorses, but KNN makes local decisions based on neighbors while SVM finds a global boundary. KNN is simple but slow at prediction; SVM is complex to tune but efficient once trained.


Neural Networks: Learning Complex Patterns

Neural networks model intricate, non-linear relationships through layers of interconnected nodes. Each layer transforms inputs through weighted connections and activation functions, enabling the network to learn hierarchical representations.
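Concretely, each layer computes $a^{(l)} = f\big(W^{(l)} a^{(l-1)} + b^{(l)}\big)$, where $W^{(l)}$ and $b^{(l)}$ are learned weights and biases and $f$ is a non-linear activation such as ReLU or the sigmoid.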

Neural Networks

  • Layered architecture of neurons where each connection has a learnable weight, allowing the model to capture complex patterns in data
  • Activation functions introduce non-linearity—without them, multiple layers would collapse into a single linear transformation
  • Data and compute hungry, requiring large training sets and significant processing power, especially for deep architectures used in image and language tasks
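As a small sketch, assuming scikit-learn's MLPClassifier and the synthetic two-moons data, a two-hidden-layer network with ReLU activations can learn a curved decision boundary:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers of 16 neurons each; ReLU supplies the non-linearity
# that keeps the stacked layers from collapsing into one linear map.
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                  max_iter=2000, random_state=0),
)
net.fit(X_train, y_train)
print(net.score(X_test, y_test))
```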

Probabilistic Models: Classification Through Probability

Probabilistic classifiers use statistical principles to assign class labels. Bayes' theorem provides the mathematical foundation, calculating the probability of each class given the observed features.
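In symbols, $P(C \mid x) = \frac{P(x \mid C)\,P(C)}{P(x)}$: the probability of class $C$ given features $x$ is proportional to how likely those features are under that class, weighted by the class's prior probability.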

Naive Bayes

  • Applies Bayes' theorem with the "naive" assumption that all features are conditionally independent given the class label
  • Exceptionally fast and efficient—requires minimal training data and computes quickly, ideal for real-time applications
  • Dominates text classification tasks like spam filtering and sentiment analysis, performing surprisingly well despite the independence assumption rarely holding true
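A tiny spam-filtering sketch, assuming scikit-learn and a handful of messages invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Word counts as features, multinomial Naive Bayes as the classifier
texts = ["win money now", "free prize claim now", "meeting at noon",
         "lunch tomorrow?", "claim your free money", "project update attached"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = not spam

spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(texts, labels)
print(spam_filter.predict(["claim your free prize"]))
```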

Compare: Naive Bayes vs. Logistic Regression—both handle classification, but Naive Bayes assumes feature independence and uses probability directly, while logistic regression learns feature weights without independence assumptions. Naive Bayes trains faster; logistic regression often achieves higher accuracy when features interact.


Unsupervised Learning: Finding Structure Without Labels

Unlike supervised methods, these algorithms discover patterns in data without predefined outcomes. The goal is to reveal hidden structure—whether grouping similar items or reducing data complexity.

K-Means Clustering

  • Partitions data into K clusters by iteratively assigning points to the nearest centroid and recalculating centroid positions until convergence
  • Sensitive to initialization and K selection—poor starting centroids or wrong cluster counts lead to suboptimal groupings
  • Best for exploratory analysis when you suspect natural groupings exist but don't have labeled examples to train on
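A minimal clustering sketch, assuming scikit-learn and synthetic blob data with three natural groupings:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data generated around three centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K (n_clusters) must be chosen up front; n_init reruns the algorithm from
# several random initializations to reduce sensitivity to starting centroids.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_)   # final centroid positions
print(kmeans.labels_[:10])       # cluster assignment for the first ten points
```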

Principal Component Analysis (PCA)

  • Dimensionality reduction technique that projects data onto orthogonal axes (principal components) capturing maximum variance
  • Eliminates redundant features by combining correlated variables, reducing noise and computational burden for downstream models
  • Essential for visualization of high-dimensional data in 2D or 3D, and commonly used as a preprocessing step before classification
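A short sketch, assuming scikit-learn and its 64-dimensional digits dataset, projecting the data down to two components for plotting:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Each digit image is a 64-dimensional feature vector
X, _ = load_digits(return_X_y=True)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                      # (n_samples, 2): ready for a 2-D scatter plot
print(pca.explained_variance_ratio_)   # share of total variance each component captures
```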

Compare: K-Means vs. PCA—both are unsupervised, but K-Means groups similar data points while PCA reduces feature dimensions. K-Means answers "what clusters exist?" while PCA answers "which features matter most?"


Quick Reference Table

Concept | Best Examples
Predicting continuous values | Linear Regression
Binary/multiclass classification | Logistic Regression, Naive Bayes, SVM, KNN
High interpretability | Decision Trees, Linear Regression
Ensemble methods (reduce overfitting) | Random Forests
High-dimensional data | SVM, PCA
Text classification | Naive Bayes
Complex pattern recognition | Neural Networks
Unsupervised clustering | K-Means
Dimensionality reduction | PCA

Self-Check Questions

  1. Which two models are both used for classification but differ in whether they build an explicit model during training? What are the tradeoffs between them?

  2. You're given a dataset with 50 features and only 30 samples. Which model would likely perform best, and why might you apply PCA first?

  3. Compare and contrast Decision Trees and Random Forests in terms of interpretability, overfitting risk, and when you'd choose one over the other.

  4. A company wants to segment customers into groups based on purchasing behavior, but they don't have predefined categories. Which algorithm should they use, and what key parameter must they specify?

  5. An FRQ asks you to recommend a model for predicting whether an email is spam. Identify two appropriate models and explain why each would work, noting any preprocessing considerations.