Dimensionality Reduction Algorithms


Why This Matters

When you're working with datasets containing hundreds or thousands of features, you're facing the curse of dimensionality—a phenomenon where algorithms struggle, visualizations become meaningless, and computational costs explode. Dimensionality reduction isn't just a convenience; it's often the difference between a model that works and one that fails. These techniques connect directly to core concepts you'll be tested on: variance preservation, manifold learning, linear vs. non-linear transformations, and the tradeoff between local and global structure.

Understanding these algorithms means knowing when and why to apply each one—not just how they work mechanically. Are you trying to visualize clusters? Preserve distances for downstream classification? Compress data for storage? Each scenario calls for a different approach. Don't just memorize algorithm names—know what mathematical principle each method optimizes and what structure it preserves.


Linear Methods: Variance and Matrix Factorization

These foundational techniques assume your data lies in or near a linear subspace. They're computationally efficient, mathematically elegant, and form the basis for many advanced methods. The core idea: find directions or components that capture the most important information while discarding noise.

Principal Component Analysis (PCA)

  • Maximizes variance along orthogonal axes—transforms data into uncorrelated principal components ranked by how much variance each captures
  • Unsupervised and deterministic, meaning it finds the same solution every time and doesn't require labels
  • Foundation for preprocessing pipelines—use it for noise reduction, multicollinearity removal, and as input to other algorithms
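
A minimal scikit-learn sketch of this workflow; the digits dataset and the choice of 10 components are illustrative assumptions, not a recommendation:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)            # 1797 samples, 64 features
X_scaled = StandardScaler().fit_transform(X)   # PCA is variance-based, so scale first

pca = PCA(n_components=10)                     # keep the 10 highest-variance directions
X_reduced = pca.fit_transform(X_scaled)

# Variance captured per component, and the running total
print(pca.explained_variance_ratio_.round(3))
print(pca.explained_variance_ratio_.cumsum().round(3))
```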

Singular Value Decomposition (SVD)

  • Factorizes any matrix into three components: $A = U\Sigma V^T$, where $\Sigma$ contains singular values revealing data's intrinsic dimensionality
  • Mathematically equivalent to PCA when applied to centered data, but more numerically stable for large matrices
  • Powers recommender systems and NLP—latent semantic analysis and collaborative filtering rely heavily on truncated SVD
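
A short sketch of the factorization and its truncated form; the random matrix and the choice of 5 components are placeholders for illustration:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))                 # any real matrix works

# Full factorization: A = U @ diag(s) @ Vt, singular values sorted largest first
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vt))     # True: exact reconstruction

# Truncated SVD keeps only the top-k components (as in LSA and recommenders)
svd = TruncatedSVD(n_components=5, random_state=0)
A_reduced = svd.fit_transform(A)               # shape (100, 5)

# On centered data the right singular vectors match PCA's principal axes
A_centered = A - A.mean(axis=0)
_, _, Vt_centered = np.linalg.svd(A_centered, full_matrices=False)
```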

Linear Discriminant Analysis (LDA)

  • Supervised method that maximizes class separability—finds projections that push class means apart while keeping within-class variance tight
  • Optimizes Fisher's criterion: the ratio of between-class to within-class variance, $\frac{\sigma_{\text{between}}^2}{\sigma_{\text{within}}^2}$
  • Limited to $c-1$ dimensions where $c$ is the number of classes—a key constraint to remember for exam questions
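
A hedged sketch on scikit-learn's wine dataset, chosen only because its 3 classes make the $c-1$ limit concrete:

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)              # 178 samples, 13 features, 3 classes

# LDA is supervised: fit_transform needs the class labels
lda = LinearDiscriminantAnalysis(n_components=2)   # at most c - 1 = 2 dimensions here
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)                             # (178, 2)

# Requesting more than c - 1 components is not allowed:
# LinearDiscriminantAnalysis(n_components=3).fit(X, y)  # raises ValueError
```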

Compare: PCA vs. LDA—both find linear projections, but PCA maximizes total variance (unsupervised) while LDA maximizes class discrimination (supervised). If an FRQ asks about dimensionality reduction before classification, discuss whether labels are available.

Factor Analysis

  • Models observed variables as linear combinations of latent factors—assumes hidden variables explain correlations in your data
  • Includes an error term for each variable, unlike PCA which assumes all variance is signal
  • Interpretability-focused—commonly used in psychology and social sciences to identify underlying constructs like "intelligence" or "satisfaction"
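
A brief sketch with scikit-learn's FactorAnalysis; the iris data and the two-factor choice are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X, _ = load_iris(return_X_y=True)              # 4 observed variables

fa = FactorAnalysis(n_components=2, random_state=0)
X_fa = fa.fit_transform(X)

# Loadings: how strongly each observed variable relates to each latent factor
print(fa.components_.round(2))

# Unlike PCA, every variable also gets its own noise (uniqueness) term
print(fa.noise_variance_.round(2))
```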

Compare: PCA vs. Factor Analysis—PCA finds components that explain variance; Factor Analysis finds latent factors that explain correlations. Factor Analysis is better when you believe hidden variables generate your observations.


Manifold Learning: Preserving Geometric Structure

Real-world data often lies on curved, non-linear surfaces (manifolds) embedded in high-dimensional space. These methods unroll the manifold to reveal its true low-dimensional structure. The key insight: Euclidean distance in the original space may not reflect the actual relationships between points.

Isomap

  • Replaces Euclidean distances with geodesic distances—measures along the manifold surface rather than through ambient space
  • Builds a neighborhood graph then computes shortest paths, applying classical MDS to the resulting distance matrix
  • Preserves global structure but sensitive to noise and holes in the data—requires careful parameter tuning for $k$-nearest neighbors
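
A sketch on the classic Swiss-roll toy manifold, assuming scikit-learn; the neighbor count is an illustrative value that would normally need tuning:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# A classic curved manifold: 3-D points that really live on a 2-D sheet
X, color = make_swiss_roll(n_samples=1500, random_state=0)

# n_neighbors controls the neighborhood graph: too small and the graph disconnects,
# too large and shortcuts across the roll distort the geodesic distances
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)
print(X_iso.shape)                             # (1500, 2): the unrolled sheet
```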

Locally Linear Embedding (LLE)

  • Reconstructs each point as a weighted combination of its neighbors—then finds low-dimensional coordinates preserving those same weights
  • Focuses on local geometry by assuming small neighborhoods are approximately linear patches of the manifold
  • Computationally efficient but struggles with non-uniform sampling and doesn't handle out-of-sample points naturally
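
A parallel sketch on the same toy manifold; again, the neighbor count is an illustrative assumption:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1500, random_state=0)

# Each point is written as a weighted combination of its 12 nearest neighbors;
# the embedding then preserves those reconstruction weights in 2-D
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
X_lle = lle.fit_transform(X)

# reconstruction_error_ reports how well the local linear patches were preserved
print(X_lle.shape, lle.reconstruction_error_)
```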

Multidimensional Scaling (MDS)

  • Preserves pairwise distances in the embedding—finds coordinates where distances match the original dissimilarity matrix as closely as possible
  • Classical MDS uses eigendecomposition and is equivalent to PCA when distances are Euclidean; non-metric MDS preserves only rank ordering
  • Flexible input format—can work directly from a distance matrix without needing the original features
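
A sketch of the distance-matrix workflow in scikit-learn; starting from Euclidean distances on iris is just an illustration, since any dissimilarity matrix would do:

```python
from sklearn.datasets import load_iris
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X, _ = load_iris(return_X_y=True)

# MDS can start from a precomputed dissimilarity matrix: no raw features required
D = pairwise_distances(X, metric="euclidean")

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
X_mds = mds.fit_transform(D)

# stress_ measures how far embedded distances are from the originals (lower is better)
print(X_mds.shape, round(mds.stress_, 2))
```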

Compare: Isomap vs. LLE—both are manifold methods, but Isomap preserves global geodesic distances while LLE preserves local linear reconstructions. Isomap better maintains overall shape; LLE better captures local neighborhoods.


Probabilistic and Neighbor-Based Embeddings

These methods focus on preserving relationships between points rather than distances or variances. They excel at visualization and cluster discovery, using probability distributions or topological structures to define similarity. Particularly powerful for revealing hidden groupings in complex data.

t-Distributed Stochastic Neighbor Embedding (t-SNE)

  • Converts distances to probabilities—uses Gaussian distributions in high-D and heavy-tailed Student's t-distribution in low-D to model point similarities
  • Emphasizes local structure by penalizing close points that end up far apart much more heavily than distant points that end up close together
  • Non-deterministic and non-parametric—different runs produce different results, and you can't easily embed new points without rerunning
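
A sketch with scikit-learn's TSNE on the digits data; the perplexity and seed are illustrative, and a different seed will produce a different layout:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# perplexity roughly sets the effective neighborhood size; the optimization is
# non-convex, so a different random_state gives a different layout
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_tsne = tsne.fit_transform(X)
print(X_tsne.shape)                            # (1797, 2): coordinates for plotting

# There is no .transform() for new points; embedding fresh data means rerunning
```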

Uniform Manifold Approximation and Projection (UMAP)

  • Balances local and global structure—constructs a fuzzy topological representation and optimizes cross-entropy between high-D and low-D graphs
  • Faster than t-SNE with better preservation of continuous structures and cluster relationships
  • Theoretically grounded in Riemannian geometry and algebraic topology, providing more principled hyperparameter choices
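
A sketch assuming the third-party umap-learn package is installed; the hyperparameter values are illustrative starting points, not recommendations:

```python
# pip install umap-learn
import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

# n_neighbors trades local vs. global structure; min_dist controls how tightly
# points within a cluster are packed
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
X_umap = reducer.fit_transform(X)

# Unlike t-SNE, a fitted UMAP model can embed previously unseen points
X_new = reducer.transform(X[:10])
print(X_umap.shape, X_new.shape)
```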

Compare: t-SNE vs. UMAP—both excel at visualization, but UMAP is faster, preserves more global structure, and handles larger datasets better. t-SNE often produces tighter, more separated clusters; UMAP maintains more continuous relationships. For exploratory analysis, try both.


Neural Network Approaches: Learning Representations

Deep learning offers flexible, powerful dimensionality reduction that can capture arbitrarily complex patterns. The tradeoff: more expressive power requires more data and computational resources.

Autoencoders

  • Encoder-decoder architecture compresses input to a bottleneck layer (latent space) then reconstructs the original—the bottleneck forces learning of efficient representations
  • Loss function drives compression—typically minimizes reconstruction error like $\text{MSE} = \frac{1}{n}\sum_i (x_i - \hat{x}_i)^2$
  • Variants unlock different capabilities—variational autoencoders (VAEs) learn probabilistic latent spaces; denoising autoencoders improve robustness
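
A minimal PyTorch sketch on synthetic data; the layer widths, bottleneck size, and number of epochs are arbitrary choices for illustration:

```python
import torch
from torch import nn

X = torch.randn(256, 64)                  # toy stand-in for standardized features

# Encoder compresses 64 -> 8 (the bottleneck); decoder reconstructs 8 -> 64
model = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 8),                     # latent space
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, 64),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                    # reconstruction error drives the compression

for epoch in range(50):
    optimizer.zero_grad()
    reconstruction = model(X)
    loss = loss_fn(reconstruction, X)     # compare the output to the original input
    loss.backward()
    optimizer.step()

# The reduced representation comes from running the encoder half only
with torch.no_grad():
    latent = model[:3](X)                 # layers up to the 8-unit bottleneck
print(latent.shape)                       # torch.Size([256, 8])
```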

Compare: PCA vs. Autoencoders—a single-layer linear autoencoder learns the same subspace as PCA. Deep autoencoders with non-linear activations can capture complex manifolds that PCA misses entirely. Use PCA for interpretability; autoencoders for expressiveness.


Quick Reference Table

Concept                             Best Examples
Linear projection (unsupervised)    PCA, SVD
Linear projection (supervised)      LDA
Latent variable modeling            Factor Analysis, Autoencoders
Global structure preservation       Isomap, MDS, UMAP
Local structure preservation        LLE, t-SNE
Visualization of clusters           t-SNE, UMAP
Non-linear manifold learning        Isomap, LLE, Autoencoders
Scalability to large datasets       PCA, SVD, UMAP

Self-Check Questions

  1. You have labeled data and want to reduce dimensions before classification. Which two methods could you use, and how do they differ in their optimization objectives?

  2. A colleague's t-SNE plot shows well-separated clusters, but they conclude the clusters are equally distant from each other. What's wrong with this interpretation, and which alternative method might give more reliable global distance information?

  3. Compare PCA and Factor Analysis: if your goal is to identify interpretable latent constructs (like "customer satisfaction"), which would you choose and why?

  4. You're working with data that lies on a curved surface (like points sampled from a Swiss roll). Why would PCA fail here, and which manifold learning method would you try first?

  5. An FRQ asks you to design a dimensionality reduction pipeline for a dataset with 10,000 samples and 500 features, where you need both visualization and input features for a downstream classifier. Describe your approach using at least two different algorithms and justify your choices.