Clustering sits at the heart of unsupervised learning—you're being tested on your ability to choose the right algorithm for a given data structure and justify that choice mathematically. The methods you'll encounter differ fundamentally in how they define "similarity," whether they assume specific geometric shapes, and how they handle noise. Understanding these distinctions means you can answer questions about convergence guarantees, computational complexity, and parameter sensitivity—all frequent exam topics.
Don't just memorize algorithm names and steps. Know what optimization objective each method minimizes, what assumptions it makes about cluster geometry, and when those assumptions break down. The concepts here—distance metrics, density estimation, probabilistic modeling, and dimensionality reduction—connect directly to broader numerical analysis principles you'll see throughout the course.
These algorithms define clusters by their center points and assign data based on distance to those centers. The core mechanism involves iteratively updating centroid positions to minimize within-cluster variance.
Compare: K-Means vs. Fuzzy C-Means—both minimize distance-based objectives and require k to be specified, but K-means gives hard assignments while FCM provides probabilistic memberships. If an FRQ asks about handling ambiguous cluster boundaries, FCM is your go-to example.
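To make the centroid-update mechanism concrete, here is a minimal sketch of the Lloyd iteration in NumPy; the data array `X`, the cluster count `k`, and the iteration cap are placeholders, not values tied to any particular problem.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal Lloyd iteration: alternate nearest-centroid assignment and mean updates."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the objective can no longer decrease
        centroids = new_centroids
    return centroids, labels
```

Fuzzy C-Means keeps the same alternate-and-update structure but replaces the hard argmin assignment with fractional memberships, so every point contributes to every centroid with some weight.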
These algorithms identify clusters as regions of high point density separated by low-density areas. The key insight: clusters can have arbitrary shapes as long as dense regions are connected.
Compare: DBSCAN vs. Mean Shift—both find arbitrary-shaped clusters without specifying k, but DBSCAN uses explicit density thresholds while Mean Shift uses kernel smoothing. DBSCAN is faster (O(n log n) with spatial indexing) but struggles with varying densities, where OPTICS excels.
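As a hedged illustration of density-based discovery, the sketch below runs scikit-learn's DBSCAN on a synthetic two-moons dataset; the eps and min_samples values are arbitrary choices for this toy data rather than recommended defaults.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interlocking half-moons: non-convex clusters that defeat centroid methods
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps sets the neighborhood radius, min_samples the density threshold for a core point
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

labels = db.labels_  # -1 marks noise; other integers index the discovered clusters
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"clusters found: {n_clusters}, noise points: {(labels == -1).sum()}")
```

Notice that the number of clusters was never specified; it emerges from the density parameters.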
These methods model clusters as probability distributions and use statistical inference for assignments. Data points are assumed to be generated from a mixture of underlying distributions.
Compare: K-Means vs. GMM—K-means is actually a special case of GMM with identical spherical covariances and hard assignments. When an exam asks about modeling clusters with different shapes/orientations, GMM is the upgrade path from K-means.
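To see that relationship in code, the sketch below fits K-means and a full-covariance Gaussian mixture to the same synthetic blobs; the dataset, the component count, and the covariance_type choice are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# Hard assignments; geometry is implicitly identical spherical covariances
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Full covariance matrices let each component take its own shape and orientation
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

print(km.labels_[:5])            # one hard cluster index per point
print(gmm.predict_proba(X)[:5])  # soft memberships: one probability per component
```

Forcing every component to share one spherical covariance and shrinking its variance toward zero pushes the soft responsibilities to 0/1 values, which is the precise sense in which K-means is a special case of GMM.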
These algorithms build a tree structure (dendrogram) representing nested cluster relationships. No single partition is produced—you choose a level to "cut" the tree.
Compare: Agglomerative vs. K-Means—agglomerative produces a full hierarchy and doesn't need k specified, but scales poorly (O(n²) memory). K-means scales to large datasets but requires choosing k and assumes spherical clusters. Use agglomerative when you need to explore cluster structure at multiple resolutions.
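A brief sketch of the agglomerative workflow using SciPy; the Ward linkage and the cut at four clusters are illustrative choices, not requirements of the method.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=4, random_state=0)

# Build the full merge tree; Ward linkage merges the pair that least increases variance
Z = linkage(X, method="ward")

# Cut the dendrogram at a chosen number of clusters after inspecting the tree
# (scipy.cluster.hierarchy.dendrogram(Z) plots the nested structure)
labels = fcluster(Z, t=4, criterion="maxclust")
print(labels[:10])
```

Changing t re-cuts the same linkage matrix at a different level, so exploring several resolutions costs nothing beyond building the tree once.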
These algorithms leverage similarity graphs and linear algebra to find clusters in complex, non-convex structures. The key idea: transform the data representation before applying simpler clustering.
Compare: Spectral Clustering vs. K-Means—both ultimately use K-means, but spectral clustering first transforms data via eigenvectors of the graph Laplacian. This preprocessing step lets it handle interlocking or non-convex clusters where raw K-means fails completely.
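The contrast with raw K-means is easiest to see on two interlocking moons; this sketch uses scikit-learn's SpectralClustering with a nearest-neighbors affinity graph, an illustrative rather than canonical parameter choice.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# Raw K-means: the spherical assumption splits each moon down the middle
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral: build a k-NN similarity graph, embed points via eigenvectors of the
# graph Laplacian, then run K-means in that embedding where the moons separate
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0)
sc_labels = sc.fit_predict(X)

print("K-means ARI: ", adjusted_rand_score(y_true, km_labels))
print("Spectral ARI:", adjusted_rand_score(y_true, sc_labels))
```

The Laplacian eigenvectors place points from the same connected dense region close together, so the final K-means step operates on a representation where the clusters really are compact.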
| Concept | Best Examples |
|---|---|
| Centroid-based optimization | K-Means, Fuzzy C-Means |
| Density-based discovery | DBSCAN, OPTICS, Mean Shift |
| Probabilistic modeling | GMM |
| Hierarchical structure | Agglomerative, Divisive |
| Graph-based / manifold | Spectral Clustering, SOM |
| Handles arbitrary shapes | DBSCAN, Mean Shift, Spectral |
| Soft/probabilistic assignments | GMM, Fuzzy C-Means |
| No k required | DBSCAN, OPTICS, Mean Shift, Hierarchical |
Which two methods both provide soft cluster assignments, and how do their underlying models differ?
You have a dataset with clusters of varying densities and irregular shapes. Which method would you choose over DBSCAN, and why does it handle this case better?
Compare and contrast K-Means and GMM: what assumption does GMM relax, and what additional parameters must be estimated as a result?
An FRQ asks you to cluster data where you don't know the number of groups in advance. Name three methods that don't require specifying and explain what parameter(s) each requires instead.
Why does spectral clustering outperform K-means on non-convex clusters? Describe the role of the graph Laplacian in transforming the problem.