
🧮 Data Science Numerical Analysis

Clustering Methods


Why This Matters

Clustering sits at the heart of unsupervised learning—you're being tested on your ability to choose the right algorithm for a given data structure and justify that choice mathematically. The methods you'll encounter differ fundamentally in how they define "similarity," whether they assume specific geometric shapes, and how they handle noise. Understanding these distinctions means you can answer questions about convergence guarantees, computational complexity, and parameter sensitivity—all frequent exam topics.

Don't just memorize algorithm names and steps. Know what optimization objective each method minimizes, what assumptions it makes about cluster geometry, and when those assumptions break down. The concepts here—distance metrics, density estimation, probabilistic modeling, and dimensionality reduction—connect directly to broader numerical analysis principles you'll see throughout the course.


Centroid-Based Methods

These algorithms define clusters by their center points and assign data based on distance to those centers. The core mechanism involves iteratively updating centroid positions to minimize within-cluster variance.

K-Means Clustering

  • Minimizes within-cluster sum of squares—the objective function is $J = \sum_{i=1}^{K} \sum_{x \in C_i} \|x - \mu_i\|^2$, where $\mu_i$ is the centroid of cluster $C_i$
  • Lloyd's algorithm iterates between assignment and update steps, converging to a local minimum (not guaranteed to find the global optimum); a sketch follows this list
  • Requires $K$ specified in advance and assumes spherical clusters of similar size—performance degrades with elongated or uneven clusters
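
A minimal NumPy sketch of Lloyd's two-step iteration described above; the function name, random initialization, and convergence test are illustrative choices, not a canonical implementation:

```python
import numpy as np

def lloyds_kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: a local minimum of J
        centroids = new_centroids
    return labels, centroids
```

Restarting from several random seeds and keeping the run with the lowest $J$ is the usual guard against poor local minima.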

Fuzzy C-Means Clustering

  • Soft clustering via membership degrees—each point $x_i$ has membership $u_{ij} \in [0,1]$ for cluster $j$, with $\sum_j u_{ij} = 1$
  • Fuzziness parameter $m$ controls how soft the boundaries are; as $m \to 1$, assignments become hard like K-means (see the code sketch below)
  • Useful when cluster boundaries overlap—common in image segmentation and pattern recognition where crisp assignments are unrealistic
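
A compact sketch of the standard alternating fuzzy C-means updates (memberships from inverse relative distances, centroids as fuzzified weighted means); the random initialization and fixed iteration count are simplifying assumptions:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy C-means: alternate membership and centroid updates."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centroid update: weighted mean with fuzzified memberships
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)), rows renormalized
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return U, centroids
```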

Compare: K-Means vs. Fuzzy C-Means—both minimize distance-based objectives and require $K$ specified, but K-means gives hard assignments while FCM provides probabilistic memberships. If an FRQ asks about handling ambiguous cluster boundaries, FCM is your go-to example.


Density-Based Methods

These algorithms identify clusters as regions of high point density separated by low-density areas. The key insight: clusters can have arbitrary shapes as long as dense regions are connected.

DBSCAN

  • Two parameters define density: $\varepsilon$ (neighborhood radius) and $\text{minPts}$ (minimum points required to form a core point)
  • Classifies points as core, border, or noise—core points have $\geq \text{minPts}$ neighbors within $\varepsilon$; noise points belong to no cluster
  • No cluster count required and naturally handles outliers—ideal when you expect irregular cluster shapes or contaminated data (illustrated in the example below)
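
A short usage sketch with scikit-learn's DBSCAN; the synthetic dataset and the eps/min_samples values are illustrative and would need tuning on real data:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interlocking half-moons: irregular shapes that defeat centroid methods
X, _ = make_moons(n_samples=500, noise=0.06, random_state=0)

# eps is the neighborhood radius, min_samples the core-point threshold (minPts)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Label -1 marks noise points; the rest are cluster indices
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"clusters found: {n_clusters}, noise points: {np.sum(labels == -1)}")
```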

OPTICS

  • Produces a reachability plot instead of explicit clusters—visualizes clustering structure across multiple density thresholds simultaneously
  • Handles varying densities that break DBSCAN; the reachability distance $r(p)$ captures local density information
  • Extract clusters post-hoc by choosing thresholds on the reachability plot—more flexible than committing to a single $\varepsilon$ value upfront (see the example after this list)
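
A sketch of building a reachability plot with scikit-learn's OPTICS; the synthetic data and the min_samples value are assumptions for illustration:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

# Blobs with different spreads, i.e. clusters of varying density
X, _ = make_blobs(n_samples=600, centers=3, cluster_std=[0.3, 0.8, 1.5], random_state=0)

opt = OPTICS(min_samples=10).fit(X)

# Reachability distances in the cluster ordering form the reachability plot;
# valleys correspond to clusters, peaks to low-density separators
reach = opt.reachability_[opt.ordering_]
plt.plot(reach)
plt.xlabel("points in cluster ordering")
plt.ylabel("reachability distance")
plt.show()
```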

Mean Shift Clustering

  • Kernel density estimation drives the algorithm—points iteratively shift toward the mode of their local density estimate
  • Bandwidth parameter $h$ controls smoothing; larger $h$ merges nearby modes, smaller $h$ preserves fine structure
  • Converges to local density maxima—no cluster count needed, but computational cost is $O(Tn^2)$ for $T$ iterations (a usage sketch follows this list)
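
A usage sketch with scikit-learn's MeanShift, where estimate_bandwidth picks $h$ from the data; the quantile value and toy dataset are arbitrary choices:

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.6, random_state=0)

# Bandwidth h controls kernel smoothing; larger values merge nearby modes
h = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=h).fit(X)

print("modes found:", len(ms.cluster_centers_))  # no K was specified
```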

Compare: DBSCAN vs. Mean Shift—both find arbitrary-shaped clusters without specifying $K$, but DBSCAN uses explicit density thresholds while Mean Shift uses kernel smoothing. DBSCAN is faster ($O(n \log n)$ with spatial indexing) but struggles with varying densities, where OPTICS excels.


Probabilistic Methods

These methods model clusters as probability distributions and use statistical inference for assignments. Data points are assumed to be generated from a mixture of underlying distributions.

Gaussian Mixture Models (GMM)

  • Assumes data comes from $K$ Gaussian distributions—each cluster has mean $\mu_k$, covariance $\Sigma_k$, and mixing weight $\pi_k$
  • EM algorithm alternates E and M steps—the E-step computes posterior probabilities $\gamma_{ik}$, the M-step updates parameters to maximize the likelihood (see the example below)
  • Soft assignments with full covariance allow elliptical clusters of different sizes and orientations—more flexible than K-means' spherical assumption
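
A scikit-learn sketch of fitting a full-covariance GMM by EM and reading off the soft assignments; the component count and covariance type are assumptions for this example:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# covariance_type="full" lets each component be an arbitrary ellipse
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

hard_labels = gmm.predict(X)        # argmax over posteriors
soft_labels = gmm.predict_proba(X)  # gamma_ik: responsibility of component k for point i

print(gmm.means_)    # estimated mu_k
print(gmm.weights_)  # estimated pi_k
```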

Compare: K-Means vs. GMM—K-means can be viewed as a limiting case of GMM with identical spherical covariances and hard assignments. When an exam asks about modeling clusters with different shapes/orientations, GMM is the upgrade path from K-means.


Hierarchical Methods

These algorithms build a tree structure (dendrogram) representing nested cluster relationships. No single partition is produced—you choose a level to "cut" the tree.

Agglomerative Clustering

  • Bottom-up merging starts with $n$ singleton clusters and repeatedly merges the closest pair until one cluster remains
  • Linkage criterion determines "closest"—single linkage (min distance), complete linkage (max), average linkage, or Ward's method (minimum variance)
  • Dendrogram visualization reveals hierarchical structure; cut at different heights to obtain different numbers of clusters (a SciPy sketch follows this list)
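
A SciPy sketch of Ward-linkage agglomerative clustering and its dendrogram; the toy dataset and linkage choice are illustrative:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

# Ward's method merges the pair of clusters that minimizes the increase in variance
Z = linkage(X, method="ward")

# Each merge becomes a horizontal bar; cutting at a given height yields a partition
dendrogram(Z)
plt.ylabel("merge distance")
plt.show()
```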

Hierarchical Clustering (General)

  • Divisive (top-down) approach starts with one cluster and recursively splits—less common due to higher computational cost
  • No $K$ required upfront—the dendrogram lets you explore multiple granularities without re-running the algorithm (illustrated after this list)
  • $O(n^2)$ space and $O(n^3)$ time for naive implementations—limits scalability to moderate dataset sizes
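
A single linkage matrix can be cut at several granularities without re-clustering; a sketch using SciPy's fcluster, where the toy dataset and cut levels are assumptions:

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)
Z = linkage(X, method="ward")  # build the hierarchy once

# Cut the same tree at several granularities without re-running the algorithm
for k in (2, 3, 5):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, "clusters ->", sorted(set(labels)))
```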

Compare: Agglomerative vs. K-Means—agglomerative produces a full hierarchy and doesn't need $K$ specified, but scales poorly ($O(n^2)$ memory). K-means scales to large datasets but requires choosing $K$ and assumes spherical clusters. Use agglomerative when you need to explore cluster structure at multiple resolutions.


Graph and Manifold Methods

These algorithms leverage similarity graphs and linear algebra to find clusters in complex, non-convex structures. The key idea: transform the data representation before applying simpler clustering.

Spectral Clustering

  • Constructs a similarity graph with adjacency matrix $W$ and computes the graph Laplacian $L = D - W$, where $D$ is the degree matrix
  • Eigendecomposition reveals structure—the eigenvectors of $L$ for the $K$ smallest eigenvalues embed the data into a space where K-means works well (see the sketch after this list)
  • Excels on non-convex clusters that defeat centroid methods—the graph structure captures connectivity rather than just distance
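
A from-scratch sketch of unnormalized spectral clustering following the steps above; the Gaussian-kernel similarity and the sigma value are assumptions, and practical implementations often prefer a sparse nearest-neighbor graph with a normalized Laplacian (as scikit-learn's SpectralClustering does):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Sketch: similarity graph -> Laplacian -> spectral embedding -> k-means."""
    # Similarity (adjacency) matrix W from a Gaussian kernel
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Degree matrix D and unnormalized graph Laplacian L = D - W
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Eigenvectors for the k smallest eigenvalues give the spectral embedding
    eigvals, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, :k]
    # Ordinary k-means in the embedded space recovers the clusters
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
```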

Self-Organizing Maps (SOM)

  • Neural network for dimensionality reduction—maps high-dimensional data onto a 2D grid while preserving topological relationships
  • Competitive learning updates the best matching unit (BMU) and its neighbors; learning rate and neighborhood size decay over training (a NumPy sketch follows this list)
  • Visualization and clustering combined—the trained map reveals cluster structure through the organization of grid nodes
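
A minimal NumPy sketch of SOM training with a decaying learning rate and a Gaussian neighborhood around the BMU; the grid size, decay schedules, and function name are illustrative choices:

```python
import numpy as np

def train_som(X, grid=(10, 10), n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal SOM: competitive learning on a 2D grid of weight vectors."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, X.shape[1]))
    # Each node's (row, col) coordinate, used for neighborhood distances on the grid
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]             # one random sample per step
        lr = lr0 * np.exp(-t / n_iter)          # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)    # shrinking neighborhood size
        # Best matching unit: node whose weight vector is closest to x
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(dists.argmin(), dists.shape)
        # Gaussian neighborhood around the BMU on the grid
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
        h = np.exp(-grid_dist ** 2 / (2 * sigma ** 2))
        # Pull the BMU and its neighbors toward the sample
        weights += lr * h[:, :, None] * (x - weights)
    return weights
```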

Compare: Spectral Clustering vs. K-Means—both ultimately use K-means, but spectral clustering first transforms data via eigenvectors of the graph Laplacian. This preprocessing step lets it handle interlocking or non-convex clusters where raw K-means fails completely.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Centroid-based optimization | K-Means, Fuzzy C-Means |
| Density-based discovery | DBSCAN, OPTICS, Mean Shift |
| Probabilistic modeling | GMM |
| Hierarchical structure | Agglomerative, Divisive |
| Graph-based / manifold | Spectral Clustering, SOM |
| Handles arbitrary shapes | DBSCAN, Mean Shift, Spectral |
| Soft/probabilistic assignments | GMM, Fuzzy C-Means |
| No $K$ required | DBSCAN, OPTICS, Mean Shift, Hierarchical |

Self-Check Questions

  1. Which two methods both provide soft cluster assignments, and how do their underlying models differ?

  2. You have a dataset with clusters of varying densities and irregular shapes. Which method would you choose over DBSCAN, and why does it handle this case better?

  3. Compare and contrast K-Means and GMM: what assumption does GMM relax, and what additional parameters must be estimated as a result?

  4. An FRQ asks you to cluster data where you don't know the number of groups in advance. Name three methods that don't require specifying $K$ and explain what parameter(s) each requires instead.

  5. Why does spectral clustering outperform K-means on non-convex clusters? Describe the role of the graph Laplacian in transforming the problem.