Fiveable

📉Statistical Methods for Data Science Unit 12 Review

QR code for Statistical Methods for Data Science practice questions

12.3 Density-based Clustering

12.3 Density-based Clustering

Written by the Fiveable Content Team • Last updated August 2025
Written by the Fiveable Content Team • Last updated August 2025
📉Statistical Methods for Data Science
Unit & Topic Study Guides

Density-based clustering finds groups of closely packed points, separating them from low-density areas. DBSCAN, a popular algorithm, doesn't need you to specify the number of clusters beforehand and can find clusters of any shape.

DBSCAN uses two key parameters: Eps (maximum neighbor distance) and MinPts (minimum points for a dense region). It identifies core, border, and noise points, forming clusters based on density-reachability and connectivity between points.

DBSCAN Algorithm

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

  • DBSCAN is a density-based clustering algorithm that groups together points that are closely packed together and marks points as outliers that lie in low-density regions
  • Clusters are defined as areas of high density separated by areas of low density
  • Does not require specifying the number of clusters in advance like k-means clustering
  • Can discover clusters of arbitrary shape and is not limited to discovering spherical clusters

Key Parameters and Concepts

  • Eps (epsilon) is the maximum radius of the neighborhood around a point xx
    • Determines the maximum distance between two points for them to be considered neighbors
  • MinPts is the minimum number of points required to form a dense region
    • Determines the minimum number of points required in the Eps neighborhood of a point for it to be considered a core point
  • Core point is a point that has at least MinPts points within its Eps neighborhood, including itself
  • Border point is a point that has fewer than MinPts points within its Eps neighborhood, but is reachable from some core point
  • Noise point is any point that is neither a core point nor a border point
    • Noise points do not belong to any cluster and are treated as outliers
Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Clustering in Machine Learning

Density Concepts

Reachability and Connectivity

  • Directly density-reachable: A point pp is directly density-reachable from a point qq if pp is within the Eps neighborhood of qq and qq is a core point
  • Density-reachable: A point pp is density-reachable from a point qq if there is a chain of points p1,,pnp_1, \dots, p_n, where p1=qp_1 = q, pn=pp_n = p such that pi+1p_{i+1} is directly density-reachable from pip_i for all ii
    • Density-reachability is not symmetric (e.g., a border point may be density-reachable from a core point, but not vice versa)
  • Density-connected: Two points pp and qq are density-connected if there is a point oo such that both pp and qq are density-reachable from oo
    • Density-connectivity is symmetric (e.g., if pp is density-connected to qq, then qq is also density-connected to pp)
Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Hands-on: Clustering in Machine Learning / Statistics and machine learning

Cluster Formation

  • A cluster in DBSCAN is defined as a set of density-connected points that is maximal with respect to density-reachability
  • Points within a cluster are mutually density-connected and density-reachable from at least one core point in the cluster
  • Different clusters are separated by regions of low density, where points are not density-reachable from any core point in the clusters

Extensions

Ordering Points to Identify the Clustering Structure (OPTICS)

  • OPTICS is an extension of DBSCAN that creates a reachability plot, which is a visual representation of the clustering structure
  • Reachability plot displays the reachability distance for each point, which represents the minimum distance for a point to be density-reachable from a higher density point
  • Clusters appear as valleys or dents in the reachability plot, with deeper valleys indicating denser clusters
  • Provides a hierarchical clustering structure without explicitly producing a data set clustering
  • Allows for extracting clusters at different density levels (e.g., by varying the Eps parameter) from the reachability plot
Pep mascot
Upgrade your Fiveable account to print any study guide

Download study guides as beautiful PDFs See example

Print or share PDFs with your students

Always prints our latest, updated content

Mark up and annotate as you study

Click below to go to billing portal → update your plan → choose Yearly → and select "Fiveable Share Plan". Only pay the difference

Plan is open to all students, teachers, parents, etc
Pep mascot
Upgrade your Fiveable account to export vocabulary

Download study guides as beautiful PDFs See example

Print or share PDFs with your students

Always prints our latest, updated content

Mark up and annotate as you study

Plan is open to all students, teachers, parents, etc
report an error
description

screenshots help us find and fix the issue faster (optional)

add screenshot

2,589 studying →