Fiveable

👩‍💻Foundations of Data Science Unit 11 Review

11.2 t-SNE and UMAP

Written by the Fiveable Content Team • Last updated August 2025

Dimensionality reduction techniques simplify complex datasets while preserving important information. Linear methods like PCA preserve global structure, while non-linear methods like t-SNE and UMAP capture intricate local relationships. The non-linear methods are more flexible but typically cost more to compute.

t-SNE and UMAP are powerful tools for visualizing high-dimensional data in lower dimensions. These techniques differ in their underlying algorithms and performance characteristics, with UMAP generally offering faster processing and better preservation of global structure compared to t-SNE.

Linear vs Non-Linear Dimensionality Reduction

  • Linear techniques preserve global structure and assume linear relationships between features (PCA; LDA, which is supervised)
  • Non-linear techniques preserve local structure and capture complex relationships (t-SNE, UMAP, Isomap, LLE)
  • Key trade-offs: non-linear methods are more flexible but usually more expensive to compute, and their output axes are harder to interpret than linear components
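
To make "linear" concrete, here is a minimal sketch of PCA computed with an SVD in NumPy. The function name `pca_project` and the synthetic dataset are assumptions made for illustration, not part of the original guide.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project X onto its top principal components using an SVD."""
    # PCA works on mean-centered features
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Share of total variance captured by each kept component
    explained_ratio = (S ** 2)[:n_components] / np.sum(S ** 2)
    # The projection is a single matrix multiplication: a linear map
    return X_centered @ components.T, explained_ratio

rng = np.random.default_rng(0)
# Hypothetical data: 200 points lying close to a 2-D plane inside 5-D space
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

Z, ratio = pca_project(X, n_components=2)
print(Z.shape)       # (200, 2)
print(ratio.sum())   # close to 1, since the data is almost exactly planar
```

Because the whole embedding is one matrix product, each output axis is a weighted sum of the input features, which is why linear components are comparatively easy to interpret. Data lying on a curved manifold would not be captured this cleanly.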

t-SNE and UMAP

Visualization with t-SNE

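  • t-SNE converts high-dimensional distances to conditional probabilities using a Gaussian kernel centered on each point
  • Uses Student's t-distribution for low-dimensional similarities, whose heavy tails let dissimilar points sit far apart
  • Key steps: compute pairwise similarities, initialize embedding, minimize the KL divergence between the two similarity distributions with gradient descent
  • Hyperparameters: perplexity sets the effective number of neighbors per point (commonly 5 to 50) and so balances local/global structure, learning rate affects convergence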
  • Visualize with scatter plots, color-code points by classes or clusters
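
The similarity computations in the steps above can be sketched in NumPy. This is a simplified illustration under stated assumptions, not a full t-SNE implementation: the helper names `conditional_probabilities` and `student_t_similarities` are hypothetical, the data is synthetic, and the gradient descent loop is left out. It shows how perplexity picks a Gaussian bandwidth per point and how the KL divergence between the two similarity distributions is measured.

```python
import numpy as np

def conditional_probabilities(X, perplexity=5.0, tol=1e-5, max_iter=50):
    """Gaussian conditional probabilities p_{j|i}, one bandwidth per point."""
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.zeros((n, n))
    target_entropy = np.log(perplexity)  # perplexity = exp(entropy in nats)
    for i in range(n):
        d = np.delete(sq_dists[i], i)    # distances to every other point
        # beta = 1 / (2 * sigma_i^2), found by bisection
        beta_lo, beta_hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_iter):
            p = np.exp(-(d - d.min()) * beta)  # shift for numerical stability
            p /= p.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:   # too flat: narrow the Gaussian
                beta_lo = beta
                beta = beta * 2 if beta_hi == np.inf else (beta + beta_hi) / 2
            else:                          # too peaked: widen the Gaussian
                beta_hi = beta
                beta = (beta + beta_lo) / 2
        P[i, np.arange(n) != i] = p
    return P

def student_t_similarities(Y):
    """Low-dimensional similarities q_ij from a Student's t kernel (1 dof)."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq)
    np.fill_diagonal(inv, 0.0)   # a point is not its own neighbor
    return inv / inv.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))            # hypothetical high-dimensional data
P_cond = conditional_probabilities(X, perplexity=5.0)
P = (P_cond + P_cond.T) / (2 * X.shape[0])   # symmetrized joint probabilities

Y = 1e-2 * rng.normal(size=(30, 2))      # small random initial embedding
Q = student_t_similarities(Y)

mask = P > 0
kl = np.sum(P[mask] * np.log(P[mask] / Q[mask]))  # the cost t-SNE minimizes
print(f"KL divergence at initialization: {kl:.3f}")
```

A full implementation would repeatedly move the rows of `Y` along the negative gradient of `kl`. In practice a library implementation such as scikit-learn's `TSNE` is the usual choice.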

Concepts and parameters of UMAP

  • Based on topological data analysis and manifold learning
  • Constructs fuzzy topological representation of high-dimensional data
  • Key concepts: Riemannian geometry, metric spaces, simplicial complexes, fuzzy simplicial sets
  • Workflow: construct fuzzy representation, create low-dimensional representation, optimize layout
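  • Hyperparameters: number of neighbors (n_neighbors) sets how much local vs global structure is preserved, minimum distance (min_dist) controls how tightly points pack in the embedding, number of epochs (n_epochs) balances quality and computation time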
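
The "fuzzy representation" step can be illustrated with a small NumPy sketch. It follows the published description of UMAP in simplified form and is not the umap-learn implementation: the function name `fuzzy_graph`, the brute-force neighbor search, and the synthetic data are all assumptions for illustration. Each point gets membership weights for its nearest neighbors, and the directed weights are then merged with a fuzzy union.

```python
import numpy as np

def fuzzy_graph(X, n_neighbors=5, n_iter=64, tol=1e-5):
    """Directed and symmetrized fuzzy neighbor weights (simplified sketch)."""
    n = X.shape[0]
    dists = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    np.fill_diagonal(dists, np.inf)                 # exclude self-matches
    knn_idx = np.argsort(dists, axis=1)[:, :n_neighbors]
    target = np.log2(n_neighbors)                   # total weight per point
    A = np.zeros((n, n))
    for i in range(n):
        d = dists[i, knn_idx[i]]                    # sorted, ascending
        rho = d[0]                                  # nearest-neighbor distance
        lo, hi, sigma = 0.0, np.inf, 1.0
        for _ in range(n_iter):
            total = np.exp(-(d - rho) / sigma).sum()
            if abs(total - target) < tol:
                break
            if total > target:                      # too much weight: shrink
                hi = sigma
                sigma = (lo + hi) / 2
            else:                                   # too little weight: grow
                lo = sigma
                sigma = sigma * 2 if hi == np.inf else (lo + hi) / 2
        # The nearest neighbor always gets weight 1; others decay with distance
        A[i, knn_idx[i]] = np.exp(-(d - rho) / sigma)
    # Fuzzy union: an edge exists if either direction supports it
    G = A + A.T - A * A.T
    return A, G

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))                        # hypothetical data
A, G = fuzzy_graph(X, n_neighbors=5)
print(G.shape, float(G.max()))
```

A larger `n_neighbors` spreads each point's weight over a wider neighborhood, which is the mechanism behind the number-of-neighbors hyperparameter. The later steps (initializing a low-dimensional layout and optimizing it) are not shown.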

t-SNE vs UMAP for datasets

  • Both non-linear techniques preserve local structure, visualize high-dimensional data
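  • Algorithms differ: t-SNE minimizes the KL divergence between pairwise similarity distributions, UMAP minimizes a cross-entropy between fuzzy graph representations from manifold learning
  • UMAP is usually faster and more scalable to large datasets, and often preserves more global structure, though this depends on initialization and hyperparameters
  • UMAP results are often more stable across runs, t-SNE can vary due to random initialization; both are stochastic, so fix the random seed for reproducibility
  • t-SNE remains common for single-cell RNA sequencing (where UMAP is also widely used), UMAP tends to suit datasets with meaningful global structure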
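
One way to check such comparisons on a given dataset is to score embeddings with a neighborhood-preservation metric. The sketch below is an illustration under assumptions, not a benchmark: it uses scikit-learn's `trustworthiness` on a 300-sample subset of the digits dataset, runs t-SNE with two seeds to expose run-to-run variation, and adds UMAP only if the optional umap-learn package is installed. The subset size and hyperparameter values are arbitrary choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# Small subset so the example runs in a few seconds
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]

embeddings = {
    "pca": PCA(n_components=2, random_state=0).fit_transform(X),
    # Same t-SNE settings, different seeds, to see how much results vary
    "tsne_seed0": TSNE(n_components=2, perplexity=30.0, init="random",
                       random_state=0).fit_transform(X),
    "tsne_seed1": TSNE(n_components=2, perplexity=30.0, init="random",
                       random_state=1).fit_transform(X),
}

# UMAP is optional here: it lives in the separate umap-learn package
try:
    import umap
    embeddings["umap"] = umap.UMAP(n_neighbors=15, min_dist=0.1,
                                   random_state=0).fit_transform(X)
except ImportError:
    pass

# Trustworthiness is in [0, 1]; higher means local neighborhoods are kept
scores = {name: trustworthiness(X, emb, n_neighbors=10)
          for name, emb in embeddings.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Trustworthiness only measures local neighborhood preservation, so it says nothing directly about global structure or speed. Timing the `fit_transform` calls and plotting each embedding colored by `y` would round out the comparison.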