
📊Principles of Data Science

Common Data Visualization Techniques

Why This Matters

Visualization isn't just about making pretty pictures—it's about choosing the right tool to reveal what your data is actually telling you. In this course, you're being tested on your ability to match visualization types to data characteristics: distribution shape, variable relationships, categorical comparisons, and temporal patterns. The difference between a histogram and a bar chart isn't just aesthetic; it reflects fundamentally different questions about your data.

When you encounter a visualization problem on an exam or in an FRQ, you need to think beyond "what looks nice" to "what does this chart communicate?" Each technique in this guide exists because it answers a specific analytical question. Don't just memorize which chart does what—know why each visualization reveals certain patterns and when it fails. That conceptual understanding is what separates strong answers from mediocre ones.


Visualizing Distributions

Understanding how a single variable is distributed—its shape, spread, and central tendency—is foundational to exploratory data analysis. These visualizations answer the question: "What does this variable look like across all observations?"

Histograms

  • Bins continuous data into intervals—the height of each bar shows frequency or density within that range
  • Reveals distribution shape including normality, skewness, and modality (unimodal vs. bimodal)
  • Bin width matters—too few bins obscure patterns, too many create noise; this is a common exam topic
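
The bin-width tradeoff can be seen without any plotting library. The sketch below is illustrative only: the `bin_counts` helper and the bimodal sample are assumptions made for this example, not part of the course material.

```python
import random

def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would index one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

random.seed(0)
# Hypothetical bimodal sample: two clusters centered near 20 and 60.
sample = ([random.gauss(20, 4) for _ in range(200)]
          + [random.gauss(60, 4) for _ in range(200)])

for n_bins in (2, 10, 100):
    counts = bin_counts(sample, n_bins)
    # Print only the first 10 bins to keep the output short.
    print(f"{n_bins:>3} bins:", counts[:10])
```

With 2 bins the two clusters collapse into two similar bars and the gap between them disappears; with 100 bins most bars hold only a handful of points and the shape is hard to read. A middle setting shows both modes.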

Box Plots

  • Shows the five-number summary visually: minimum, $Q_1$, median, $Q_3$, and maximum at a glance
  • Outliers displayed explicitly as individual points beyond the whiskers (typically values more than $1.5 \times IQR$ below $Q_1$ or above $Q_3$)
  • Ideal for comparing distributions across multiple groups side-by-side without overlapping density curves
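
The numbers behind a box plot can be computed directly. This is a minimal sketch, assuming the common $1.5 \times IQR$ rule and the "inclusive" quartile method from Python's `statistics` module; plotting tools differ in how they compute quartiles, so exact values may vary. The `box_plot_stats` helper and the sample scores are hypothetical.

```python
import statistics

def box_plot_stats(values):
    """Return the quantities a box plot draws, using the 1.5 * IQR outlier rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }

# Hypothetical scores with one extreme value.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(box_plot_stats(scores))
```

When outliers are drawn as separate points, the whiskers stop at the most extreme non-outlier values rather than at the raw minimum and maximum.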

Compare: Histograms vs. Box Plots—both show distribution, but histograms reveal shape (bimodality, gaps) while box plots excel at cross-group comparison and outlier detection. If an FRQ asks you to compare distributions across categories, box plots are usually your best choice.


Exploring Relationships Between Variables

These visualizations help you understand how two or more variables relate to each other. They answer: "Do changes in X predict changes in Y?"

Scatter Plots

  • Plots two continuous variables with each point representing one observation's $(x, y)$ pair
  • Reveals correlation direction and strength—look for linear patterns, curves, or no relationship
  • Exposes clusters and outliers that might indicate subgroups or data quality issues
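
One reason to look at the scatter plot itself, and not only at a summary number, is that a correlation coefficient measures linear association only. The sketch below uses a hand-written Pearson correlation and made-up data to illustrate this; `pearson_r` is a helper defined here for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # points on a straight line
curved = [x ** 2 for x in xs]      # points on a parabola

print(pearson_r(xs, linear))  # very close to 1
print(pearson_r(xs, curved))  # 0 for this symmetric data, despite a clear pattern
```

The parabola has an obvious relationship that a scatter plot shows immediately, yet its correlation coefficient is zero.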

Bubble Charts

  • Extends scatter plots to three dimensions: $x$-position, $y$-position, and bubble size encode three variables
  • Size represents magnitude of a third quantitative variable, adding richness without adding axes
  • Requires careful interpretation—human perception of area is imprecise, so use for general patterns rather than exact comparisons
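
A common convention is to make bubble area, not radius, proportional to the third variable, since scaling the radius directly exaggerates large values. The sketch below shows that mapping; `bubble_radius` and the `max_radius` default are hypothetical names chosen for this example.

```python
import math

def bubble_radius(value, max_value, max_radius=20.0):
    """Radius chosen so that bubble AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)

values = [25, 50, 100]
for v in values:
    r = bubble_radius(v, max(values))
    area = math.pi * r ** 2
    print(f"value={v:>3}  radius={r:5.2f}  area={area:8.2f}")
```

Here a value of 25 gets half the radius of a value of 100, which gives it one quarter of the area. Even with correct scaling, readers tend to judge areas imprecisely, which is why bubble size suits general patterns better than exact comparisons.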

Heatmaps

  • Encodes values as color intensity in a matrix format, often showing correlations between variable pairs
  • Correlation matrices are a classic use case—quickly spot which variables move together
  • Scales to high dimensions where scatter plot matrices would become unwieldy
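
The grid of numbers that a correlation heatmap colors can be built as follows. This is a sketch with invented toy data and helper names (`pearson_r`, `correlation_matrix`); a real analysis would typically rely on a data library instead.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map every pair of column names to their correlation."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy dataset.
data = {
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 60, 71, 80, 88],
    "hours_gaming": [9, 7, 6, 4, 2],
}

matrix = correlation_matrix(data)
for a in data:
    # Each printed cell is what a heatmap would turn into a color.
    print(a.ljust(14), " ".join(f"{matrix[a][b]:+.2f}" for b in data))
```

Each cell collapses a whole scatter plot into one number, which is what lets a heatmap scale to many variables and also why it can hide outliers and nonlinear patterns.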

Compare: Scatter Plots vs. Heatmaps—scatter plots show the actual relationship between two specific variables (including outliers and nonlinearity), while heatmaps summarize many pairwise relationships at once. Use scatter plots for deep dives, heatmaps for overview.


Tracking Change Over Time

Time series data requires visualizations that emphasize temporal flow and trends. The key principle: time almost always belongs on the x-axis, and connected points imply continuity.

Line Graphs

  • Connects sequential data points to emphasize trends, cycles, and rate of change over time
  • Supports multiple series comparison—overlay several lines to compare trajectories
  • Implies continuity—only appropriate when interpolation between points makes sense
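
To see what "implies continuity" means, note that each line segment makes a claim about the values between two measurements. The sketch below spells out that claim as linear interpolation; the `interpolate` helper and the temperature readings are hypothetical.

```python
def interpolate(t, t0, y0, t1, y1):
    """Value on the straight segment from (t0, y0) to (t1, y1) at time t."""
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

# Hypothetical readings: 10.0 degrees at 9:00 and 14.0 degrees at 11:00.
implied_at_10 = interpolate(10, 9, 10.0, 11, 14.0)
print(implied_at_10)  # 12.0
```

For temperature, an in-between value of 12 degrees at 10:00 is a reasonable guess. If the x-axis held unordered categories such as product names, there would be nothing "between" two points, and connecting them with a line would suggest a trend that does not exist.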

Area Charts

  • Fills the region below the line—emphasizes cumulative magnitude rather than just position
  • Stacked area charts show part-to-whole relationships over time (how components contribute to total)
  • Can obscure individual series when stacked—use transparency or consider alternatives for precise comparisons
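
A stacked area chart draws each series on top of the running total of the series beneath it. The sketch below computes those layer boundaries for invented revenue figures; `stack_series` is a helper written for this example.

```python
def stack_series(series):
    """Return the upper boundary of each layer in a stacked area chart."""
    n_points = len(next(iter(series.values())))
    baseline = [0.0] * n_points
    tops = {}
    for name, values in series.items():
        # Each layer sits on top of the cumulative total of the layers before it.
        baseline = [b + v for b, v in zip(baseline, values)]
        tops[name] = baseline
    return tops

# Hypothetical monthly revenue for three products.
revenue = {
    "product_a": [10, 12, 15],
    "product_b": [5, 5, 5],
    "product_c": [2, 6, 1],
}

tops = stack_series(revenue)
for name, boundary in tops.items():
    print(name, boundary)
```

In this data `product_b` is constant at 5, yet its upper boundary rises from 15 to 20 because it rides on `product_a`. That is the sense in which stacking can obscure individual series: only the bottom layer and the overall total are read against a flat baseline.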

Compare: Line Graphs vs. Area Charts—both show temporal trends, but area charts emphasize magnitude and cumulative totals while line graphs prioritize precise value reading and series comparison. Choose line graphs when exact values matter; area charts when you want to show "how much."


Comparing Categories

When your data involves discrete groups rather than continuous measurements, you need visualizations designed for categorical comparisons. These answer: "How do groups differ?"

Bar Charts

  • Height or length encodes quantity for each discrete category—the go-to for categorical comparison
  • Orientation matters—horizontal bars work better for long category labels or many categories
  • Grouped and stacked variants allow comparison across multiple categorical variables simultaneously

Pie Charts

  • Shows parts of a whole as angular slices—each slice's angle represents its proportion
  • Limited to a few categories: with more than 5-6 slices, the chart becomes difficult to interpret accurately
  • Human perception weakness—we're bad at comparing angles; bar charts are usually more precise
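
The angle encoding is simple arithmetic: each slice receives its share of 360 degrees. The sketch below uses invented market-share numbers, and `slice_angles` is a helper defined for this example.

```python
def slice_angles(counts):
    """Convert category totals into pie-slice angles in degrees."""
    total = sum(counts.values())
    return {name: 360.0 * count / total for name, count in counts.items()}

# Hypothetical market shares (percent).
shares = {"A": 30, "B": 27, "C": 24, "D": 19}

angles = slice_angles(shares)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

Slices A and B differ by about 11 degrees here, a gap that is hard to judge by eye when the slices are not adjacent. The same values drawn as bars on a shared baseline differ visibly in length.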

Compare: Bar Charts vs. Pie Charts—both show categorical proportions, but bar charts allow precise comparison (we read length better than angle) while pie charts emphasize the part-to-whole relationship. Most data scientists prefer bar charts; use pie charts sparingly and only when the "totals to 100%" message is central.


Showing Hierarchical and Part-to-Whole Relationships

Some data has nested structure or needs to show how components contribute to totals. These visualizations reveal composition and hierarchy.

Treemaps

  • Nested rectangles represent hierarchical data—size indicates magnitude at each level
  • Space-efficient for showing many categories and subcategories simultaneously
  • Color can encode an additional dimension, often performance (red/green) within categories
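
The nesting idea can be sketched with the simplest treemap layout, often called slice-and-dice: split the canvas into strips for the top-level groups, then split each strip in the other direction for its children. Many real tools use other layouts (for example, ones that keep rectangles closer to square), so treat this as an illustration only. The function names, canvas size, and data are all hypothetical.

```python
def slice_layout(items, x, y, width, height, horizontal=True):
    """Split a rectangle into strips whose areas are proportional to item values."""
    total = sum(items.values())
    rects = {}
    offset = 0.0
    for name, value in items.items():
        if horizontal:
            w = width * value / total
            rects[name] = (x + offset, y, w, height)
            offset += w
        else:
            h = height * value / total
            rects[name] = (x, y + offset, width, h)
            offset += h
    return rects

def treemap(hierarchy, x=0.0, y=0.0, width=60.0, height=10.0):
    """Two-level slice-and-dice layout; returns (x, y, width, height) per leaf."""
    group_totals = {g: sum(children.values()) for g, children in hierarchy.items()}
    group_rects = slice_layout(group_totals, x, y, width, height, horizontal=True)
    leaves = {}
    for group, (gx, gy, gw, gh) in group_rects.items():
        # Children are laid out in the opposite direction inside their parent.
        leaves.update(slice_layout(hierarchy[group], gx, gy, gw, gh, horizontal=False))
    return leaves

# Hypothetical two-level hierarchy: sector -> company -> value.
hierarchy = {"Tech": {"A": 30, "B": 10}, "Retail": {"C": 20}}

leaves = treemap(hierarchy)
for name, (rx, ry, rw, rh) in leaves.items():
    print(name, (rx, ry, rw, rh), "area =", rw * rh)
```

Each leaf's area comes out proportional to its value, and each group's rectangle is exactly the union of its children, which is how the chart shows magnitude and hierarchy at the same time.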

Compare: Treemaps vs. Pie Charts—both show part-to-whole relationships, but treemaps handle hierarchical data and many more categories effectively. Treemaps sacrifice the intuitive "totals to 100%" framing but gain scalability and can show nested structure.


Quick Reference Table

Concept | Best Examples
Single variable distribution | Histogram, Box Plot
Two-variable relationships | Scatter Plot, Heatmap
Three-variable relationships | Bubble Chart, Heatmap with annotations
Temporal trends | Line Graph, Area Chart
Categorical comparison | Bar Chart, Grouped Bar Chart
Part-to-whole (few categories) | Pie Chart, Stacked Bar Chart
Hierarchical structure | Treemap
Outlier detection | Box Plot, Scatter Plot
Correlation overview | Heatmap (correlation matrix)

Self-Check Questions

  1. You have a dataset with 500 observations of a continuous variable and want to check if it's normally distributed. Which visualization would you choose, and what specific features would indicate normality?

  2. Compare and contrast when you would use a scatter plot versus a heatmap to explore relationships between variables. What does each reveal that the other might miss?

  3. A colleague uses a pie chart with 12 slices to show market share data. What's problematic about this choice, and what alternative would you recommend?

  4. You need to compare the distribution of test scores across five different sections of a course. Which visualization allows the clearest comparison, and what specific features would you examine?

  5. An FRQ asks you to visualize how three variables relate to each other—two continuous and one that indicates magnitude. Which technique encodes all three, and what's its main limitation?