📊Principles of Data Science

Common Data Visualization Techniques


Why This Matters

Visualization isn't just about making pretty pictures—it's about choosing the right tool to reveal what your data is actually telling you. In this course, you're being tested on your ability to match visualization types to data characteristics: distribution shape, variable relationships, categorical comparisons, and temporal patterns. The difference between a histogram and a bar chart isn't just aesthetic; it reflects fundamentally different questions about your data.

When you encounter a visualization problem on an exam or in an FRQ, you need to think beyond "what looks nice" to "what does this chart communicate?" Each technique in this guide exists because it answers a specific analytical question. Don't just memorize which chart does what—know why each visualization reveals certain patterns and when it fails. That conceptual understanding is what separates strong answers from mediocre ones.

Visualizing Distributions

Understanding how a single variable is distributed—its shape, spread, and central tendency—is foundational to exploratory data analysis. These visualizations answer the question: "What does this variable look like across all observations?"

Histograms

Bins continuous data into intervals—the height of each bar shows frequency or density within that range
Reveals distribution shape including normality, skewness, and modality (unimodal vs. bimodal)
Bin width matters—too few bins obscure patterns, too many create noise; this is a common exam topic
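You can see the effect of bin width without a plotting library. The sketch below counts observations in equal-width bins, which is the computation a histogram draws as bar heights. It is plain Python; `histogram_counts` is a hypothetical helper, and the sample is made-up data with two clusters.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width bins spanning min..max.

    Bins are half-open [left, right), except the last bin, which also
    includes the maximum so that no observation is dropped.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so a single bin holds everything.
        return [len(values)]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = int((v - lo) / width)
        # The maximum sits exactly on the right edge; fold it into the last bin.
        counts[min(index, n_bins - 1)] += 1
    return counts


# Hypothetical sample with two clusters (around 2 and around 8)
data = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9]
print(histogram_counts(data, 2))  # [6, 4]: only two coarse blocks
print(histogram_counts(data, 4))  # [3, 3, 0, 4]: the empty gap appears
print(histogram_counts(data, 8))  # [1, 2, 3, 0, 0, 0, 1, 3]: bars start to look jagged
```

Plotting libraries do the same counting. Only the drawing differs, so the bin-count trade-off applies whatever tool you use.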

Box Plots

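Displays the five-number summary (minimum, $Q_1$, median, $Q_3$, and maximum) at a glance; when outliers are present, the whiskers end at the most extreme observations inside the fences rather than at the true minimum and maximum
Outliers displayed explicitly as individual points beyond the whiskers, typically values more than $1.5 \times IQR$ below $Q_1$ or above $Q_3$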
Ideal for comparing distributions across multiple groups side-by-side without overlapping density curves
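The numbers behind a box plot can be computed with the standard library. This sketch assumes the common Tukey convention, with fences at $Q_1 - 1.5 \times IQR$ and $Q_3 + 1.5 \times IQR$. It computes quartiles by linear interpolation using `statistics.quantiles(..., method="inclusive")`, which needs Python 3.8 or later. Other quartile conventions give slightly different values. `box_plot_stats` and the sample are hypothetical.

```python
import statistics


def box_plot_stats(values, k=1.5):
    """Return the five-number summary, whisker ends, and outliers.

    Assumes Tukey-style fences at k * IQR beyond the quartiles (k = 1.5 is
    the usual default) and at least two observations.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        # Whiskers stop at the most extreme observations inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


stats = box_plot_stats([2, 4, 4, 5, 6, 7, 8, 30])  # hypothetical scores
print(stats["q1"], stats["median"], stats["q3"])  # 4.0 5.5 7.25
print(stats["whisker_high"], stats["outliers"])   # 8 [30]
```

In this sample the maximum (30) is drawn as a separate outlier point, and the upper whisker ends at 8.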

Compare: Histograms vs. Box Plots—both show distribution, but histograms reveal shape (bimodality, gaps) while box plots excel at cross-group comparison and outlier detection. If an FRQ asks you to compare distributions across categories, box plots are usually your best choice.

Exploring Relationships Between Variables

These visualizations help you understand how two or more variables relate to each other. They answer: "Do changes in X go along with changes in Y?"

Scatter Plots

Plots two continuous variables with each point representing one observation's $(x, y)$ pair
Reveals correlation direction and strength—look for linear patterns, curves, or no relationship
Exposes clusters and outliers that might indicate subgroups or data quality issues
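A scatter plot is often summarized by Pearson's correlation coefficient $r$, but $r$ only measures linear association. The sketch below shows a perfect curve that $r$ scores as zero, which is exactly the kind of pattern a scatter plot makes visible. It is plain Python; `pearson_r` and the data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations.

    Assumes equal-length sequences, neither of which is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # straight line
curved = [x * x for x in xs]      # perfect U-shaped curve
print(pearson_r(xs, linear))  # 1.0
print(pearson_r(xs, curved))  # 0.0, even though y is fully determined by x
```

A correlation near zero therefore does not mean "no relationship." Plotting the points is what reveals the curve.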

Bubble Charts

Extends scatter plots to a third variable: $x$-position, $y$-position, and bubble size each encode one variable
Size represents magnitude of a third quantitative variable, adding richness without adding axes
Requires careful interpretation—human perception of area is imprecise, so use for general patterns rather than exact comparisons
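One way to reduce the area-perception problem is to make each bubble's area, not its radius, proportional to the value. That means scaling the radius by a square root. This is a minimal sketch that assumes non-negative values; `bubble_radii` and `max_radius` are hypothetical names. In matplotlib, for example, the `s` argument of `scatter` is already a marker area in points squared.

```python
import math


def bubble_radii(values, max_radius=30.0):
    """Radii chosen so that bubble AREA, not radius, is proportional to value.

    Assumes non-negative values with at least one positive value.
    """
    largest = max(values)
    return [max_radius * math.sqrt(v / largest) for v in values]


radii = bubble_radii([25, 100])
print(radii)  # [15.0, 30.0]
# Area ratio = (30 / 15) ** 2 = 4, matching the 4x ratio of the values.
# Scaling the radius linearly instead would make the larger bubble look 16x bigger.
print((radii[1] / radii[0]) ** 2)  # 4.0
```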

Heatmaps

Encodes values as color intensity in a matrix format, often showing correlations between variable pairs
Correlation matrices are a classic use case—quickly spot which variables move together
Scales to high dimensions where scatter plot matrices would become unwieldy
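A correlation heatmap is a matrix of pairwise correlations mapped to color. The plain-Python sketch below computes that matrix for three hypothetical columns and renders it as text, using darker characters for larger $|r|$ as a stand-in for color intensity. The column names, data, and character ramp are all made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def correlation_matrix(columns):
    """Return column names and the square matrix of pairwise correlations."""
    names = list(columns)
    return names, [[pearson_r(columns[a], columns[b]) for b in names] for a in names]


SHADES = " .:-=+*#%@"  # lightest to darkest, standing in for color intensity


def text_heatmap(names, matrix):
    """Print one row per variable; each cell is shaded by |r|."""
    for name, row in zip(names, matrix):
        cells = "".join(2 * SHADES[min(int(abs(r) * len(SHADES)), len(SHADES) - 1)] for r in row)
        print(f"{name:>6} |{cells}|")


# Hypothetical data: hours studied, exam score, hours of sleep
columns = {
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 60, 71, 80, 88],
    "sleep": [8, 6, 7, 6, 8],
}
names, matrix = correlation_matrix(columns)
text_heatmap(names, matrix)
```

The diagonal is always 1, and the matrix is symmetric. With many variables, scanning for dark off-diagonal cells is much faster than inspecting every pairwise scatter plot.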

Compare: Scatter Plots vs. Heatmaps—scatter plots show the actual relationship between two specific variables (including outliers and nonlinearity), while heatmaps summarize many pairwise relationships at once. Use scatter plots for deep dives, heatmaps for overview.

Tracking Change Over Time

Time series data requires visualizations that emphasize temporal flow and trends. The key principle: time almost always belongs on the x-axis, and connected points imply continuity.

Line Graphs

Connects sequential data points to emphasize trends, cycles, and rate of change over time
Supports multiple series comparison—overlay several lines to compare trajectories
Implies continuity—only appropriate when interpolation between points makes sense
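Each segment of a line graph has a slope equal to the rate of change between two consecutive observations. The line also implies values between observations, by linear interpolation. The sketch below makes both explicit. The points must be sorted by time first, or the line would zigzag backward. The helper names and readings are hypothetical.

```python
def segment_slopes(points):
    """Sort (time, value) pairs by time; return the slope of each connecting segment.

    Assumes timestamps are unique (duplicates would need aggregating first).
    """
    pts = sorted(points)
    return [(y1 - y0) / (t1 - t0) for (t0, y0), (t1, y1) in zip(pts, pts[1:])]


def value_on_line(points, t):
    """Value the drawn line implies at time t, by linear interpolation."""
    pts = sorted(points)
    for (t0, y0), (t1, y1) in zip(pts, pts[1:]):
        if t0 <= t <= t1:
            return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the observed time range")


# Hypothetical monthly readings, recorded out of order
readings = [(3, 130), (1, 100), (2, 110), (4, 125)]
print(segment_slopes(readings))      # [10.0, 20.0, -5.0]
print(value_on_line(readings, 2.5))  # 120.0
```

The interpolated 120.0 is only meaningful because the observations lie on a continuous time axis. For unordered categories, connecting the points would suggest values that do not exist.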

Area Charts

Fills the region below the line—emphasizes cumulative magnitude rather than just position
Stacked area charts show part-to-whole relationships over time (how components contribute to total)
Can obscure individual series when stacked—use transparency or consider alternatives for precise comparisons
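In a stacked area chart, each layer is drawn between a lower and an upper boundary. Every layer except the bottom one sits on the running total of the layers beneath it. This plain-Python sketch computes those boundaries and shows why individual series become hard to read. `stack_series` and the traffic numbers are hypothetical.

```python
def stack_series(series):
    """Return {name: (lower, upper)} boundaries for a stacked area chart.

    Layers are stacked in dict insertion order; all series share the same
    time points. The upper boundary of the last layer is the overall total.
    """
    length = len(next(iter(series.values())))
    baseline = [0] * length
    layers = {}
    for name, values in series.items():
        top = [b + v for b, v in zip(baseline, values)]
        layers[name] = (baseline, top)
        baseline = top
    return layers


# Hypothetical visits by device over three periods
layers = stack_series({"mobile": [10, 20, 30], "desktop": [40, 35, 30]})
print(layers["mobile"])   # ([0, 0, 0], [10, 20, 30])
print(layers["desktop"])  # ([10, 20, 30], [50, 55, 60])
```

Desktop visits actually fall (40, 35, 30), yet the upper edge of the desktop band rises (50, 55, 60) because it rides on the growing mobile layer. Readers must judge the band's thickness rather than its position, which is why stacked series are hard to compare precisely.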

Compare: Line Graphs vs. Area Charts—both show temporal trends, but area charts emphasize magnitude and cumulative totals while line graphs prioritize precise value reading and series comparison. Choose line graphs when exact values matter; area charts when you want to show "how much."

Comparing Categories

When your data involves discrete groups rather than continuous measurements, you need visualizations designed for categorical comparisons. These answer: "How do groups differ?"

Bar Charts

Height or length encodes quantity for each discrete category—the go-to for categorical comparison
Orientation matters—horizontal bars work better for long category labels or many categories
Grouped and stacked variants allow comparison across multiple categorical variables simultaneously
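A text chart can sketch the length encoding and show why horizontal bars suit long labels. Labels line up on the left, and bar length scales linearly with the value. `text_bar_chart` and the ticket counts are hypothetical.

```python
def text_bar_chart(counts, width=40):
    """Render horizontal bars whose length is proportional to each value.

    Assumes non-negative values with a positive maximum. Labels are
    left-aligned so that long category names stay readable.
    """
    label_width = max(len(label) for label in counts)
    largest = max(counts.values())
    lines = []
    for label, value in counts.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical support tickets by category
chart = text_bar_chart({"Technical issues": 84, "Customer support": 42, "Billing": 21}, width=20)
print(chart)
```

Because every bar starts at the same baseline, the 2:1 and 4:1 ratios between categories read directly as length ratios.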

Pie Charts

Shows parts of a whole as angular slices—each slice's angle represents its proportion
Limited to few categories—more than 5-6 slices become difficult to interpret accurately
Human perception weakness—we're bad at comparing angles; bar charts are usually more precise
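Each slice's angle is its share of the total multiplied by 360 degrees. The sketch below also shows why angle comparisons are hard: two shares that differ by 2 percentage points produce slices only about 7 degrees apart. `pie_angles` and the market-share figures are hypothetical.

```python
def pie_angles(shares):
    """Convert category values to slice angles in degrees (they sum to 360)."""
    total = sum(shares.values())
    return {label: 360 * value / total for label, value in shares.items()}


# Hypothetical market shares (percent)
angles = pie_angles({"A": 26, "B": 24, "C": 20, "D": 30})
print(angles)  # {'A': 93.6, 'B': 86.4, 'C': 72.0, 'D': 108.0}
```

Slices A and B differ by just 7.2 degrees, which is hard to judge by eye. Bars of length 26 and 24 on a common baseline are easy to rank.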

Compare: Bar Charts vs. Pie Charts—both show categorical proportions, but bar charts allow precise comparison (we read length better than angle) while pie charts emphasize the part-to-whole relationship. Most data scientists prefer bar charts; use pie charts sparingly and only when the "totals to 100%" message is central.

Showing Hierarchical and Part-to-Whole Relationships

Some data has nested structure or needs to show how components contribute to totals. These visualizations reveal composition and hierarchy.

Treemaps

Nested rectangles represent hierarchical data—size indicates magnitude at each level
Space-efficient for showing many categories and subcategories simultaneously
Color can encode an additional dimension, often performance (red/green) within categories
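Slice-and-dice is one simple treemap layout. It splits a rectangle into strips proportional to each child's total and alternates between vertical and horizontal strips at each level of the hierarchy. It is only one of several layouts; many tools use squarified treemaps, which produce less elongated rectangles. The tree, names, and helper functions below are hypothetical.

```python
def total(node):
    """Size of a node: its own value for a leaf, else the sum over its children."""
    if "children" in node:
        return sum(total(child) for child in node["children"])
    return node["value"]


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Assign every node a rectangle (name, x, y, width, height).

    Each child's rectangle gets a share of its parent's area equal to its
    share of the parent's total (assumed positive). Even depths split
    left to right, odd depths split top to bottom.
    """
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    node_total = total(node)
    offset = 0.0  # fraction of the parent already used by earlier children
    for child in node.get("children", []):
        frac = total(child) / node_total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out


# Hypothetical hierarchy: sales by department, then by product
tree = {
    "name": "Sales",
    "children": [
        {"name": "Tech", "children": [{"name": "Laptops", "value": 30},
                                      {"name": "Phones", "value": 10}]},
        {"name": "Grocery", "value": 60},
    ],
}
for name, x, y, w, h in slice_and_dice(tree, 0, 0, 100, 100):
    print(f"{name:<8} x={x:6.1f} y={y:6.1f} w={w:6.1f} h={h:6.1f}")
```

On a 100 by 100 canvas, Laptops (30 of 100 total) receives a 40 by 75 rectangle, or 30% of the area. This is the "size indicates magnitude at each level" property.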

Compare: Treemaps vs. Pie Charts—both show part-to-whole relationships, but treemaps handle hierarchical data and many more categories effectively. Treemaps sacrifice the intuitive "totals to 100%" framing but gain scalability and can show nested structure.

Quick Reference Table

Analytical goal	Recommended charts
Single variable distribution	Histogram, Box Plot
Two-variable relationships	Scatter Plot, Heatmap
Three-variable relationships	Bubble Chart, Heatmap with annotations
Temporal trends	Line Graph, Area Chart
Categorical comparison	Bar Chart, Grouped Bar Chart
Part-to-whole (few categories)	Pie Chart, Stacked Bar Chart
Hierarchical structure	Treemap
Outlier detection	Box Plot, Scatter Plot
Correlation overview	Heatmap (correlation matrix)

Self-Check Questions

You have a dataset with 500 observations of a continuous variable and want to check if it's normally distributed. Which visualization would you choose, and what specific features would indicate normality?
Compare and contrast when you would use a scatter plot versus a heatmap to explore relationships between variables. What does each reveal that the other might miss?
A colleague uses a pie chart with 12 slices to show market share data. What's problematic about this choice, and what alternative would you recommend?
You need to compare the distribution of test scores across five different sections of a course. Which visualization allows the clearest comparison, and what specific features would you examine?
An FRQ asks you to visualize how three variables relate to each other—two continuous and one that indicates magnitude. Which technique encodes all three, and what's its main limitation?
