Fiveable

💻Advanced R Programming Unit 5 Review


5.3 Advanced plotting with ggplot2

Written by the Fiveable Content Team • Last updated August 2025

ggplot2 builds on the grammar of graphics to provide a flexible framework for composing complex, multi-layered plots from small, reusable parts: data, aesthetic mappings, geoms, scales, and themes.

This guide covers distribution plots, heatmaps, and ridgeline plots, then shows how to combine multiple layers, customize aesthetics, and create multi-panel plots.

Grammar of Graphics and ggplot2 Structure

Fundamentals of the Grammar of Graphics

  • The grammar of graphics is a framework for creating statistical graphics that separates the components of a plot into layers, scales, and coordinate systems
  • This framework provides a structured and systematic approach to building complex visualizations
  • The grammar of graphics allows for the creation of a wide range of plots by combining different components in a modular fashion
  • Key components of the grammar of graphics include data, aesthetics (visual properties), geometric objects (points, lines, bars), statistical transformations, scales, and coordinate systems

ggplot2 Implementation and Syntax

  • ggplot2 is an implementation of the grammar of graphics in R, providing a powerful and flexible tool for data visualization
  • The basic structure of ggplot2 code includes the ggplot() function, which initializes the plot, followed by layers defined using the + operator
  • Layers are added with geom_ functions that specify the type of plot (geom_point(), geom_line(), geom_bar()) or with stat_ functions for statistical transformations; within ggplot() or a layer, aes() maps variables to plot aesthetics
  • Scales control the mapping between data values and visual properties, such as color, size, and shape, and are automatically generated or can be customized using scale_ functions
  • Coordinate systems (coord_cartesian(), coord_polar()) define the mapping between data coordinates and the 2D plane of the plot, allowing for transformations like zooming, panning, or polar coordinates
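
The pieces listed above can be seen together in one short plot. This is a minimal sketch using the mpg dataset that ships with ggplot2; the choice of variables, the "Dark2" palette, and the zoom range are illustrative assumptions, not part of the original guide.

```r
library(ggplot2)

# mpg ships with ggplot2: displ = engine displacement, hwy = highway mpg
p <- ggplot(mpg, aes(x = displ, y = hwy)) +  # data and aesthetic mappings
  geom_point(aes(color = class)) +           # geometric layer
  stat_smooth(method = "lm", se = FALSE) +   # statistical transformation layer
  scale_color_brewer(palette = "Dark2") +    # customized scale for color
  coord_cartesian(xlim = c(2, 6))            # zoom in without dropping data

p
```

Because coord_cartesian() only changes the visible window, the fitted line is still computed from all observations, including those outside the zoomed range.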

Advanced Plots with ggplot2

Visualizing Distributions

  • Boxplots (geom_boxplot()) display the distribution of a continuous variable, showing the median, interquartile range (IQR), and potential outliers
    • They are useful for comparing the distribution of a variable across different categories or groups
    • Outliers are typically defined as values that fall below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where Q1 and Q3 are the first and third quartiles, respectively
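  • Violin plots (geom_violin()) draw a mirrored kernel density estimate for each group, displaying the probability density of the data at different values; unlike a boxplot they do not show the median or quartiles unless you add them, for example by overlaying a narrow geom_boxplot()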
    • They provide a more detailed view of the distribution shape compared to boxplots
    • The width of the violin at each point represents the density of observations at that value
  • Density plots (geom_density()) show the distribution of a continuous variable by estimating the probability density function
    • They are a smooth alternative to histograms and can be used to compare the distribution of multiple groups or variables
    • The area under the density curve represents the probability of observing a value within a given range
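
As a rough sketch of the three distribution geoms, again assuming the mpg dataset bundled with ggplot2 (the variables chosen are illustrative), along with the 1.5 * IQR rule computed by hand:

```r
library(ggplot2)

# Boxplot: highway mpg by vehicle class
box <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Violin with a narrow boxplot overlaid to show median and quartiles
vio <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(fill = "grey90") +
  geom_boxplot(width = 0.1, outlier.shape = NA)

# Density: one semi-transparent curve per drive type
dens <- ggplot(mpg, aes(x = hwy, fill = drv)) +
  geom_density(alpha = 0.4)

# The 1.5 * IQR rule by hand. geom_boxplot() uses hinges, which can
# differ slightly from quantile()'s default for small samples.
q <- quantile(mpg$hwy, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
fences <- c(lower = q[[1]] - 1.5 * iqr, upper = q[[2]] + 1.5 * iqr)
outliers <- mpg$hwy[mpg$hwy < fences[["lower"]] | mpg$hwy > fences[["upper"]]]
```

Printing box, vio, or dens renders the corresponding plot; outliers holds the values a boxplot of the pooled data would flag.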

Heatmaps and Ridgeline Plots

  • Heatmaps (geom_tile() or geom_raster()) are used to visualize 2D data, where the color of each cell represents the value of a variable
    • They are commonly used to display relationships between two variables or to show patterns in large datasets
    • The color scale can be customized using scale_fill_gradient() or scale_fill_viridis_c() to choose appropriate color palettes
  • Ridgeline plots (geom_density_ridges() from the ggridges package) are useful for comparing the distribution of a variable across multiple categories
    • They display the density curves of each category as overlapping ridges, making it easier to compare the shapes and positions of the distributions
    • Ridgeline plots are a space-efficient alternative to faceting when there are many categories to compare
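
A minimal sketch of both plot types follows. The datasets (mpg and faithfuld, both shipped with ggplot2) and the counting step are assumptions made for illustration; the ridgeline part only runs if the ggridges package is installed.

```r
library(ggplot2)

# Heatmap of counts: one tile per drive type / vehicle class combination
counts <- as.data.frame(table(drv = mpg$drv, class = mpg$class))
tiles <- ggplot(counts, aes(x = drv, y = class, fill = Freq)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high = "steelblue")

# geom_raster() is a faster alternative when all cells have the same size;
# faithfuld holds a 2D density estimate on a regular grid
heat <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster() +
  scale_fill_viridis_c()

# Ridgeline plot: one density ridge per vehicle class (needs ggridges)
if (requireNamespace("ggridges", quietly = TRUE)) {
  ridges <- ggplot(mpg, aes(x = hwy, y = class)) +
    ggridges::geom_density_ridges(alpha = 0.6)
}
```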

Combining Multiple Layers

  • ggplot2 allows for the creation of complex and informative plots by combining multiple geom_ layers
  • Examples of combining layers include:
    • Adding points (geom_point()) or lines (geom_line()) to a boxplot or violin plot to show individual observations or trends
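    • Overlaying a density plot (geom_density()) on top of a histogram (geom_histogram()) to show the distribution shape and individual bins; the histogram must be drawn on the density scale (y = after_stat(density)) so both layers share the same y axis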
    • Combining a scatterplot (geom_point()) with a smoothed line (geom_smooth()) to visualize the relationship between two variables and the overall trend
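
The three combinations above might look like this in practice. This sketch assumes the mpg dataset and a reasonably recent ggplot2 (after_stat() needs version 3.3.0 or later, and the linewidth argument needs 3.4.0 or later); bin width, colors, and smoothing method are arbitrary choices.

```r
library(ggplot2)

# Histogram on the density scale so bars and curve share one y axis
hist_dens <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 fill = "grey80", color = "white") +
  geom_density(linewidth = 1)

# Scatterplot with a loess smoother showing the overall trend
scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", formula = y ~ x)

# Boxplot with jittered raw observations on top; the boxplot's own
# outlier points are hidden so they are not drawn twice
box_pts <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, alpha = 0.4)
```

Layers are drawn in the order they are added, so the layer added last sits on top.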

Customizing ggplot2 Aesthetics

Built-in Themes and Theme Customization

  • ggplot2 provides a wide range of functions to customize plot aesthetics, including colors, fonts, axis labels, and legends
  • Theme functions, such as theme_bw(), theme_minimal(), and theme_classic(), control the overall appearance of the plot, including background color, gridlines, and axis formatting
    • These built-in themes provide a quick way to change the look and feel of a plot
    • For example, theme_bw() creates a plot with a white background and gray gridlines, while theme_minimal() removes most of the background elements for a cleaner look
  • The theme() function allows for fine-grained control over individual plot elements, such as axis titles, legend positions, and plot margins
    • Each plot element can be customized by providing the appropriate argument within theme(), such as axis.title, legend.position, or plot.margin
    • For example, theme(axis.title = element_text(size = 14, face = "bold")) sets the font size and weight of the axis titles
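
A small sketch of combining a built-in theme with theme() overrides, assuming the mpg dataset; the specific sizes and margins are illustrative values only.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# Apply the complete theme first, then override individual elements.
# Adding theme_minimal() after theme() would reset the overrides.
themed <- base +
  theme_minimal(base_size = 12) +
  theme(
    axis.title = element_text(size = 14, face = "bold"),
    legend.position = "bottom",
    plot.margin = margin(10, 15, 10, 15)  # top, right, bottom, left (points)
  )
```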

Scales and Color Palettes

  • Scales in ggplot2 control the mapping between data values and visual properties, such as colors, sizes, and shapes
  • Scales can be customized using scale_ functions, which allow for fine-tuning of the appearance of plot elements
    • scale_color_manual() and scale_fill_manual() are used to set specific colors for discrete variables
    • scale_color_gradient() and scale_fill_gradient() create color gradients for continuous variables
    • scale_x_continuous() and scale_y_continuous() control the appearance of the x and y axes, including breaks, labels, and limits
  • ggplot2 also provides several built-in color palettes that can be used to create visually appealing plots
    • scale_color_brewer() and scale_fill_brewer() use color palettes from ColorBrewer, which groups palettes by data type (sequential, diverging, qualitative); some, but not all, of these palettes are colorblind-friendly
    • scale_color_viridis_c() and scale_fill_viridis_c() use the viridis color palette, which is perceptually uniform and colorblind-friendly
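
As an illustration of the scale_ functions above, here is a sketch using mpg; the hex colors, labels, breaks, and limits are assumptions chosen for the example.

```r
library(ggplot2)

# Discrete variable: hand-picked colors and legend labels per drive type
manual <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_manual(
    values = c("4" = "#1b9e77", f = "#d95f02", r = "#7570b3"),
    labels = c("4" = "4-wheel", f = "Front", r = "Rear")
  )

# Continuous variable: viridis gradient plus customized axes
continuous <- ggplot(mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point() +
  scale_color_viridis_c() +
  scale_x_continuous(breaks = 2:7, limits = c(1.5, 7)) +
  scale_y_continuous(name = "Highway mpg")
```

Note that limits set through scale_x_continuous() drop observations outside the range before statistics are computed, whereas coord_cartesian() only zooms.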

Extensions and Add-on Packages

  • Several extensions and add-on packages are available for ggplot2, providing additional functionality and pre-defined themes for enhancing plot aesthetics
  • The ggthemes package offers a collection of pre-defined themes that mimic the styles of various publications and visualization tools, such as theme_economist() or theme_fivethirtyeight()
  • Other extensions are distributed as separate packages rather than as a single "ggplot2 extensions" package; for example, ggsci provides scale_color_material() for Material Design colors, and split or half violin plots are available from community packages such as gghalves (geom_half_violin())
  • The gganimate package allows for the creation of animated plots by specifying how plot elements should change over time or across different categories, using functions like transition_states() or transition_reveal()
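
A hedged sketch of how two of these packages plug into an ordinary ggplot2 plot. Both ggthemes and gganimate are optional installs, so each call is guarded; the mpg dataset and the transition settings are illustrative assumptions.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# A publication-style theme from ggthemes, if the package is available
if (requireNamespace("ggthemes", quietly = TRUE)) {
  p_econ <- p + ggthemes::theme_economist()
}

# An animation that steps through drive types, if gganimate is available.
# Rendering with gganimate::animate(anim) additionally needs a renderer
# backend such as the gifski package.
if (requireNamespace("gganimate", quietly = TRUE)) {
  anim <- p +
    gganimate::transition_states(drv, transition_length = 1, state_length = 2)
}
```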

Multi-panel Plots with ggplot2

Faceting with facet_wrap() and facet_grid()

  • Multi-panel plots (facets) are used to display subsets of data in separate panels based on one or more categorical variables
  • The facet_wrap() function lays out panels for one or more categorical variables as a ribbon, wrapping the panels into multiple rows if necessary
    • The facets argument in facet_wrap() specifies the variable to facet by, such as ~ variable for a single variable or ~ variable1 + variable2 for every observed combination of two variables; vars(variable) is an equivalent alternative to the formula
    • The nrow and ncol arguments control the number of rows and columns in the facet grid
  • facet_grid() creates a grid of panels based on two categorical variables, with one variable represented by rows and the other by columns
    • The row and column variables are given as a formula with rows on the left of the tilde (~) and columns on the right, such as variable1 ~ variable2, or through the rows and cols arguments using vars()
    • If a dot (.) is used instead of a variable name on either side of the tilde, the facet grid will only split the panels by the specified variable
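
The faceting forms described above, sketched on the mpg dataset (the faceting variables are chosen only for illustration):

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# One variable, wrapped into a layout with 4 columns
wrapped <- base + facet_wrap(~ class, ncol = 4)

# Every observed combination of two variables, still wrapped
wrapped_two <- base + facet_wrap(~ drv + year)

# Rows by drive type, columns by cylinder count
grid <- base + facet_grid(drv ~ cyl)

# A dot on the right: split by rows only
rows_only <- base + facet_grid(drv ~ .)

# The same grid as above, written with vars() instead of a formula
grid_vars <- base + facet_grid(rows = vars(drv), cols = vars(cyl))
```

facet_grid() draws a panel for every row and column combination, even empty ones, while facet_wrap() only draws combinations that occur in the data.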

Customizing Facet Appearance

  • Faceting allows for the comparison of patterns and relationships across different subgroups or categories within the data
  • The scales argument in facet_ functions controls whether the scales are shared across panels (scales = "fixed", the default) or allowed to vary (scales = "free")
    • scales = "free_x" frees only the x scale, and scales = "free_y" frees only the y scale; in facet_grid(), free scales can vary only between columns (x) or between rows (y), not per panel
    • With fixed scales, the axis limits and breaks are the same across all panels, which makes panels directly comparable
  • Facet labels are customized with the labeller argument of the facet_ functions, while strip backgrounds and panel spacing are theme elements set through theme(), not facet_ arguments
    • The labeller argument accepts functions that modify the facet labels, such as label_both for displaying both variable name and value, or custom labeller functions
    • theme(strip.background = ...) and theme(panel.spacing = ...) control the appearance of the facet strip (the area containing the facet labels) and the spacing between panels, respectively
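
A brief sketch of these options together, assuming the mpg dataset; the colors and spacing are arbitrary. The labeller and scales settings go in the facet_ call, while strip and spacing settings go in theme().

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  # Free y axis per panel; labels read "drv: 4", "drv: f", "drv: r"
  facet_wrap(~ drv, scales = "free_y", labeller = label_both) +
  theme(
    strip.background = element_rect(fill = "grey20"),
    strip.text = element_text(color = "white", face = "bold"),
    panel.spacing = unit(1, "lines")  # gap between panels
  )
```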

Example Use Cases for Faceting

  • Faceting is particularly useful when exploring relationships between variables across different subgroups or categories
  • Some common use cases for faceting include:
    • Comparing the distribution of a variable (e.g., income) across different levels of another variable (e.g., education level) using facet_wrap(~ education)
    • Examining the relationship between two variables (e.g., height and weight) for different subgroups (e.g., gender) using facet_grid(gender ~ .)
    • Visualizing time series data for multiple entities (e.g., stock prices for different companies) using facet_wrap(~ company, nrow = 2)
    • Displaying geographic data (e.g., unemployment rates) for different regions or countries using facet_grid(rows = vars(region), cols = vars(year))