💻Intro to Programming in R

Key R Data Structures

Why This Matters

Every operation you perform in R—whether it's cleaning messy datasets, running statistical models, or creating visualizations—depends on understanding how data is stored and accessed. You're being tested not just on what these structures are, but on when to use each one and how they behave differently. The difference between a vector and a list, or between a matrix and a data frame, determines whether your code runs smoothly or throws cryptic errors.

These structures fall into patterns based on two key questions: Does it hold one data type or many? and How many dimensions does it have? Master these distinctions, and you'll know exactly which structure to reach for in any situation. Don't just memorize definitions—know what problem each structure solves and how to access its elements.


Homogeneous Structures: One Data Type Only

These structures enforce type consistency—every element must be the same type (numeric, character, logical, etc.). This constraint enables faster computations and predictable behavior.

Vectors

  • The fundamental building block of R—nearly everything in R is built on vectors, including single values (which are just vectors of length 1)
  • Created with c() to combine elements; supports numeric, character, logical, integer, and complex types
  • Element-wise operations allow you to perform calculations on entire vectors without writing loops—a core R programming pattern
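As a minimal sketch of the vector points above (the variable names and values are made up for illustration), the following shows creation with c(), element-wise arithmetic, and what happens when types are mixed: R does not raise an error but silently coerces everything to one common type.

```r
# Build vectors with c(); each vector holds a single type
scores <- c(88, 92, 79)        # numeric (double)
passed <- scores >= 80         # logical, computed element by element

# Element-wise arithmetic: no loop needed
curved <- scores + 5           # 93 97 84

# Mixing types does not fail; R coerces to one common type
mixed <- c(1, "a", TRUE)
class(mixed)                   # "character"

# A single value is just a vector of length 1
length(42)                     # 1
```

The coercion step is worth remembering: a stray text value in a numeric vector turns every element into a character string.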

Matrices

  • Two-dimensional arrays with rows and columns—created using matrix() with specified nrow and ncol arguments
  • Supports linear algebra operations including matrix multiplication (%*%), transposition (t()), and inversion (solve())
  • Indexed with [row, col] notation—leave one blank to select entire rows or columns (e.g., mat[1, ] gets row 1)
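The matrix operations listed above can be sketched as follows. The matrices m and sq are illustrative values, not part of the original guide. Note that matrix() fills column by column unless byrow = TRUE is supplied.

```r
# 2 x 3 matrix filled column by column with 1..6
m <- matrix(1:6, nrow = 2, ncol = 3)

first_row  <- m[1, ]    # whole first row: 1 3 5
second_col <- m[, 2]    # whole second column: 3 4
one_cell   <- m[2, 3]   # single element: 6

m_t <- t(m)             # transpose, now 3 x 2
prod_mat <- m %*% m_t   # matrix multiplication, result is 2 x 2

# solve() inverts a square, non-singular matrix
sq <- matrix(c(2, 1, 1, 3), nrow = 2)
sq_inv <- solve(sq)
sq %*% sq_inv           # approximately the 2 x 2 identity matrix
```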

Arrays

  • Multi-dimensional extension of matrices—can have 3, 4, or more dimensions for complex data like RGB images or panel data
  • Created with array() specifying a dim vector that defines the size of each dimension
  • Indexed by position in each dimension: arr[1, 2, 3] accesses row 1, column 2, layer 3
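A small sketch of array creation and indexing, assuming a hypothetical 2 x 3 x 4 array filled with the integers 1 to 24 (like matrices, arrays are filled with the first dimension varying fastest):

```r
# 2 rows, 3 columns, 4 layers
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)                 # 2 3 4
cell <- arr[1, 2, 3]     # row 1, column 2, layer 3: 15
layer1 <- arr[, , 1]     # leaving indices blank keeps the whole 2 x 3 layer
```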

Compare: Vectors vs. Matrices vs. Arrays—all enforce single-type storage, but differ in dimensionality (1D, 2D, nD). If a question asks about storing image pixel data across color channels, arrays are your answer.


Heterogeneous Structures: Mixed Data Types Welcome

These structures can hold different data types simultaneously. This flexibility makes them essential for real-world datasets where you need numbers, text, and categories together.

Lists

  • The most flexible structure in R—can hold vectors, data frames, other lists, even functions as elements
  • Access elements with [[ ]] or $—single brackets [ ] return a sub-list, double brackets extract the actual element
  • Powers functional programming through lapply() and sapply(), which apply functions to each element iteratively
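The difference between [ ], [[ ]], and $ is easiest to see side by side. This is an illustrative sketch; the list model_info and its element names are hypothetical.

```r
# A list mixing a numeric vector, a string, and a function
model_info <- list(
  coefs = c(1.5, -0.3),
  name = "baseline",
  summarise = function(x) mean(x)
)

sub  <- model_info[1]      # single brackets: still a list, of length 1
vals <- model_info[[1]]    # double brackets: the numeric vector itself
nm   <- model_info$name    # $ extracts by name, like model_info[["name"]]

# lapply() always returns a list; sapply() simplifies the result when it can
parts <- model_info[c("coefs", "name")]
lens_list <- lapply(parts, length)   # list with elements coefs and name
lens_vec  <- sapply(parts, length)   # named integer vector
```

Because sub is still a list, arithmetic such as sub * 2 fails, while vals * 2 works as expected.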

Data Frames

  • R's workhorse for tabular data—each column is a vector (enforcing type within columns), but columns can differ from each other
  • Created with data.frame() or imported from CSV/Excel files; the standard format for statistical analysis in R
  • Compatible with tidyverse functions like dplyr::filter(), select(), and mutate() for powerful data manipulation
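As a rough sketch, here is a small data frame with one column per type. The data are invented, and the subsetting uses base R so the example runs without extra packages. The commented lines show what the dplyr equivalents might look like, assuming dplyr is installed.

```r
students <- data.frame(
  name = c("Ana", "Ben", "Cy"),
  age = c(19, 22, 20),
  enrolled = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep text as character (the default since R 4.0)
)

col_types <- sapply(students, class)   # one type per column

ages <- students$age                          # a column is an ordinary vector
enrolled_only <- students[students$enrolled, ] # keep rows where enrolled is TRUE
name_age <- students[, c("name", "age")]       # keep selected columns

# With dplyr installed, roughly equivalent calls would be:
# dplyr::filter(students, enrolled)
# dplyr::select(students, name, age)
# dplyr::mutate(students, age_next_year = age + 1)
```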

Compare: Lists vs. Data Frames—both hold mixed types, but data frames enforce equal-length columns in a rectangular structure. Use lists when elements have different lengths or structures; use data frames for traditional datasets with observations and variables.
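To illustrate the equal-length rule, the sketch below (with made-up values) stores two elements of different lengths in a list, then shows that data.frame() rejects the same input. A data frame is itself a list under the hood, just one with that extra rectangular constraint.

```r
# A list accepts elements of different lengths
ragged <- list(a = 1:3, b = c("x", "y"))

# data.frame() refuses columns whose lengths cannot be reconciled (3 vs. 2)
result <- tryCatch(
  data.frame(a = 1:3, b = c("x", "y")),
  error = function(e) "error"
)

# A valid data frame is still a list of equal-length columns
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
is.list(df)    # TRUE
```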


Special-Purpose Structures

These structures serve specific analytical needs that general structures can't handle as elegantly.

Factors

  • Designed for categorical data—stores categories as integer codes with associated labels, saving memory for repeated values
  • Created with factor() and essential for statistical modeling—R treats factors differently than character vectors in regression and ANOVA
  • Levels control ordering and grouping: pass the levels argument to factor() to set a custom order, which is critical for correct plot axis ordering and contrast coding
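A short sketch of how levels and integer codes behave, using illustrative education values. Without a levels argument, factor() sorts the levels alphabetically, which is rarely the order you want for ordered categories.

```r
edu <- c("PhD", "High School", "Master's", "Bachelor's", "High School")

# Default: levels are sorted alphabetically
edu_default <- factor(edu)
levels(edu_default)   # "Bachelor's" "High School" "Master's" "PhD"

# Custom order via the levels argument
edu_f <- factor(edu, levels = c("High School", "Bachelor's", "Master's", "PhD"))
codes <- as.integer(edu_f)   # underlying integer codes: 4 1 3 2 1
counts <- table(edu_f)       # counts are reported in level order
```

In a model, the first level typically serves as the reference category, so the custom order above would make "High School" the baseline.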

Compare: Factors vs. Character Vectors—both store text-like data, but factors carry level information that affects statistical functions and plotting. Convert to factors when you need R to recognize categories; keep as characters for free-form text.


Quick Reference Table

Concept | Best Examples
Single data type, 1D | Vectors
Single data type, 2D | Matrices
Single data type, 3D+ | Arrays
Mixed types, flexible structure | Lists
Mixed types, rectangular/tabular | Data Frames
Categorical data with levels | Factors
Element-wise operations | Vectors, Matrices, Arrays
Statistical modeling input | Data Frames, Factors

Self-Check Questions

  1. You need to store a dataset with columns for name (text), age (numeric), and enrolled (TRUE/FALSE). Which structure should you use, and why wouldn't a matrix work?

  2. Compare and contrast how you would access the third element of a vector versus the third element of a list. What's the difference between mylist[3] and mylist[[3]]?

  3. Which two structures are both two-dimensional but differ in their type flexibility? When would you choose one over the other?

  4. A function returns multiple outputs of different lengths—a numeric vector of coefficients, a character string for the model name, and a data frame of residuals. Which structure should you use to store all of these together?

  5. You're building a regression model and have a variable for education level (High School, Bachelor's, Master's, PhD). Should this be stored as a character vector or a factor? What happens differently in the model based on your choice?