Every operation you perform in R—whether it's cleaning messy datasets, running statistical models, or creating visualizations—depends on understanding how data is stored and accessed. You're being tested not just on what these structures are, but on when to use each one and how they behave differently. The difference between a vector and a list, or between a matrix and a data frame, determines whether your code runs smoothly or throws cryptic errors.
These structures fall into patterns based on two key questions: Does it hold one data type or many? and How many dimensions does it have? Master these distinctions, and you'll know exactly which structure to reach for in any situation. Don't just memorize definitions—know what problem each structure solves and how to access its elements.
These structures enforce type consistency—every element must be the same type (numeric, character, logical, etc.). This constraint enables faster computations and predictable behavior.
- Vectors (1D): created with c() to combine elements; they support numeric, character, logical, integer, and complex types.
- Matrices (2D): created with matrix() with specified nrow and ncol arguments.
- Matrices support matrix multiplication (%*%), transposition (t()), and inversion (solve()).
- Matrix elements are accessed with [row, col] notation; leave one blank to select entire rows or columns (e.g., mat[1, ] gets row 1).
- Arrays (nD): created with array(), specifying a dim vector that defines the size of each dimension.
- Array elements are accessed with one index per dimension: arr[1, 2, 3] accesses row 1, column 2, layer 3.

Compare: Vectors vs. Matrices vs. Arrays. All three enforce single-type storage but differ in dimensionality (1D, 2D, nD). If a question asks about storing image pixel data across color channels, arrays are your answer.
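The single-type rule and the indexing notation above can be sketched in a few lines of base R. This is a minimal illustration, not a complete reference; the variable names (v, mixed, mat, arr) and the values are made up for the example.

```r
# Vectors: c() combines elements; mixing types coerces everything to one type.
v <- c(1, 2, 3)
mixed <- c(1, "a", TRUE)
class(v)      # "numeric"
class(mixed)  # "character": the number and the logical became strings

# Matrix: 2 rows x 2 columns, filled column by column (the default).
mat <- matrix(c(2, 0, 1, 3), nrow = 2, ncol = 2)
row1 <- mat[1, ]            # blank column index selects all of row 1: 2 1
mat_t <- t(mat)             # transpose
ident <- mat %*% solve(mat) # a matrix times its inverse is (numerically) the identity

# Array: dim = c(rows, columns, layers).
arr <- array(1:24, dim = c(2, 3, 4))
arr[1, 2, 3]  # row 1, column 2, layer 3 -> 15
```

Note that solve() only works for square, invertible matrices; the example matrix was chosen so that it is one.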
These structures can hold different data types simultaneously. This flexibility makes them essential for real-world datasets where you need numbers, text, and categories together.
- Lists: access elements with [[ ]] or $. Single brackets [ ] return a sub-list; double brackets extract the actual element.
- Lists work well with lapply() and sapply(), which apply a function to each element in turn.
- Data frames: created with data.frame() or imported from CSV/Excel files; the standard format for statistical analysis in R.
- Data frames work with dplyr::filter(), select(), and mutate() for data manipulation.

Compare: Lists vs. Data Frames. Both hold mixed types, but data frames enforce equal-length columns in a rectangular structure. Use lists when elements have different lengths or structures; use data frames for traditional datasets with observations and variables.
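The bracket distinction and the list/data frame contrast are easier to see in code. The sketch below uses only base R, so the row filter is written with plain indexing rather than dplyr; the names model_out and students and all values are hypothetical.

```r
# A list can hold elements of different types and lengths.
model_out <- list(
  coefs = c(1.5, -0.3),
  name  = "toy model",
  flags = c(TRUE, FALSE, TRUE)
)
sub  <- model_out[1]    # single brackets: still a list (of length 1)
elem <- model_out[[1]]  # double brackets: the numeric vector itself
model_out$name          # $ extracts an element by name

# sapply() applies a function to each element and simplifies the result.
lens <- sapply(model_out, length)  # named vector: coefs = 2, name = 1, flags = 3

# A data frame: equal-length columns, each with its own type.
students <- data.frame(
  name     = c("Ana", "Ben", "Cy"),
  age      = c(20, 22, 19),
  enrolled = c(TRUE, FALSE, TRUE)
)
older <- students[students$age > 19, ]  # base-R row filter; keeps Ana and Ben
```

If dplyr is installed, the last line is roughly what dplyr::filter(students, age > 19) would express.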
These structures serve specific analytical needs that general structures can't handle as elegantly.
- Factors: created with factor() and essential for statistical modeling; R treats factors differently than character vectors in regression and ANOVA.
- Use the levels argument to set a custom order, which is critical for correct plot axis ordering and contrast coding.

Compare: Factors vs. Character Vectors. Both store text-like data, but factors carry level information that affects statistical functions and plotting. Convert to factors when you need R to recognize categories; keep as characters for free-form text.
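As a small illustration of why the levels argument matters, the sketch below builds the same factor twice, once with default levels and once with an explicit order. The education values are example data, not from any real dataset.

```r
edu <- c("Master's", "High School", "PhD", "Bachelor's", "High School")

# Default: levels are the sorted unique values (alphabetical here).
f_default <- factor(edu)
levels(f_default)

# Custom order via the levels argument.
f_ordered <- factor(
  edu,
  levels = c("High School", "Bachelor's", "Master's", "PhD")
)
levels(f_ordered)      # "High School" "Bachelor's" "Master's" "PhD"
as.integer(f_ordered)  # underlying integer codes: 3 1 4 2 1
table(f_ordered)       # counts are reported in level order
```

The first level acts as the reference category under R's default treatment contrasts, so with f_ordered a regression would compare each group against "High School".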
| Concept | Best Examples |
|---|---|
| Single data type, 1D | Vectors |
| Single data type, 2D | Matrices |
| Single data type, 3D+ | Arrays |
| Mixed types, flexible structure | Lists |
| Mixed types, rectangular/tabular | Data Frames |
| Categorical data with levels | Factors |
| Element-wise operations | Vectors, Matrices, Arrays |
| Statistical modeling input | Data Frames, Factors |
You need to store a dataset with columns for name (text), age (numeric), and enrolled (TRUE/FALSE). Which structure should you use, and why wouldn't a matrix work?
Compare and contrast how you would access the third element of a vector versus the third element of a list. What's the difference between mylist[3] and mylist[[3]]?
Which two structures are both two-dimensional but differ in their type flexibility? When would you choose one over the other?
A function returns multiple outputs of different lengths—a numeric vector of coefficients, a character string for the model name, and a data frame of residuals. Which structure should you use to store all of these together?
You're building a regression model and have a variable for education level (High School, Bachelor's, Master's, PhD). Should this be stored as a character vector or a factor? What happens differently in the model based on your choice?