In data science, choosing the right data structure isn't just a programming decision—it's a performance decision that can make or break your analysis. You're being tested on understanding time complexity, memory trade-offs, and access patterns because these concepts determine whether your code runs in seconds or hours on real-world datasets. The structures you'll encounter—from simple arrays to complex graphs—each solve specific problems, and knowing when to reach for each one separates competent data scientists from struggling ones.
Don't just memorize what each structure looks like—know why it exists and what problem it solves. Can you explain why a hash table beats an array for lookups? Why you'd choose a heap over a sorted list for finding maximums? These are the conceptual connections that appear in exam questions and, more importantly, in actual data science work. Master the underlying mechanics, and the implementation details will follow naturally.
These structures store elements in a specific order where position matters. The key trade-off here is between access speed and modification flexibility—you're essentially choosing between fast reads or fast writes.
Compare: Arrays vs. Linked Lists. Both store ordered sequences, but arrays win on access speed ($O(1)$ vs. $O(n)$ by position) while linked lists win on modification flexibility ($O(1)$ vs. $O(n)$ for insertions, once you already hold a reference to the insertion point). If an FRQ asks about trade-offs in data structure selection, this is your foundational example.
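The access-versus-insertion trade-off can be sketched in a few lines of Python. This is a minimal illustration, not a production linked list: `Node`, `build_linked_list`, `get_at`, `insert_after`, and `to_list` are hypothetical helpers written for this example, and a Python `list` stands in for an array.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One cell of a singly linked list (hypothetical helper for illustration)."""
    value: int
    next: Optional["Node"] = None


def build_linked_list(values):
    """Build a linked list from an iterable and return its head node."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def get_at(head, index):
    """Walk `index` links from the head: O(n) access by position."""
    node = head
    for _ in range(index):
        node = node.next
    return node


def insert_after(node, value):
    """Splice a new node in after `node`: O(1) once the node is in hand."""
    node.next = Node(value, node.next)


def to_list(head):
    """Collect linked-list values into a Python list for easy inspection."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


array = [10, 20, 30, 40]
head = build_linked_list(array)

third_from_array = array[2]               # direct index: O(1)
third_from_links = get_at(head, 2).value  # follows two links: O(n)

array.insert(1, 15)      # shifts every later element: O(n)
insert_after(head, 15)   # rewires one pointer: O(1)
linked_values = to_list(head)
```

Both structures end up holding the same sequence; what differs is how much work each operation did to get there.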
Stacks and queues restrict how you interact with data, enforcing specific access patterns. This constraint isn't a limitation—it's a feature that models real-world processes and simplifies certain algorithms.
Stack operations are push (add to top) and pop (remove from top), both running in $O(1)$ time.
Queue operations are enqueue (add to back) and dequeue (remove from front), both running in $O(1)$ time.
Compare: Stacks vs. Queues. Both restrict access to specific ends, but LIFO vs. FIFO determines completely different use cases. Stacks handle nested/recursive processes; queues handle sequential/fair-ordering processes. Know which real-world scenario maps to which structure.
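As a rough sketch of the two access patterns in Python: a plain `list` works as a stack, and `collections.deque` works as a queue because it removes from the front in constant time. The item names below are placeholders chosen for the example.

```python
from collections import deque

# Stack (LIFO): push and pop both happen at the same end.
stack = []
stack.append("first")   # push
stack.append("second")  # push
stack.append("third")   # push
popped = stack.pop()    # pop returns the most recently pushed item

# Queue (FIFO): enqueue at the back, dequeue from the front.
queue = deque()
queue.append("first")       # enqueue
queue.append("second")      # enqueue
queue.append("third")       # enqueue
dequeued = queue.popleft()  # dequeue returns the oldest item
```

Using `list.pop(0)` as a dequeue would also work, but it shifts every remaining element, so it costs $O(n)$ rather than $O(1)$.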
When data has natural parent-child relationships or requires efficient sorting/searching, tree-based structures shine. The branching factor and balance of these structures determine their performance characteristics.
Compare: Trees vs. Heaps. Both are hierarchical, but trees optimize for searching any element while heaps optimize for accessing extremes. A balanced binary search tree finds any value in $O(\log n)$; a heap finds only the max/min in $O(1)$ but can't efficiently search for arbitrary values.
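The contrast can be illustrated with Python's `heapq` module (a min-heap stored in a list) next to a small hand-written binary search tree. `BSTNode`, `bst_insert`, and `bst_contains` are hypothetical helpers for this sketch; the tree does no rebalancing, so its search is only $O(\log n)$ when the insertion order happens to keep it balanced, as it does here.

```python
import heapq


class BSTNode:
    """Node of an unbalanced binary search tree (illustrative helper)."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert `key` and return the (possibly new) subtree root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root  # duplicates are ignored


def bst_contains(root, key):
    """Follow one branch per level, discarding the other subtree each step."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


values = [50, 30, 70, 20, 40, 60, 80]

root = None
for v in values:
    root = bst_insert(root, v)

heap = list(values)
heapq.heapify(heap)           # rearranges the list into heap order in place

smallest = heap[0]            # the minimum sits at index 0: O(1) peek
found_in_bst = bst_contains(root, 60)
missing_in_bst = bst_contains(root, 65)
found_in_heap = 60 in heap    # no ordering to exploit: linear scan
```

`heapq` implements a min-heap; a common way to get max-heap behavior is to push negated values.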
When you need to retrieve data by a unique identifier rather than position, key-value structures provide near-instant lookups. The magic lies in hash functions that convert keys into array indices.
Python's `dict` is your go-to for fast lookups.
Compare: Hash Tables vs. Dictionaries. They are functionally similar (dictionaries are typically implemented as hash tables), but "hash table" emphasizes the mechanism while "dictionary" emphasizes the interface. In Python, you'll use dictionaries; in algorithm discussions, you'll reference hash table properties.
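To make the "hash function converts keys into array indices" idea concrete, here is a toy hash table with separate chaining. `ToyHashTable` is a hypothetical class written for this example; it is not how Python's `dict` is implemented internally, and it never resizes, so it only hints at the real mechanism.

```python
class ToyHashTable:
    """Fixed-size hash table using separate chaining (illustration only)."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # The hash function turns the key into a bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only one bucket is scanned, not the whole table.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ToyHashTable()
table.put("alice", 31)
table.put("bob", 27)
table.put("alice", 32)  # same key, so the value is replaced

# The built-in dict exposes the same behavior through a simpler interface.
builtin = {"alice": 31, "bob": 27}
builtin["alice"] = 32
```

Lookups stay fast on average as long as keys spread evenly across buckets; if many keys collide into one bucket, `get` degrades toward a linear scan.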
Graphs and matrices represent relationships and multidimensional data. These structures are essential for network analysis, linear algebra, and most machine learning computations.
Compare: Graphs vs. Matrices—graphs emphasize relationships while matrices emphasize numerical computation, but they're deeply connected. An adjacency matrix represents a graph numerically, enabling linear algebra techniques on network problems. Know both representations.
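The link between the two representations can be sketched with a small undirected graph stored both as an adjacency list and as an adjacency matrix. The names (`ana`, `ben`, `cal`, `dee`) and the helpers `shortest_path_length` and `matmul` are made up for this example; in practice a library such as NumPy would usually handle the matrix arithmetic.

```python
from collections import deque

edges = [("ana", "ben"), ("ben", "cal"), ("cal", "dee"), ("ana", "cal")]
nodes = sorted({name for edge in edges for name in edge})
index = {name: i for i, name in enumerate(nodes)}

# Adjacency list: each node maps to its neighbors (good for traversal).
adjacency_list = {name: [] for name in nodes}
for u, v in edges:
    adjacency_list[u].append(v)
    adjacency_list[v].append(u)

# Adjacency matrix: entry [i][j] is 1 when nodes i and j share an edge.
n = len(nodes)
adjacency_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    adjacency_matrix[index[u]][index[v]] = 1
    adjacency_matrix[index[v]][index[u]] = 1


def shortest_path_length(graph, start, goal):
    """Breadth-first search: number of edges on a shortest path, or None."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return None


def matmul(a, b):
    """Plain-Python matrix product of two square lists of lists."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


hops = shortest_path_length(adjacency_list, "ana", "dee")

# Squaring the adjacency matrix counts walks of exactly two edges.
two_step_walks = matmul(adjacency_matrix, adjacency_matrix)
ana_to_dee_walks = two_step_walks[index["ana"]][index["dee"]]
```

The traversal question (how far apart are two users?) is answered on the list form, while the matrix form turns a counting question into matrix multiplication.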
| Concept | Best Examples |
|---|---|
| Access by Position | Arrays, Matrices |
| Access by Key | Hash Tables, Dictionaries |
| Efficient Insertion/Removal | Linked Lists, Stacks, Queues |
| Search | Binary Search Trees, Heaps (for extremes only) |
| Hierarchical Relationships | Trees, Heaps |
| Network/Relational Data | Graphs, Matrices (adjacency) |
| LIFO/FIFO Constraints | Stacks, Queues |
| Numerical Computation | Matrices, Arrays |
Which two structures offer $O(1)$ average-case lookups, and what mechanism makes this possible?
Compare arrays and linked lists: if you need to frequently insert elements in the middle of a sequence, which would you choose and why?
A task scheduler needs to process jobs in the order they arrive. Which constrained access structure models this, and what would happen if you used the other one instead?
Explain why a heap can find the maximum element faster than a binary search tree, but a binary search tree can find any element faster than a heap.
You're building a social network analysis tool that needs to find the shortest path between users and also perform matrix operations on connection strengths. Which two structures would you use together, and how do they relate to each other?