๐Ÿ”Data Structures

Common Data Structure Operations

Why This Matters

Every data structures exam tests your ability to choose the right tool for the job. That means understanding not just what operations exist, but why certain structures handle them efficiently while others don't. You're being tested on time complexity trade-offs, memory access patterns, and the fundamental question: when does constant time matter, and when is logarithmic good enough?

The operations covered here (insertion, deletion, traversal, and search) form the backbone of algorithm analysis. Whether you're writing an FRQ about implementing a priority queue or analyzing why a hash table outperforms a linked list for lookups, you need to connect each operation to its underlying mechanism: contiguous vs. linked memory, ordering constraints, and hashing principles. Don't just memorize that arrays have O(n) insertion. Know why shifting elements creates that cost.


Sequential Access Structures

These structures store elements in a defined order and require traversal to access arbitrary elements. The key trade-off is between memory locality (arrays) and flexibility (linked lists).

Array Insertion and Deletion

Arrays store elements in contiguous memory, meaning each element sits right next to the previous one. This layout is what gives arrays O(1) random access: if you know the index, you can jump straight to the memory address.

The downside shows up during modification:

  • Shifting elements causes O(n) time complexity. Every insertion or deletion (except at the end) requires moving all subsequent elements one position to maintain that contiguous layout. Inserting at index 0 in a 1000-element array means moving all 1000 elements.
  • Fixed-size allocation means arrays must be resized when capacity is exceeded, typically by allocating a new, larger array and copying everything over. Dynamic arrays (like Java's ArrayList) handle this automatically, usually doubling in size. This gives amortized O(1) for appending to the end, even though individual resizes cost O(n).
  • Cache-friendly memory layout makes arrays excellent for iteration despite costly modifications. Because elements are stored contiguously, the CPU can load chunks of the array into cache at once. This is why arrays are preferred when reads dominate writes.
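The shifting cost is easy to make concrete. Here is a minimal Python sketch of what insertion into a contiguous array involves; `insert_at` is a hypothetical helper written out longhand to expose the element-by-element shift that a dynamic array performs internally:

```python
# Sketch: inserting at index 0 shifts every subsequent element one slot right.
def insert_at(arr, index, value):
    """Insert value into a list by shifting elements -- O(n) in the worst case."""
    arr.append(None)             # make room at the end
    for i in range(len(arr) - 1, index, -1):
        arr[i] = arr[i - 1]      # shift each element one position right
    arr[index] = value

nums = [10, 20, 30]
insert_at(nums, 0, 5)            # worst case: every element moves
print(nums)  # [5, 10, 20, 30]
```

Inserting at the end skips the loop entirely, which is why appends are cheap while front insertions are not.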

Linked List Traversal and Manipulation

Linked lists store each element in a separate node that contains the data plus a pointer to the next node (and the previous node, in a doubly linked list). Nodes can live anywhere in memory.

  • O(1) insertion and deletion when you already have a reference to the target node. You just rewire the pointers. But if you need to find the node first, that search costs O(n).
  • Sequential traversal required for access. There's no random indexing, so finding the kth element means following k pointers from the head. That's O(n) in the worst case.
  • Dynamic memory allocation eliminates resizing overhead entirely. Each node is allocated independently, making linked lists ideal for frequent insertions and deletions at known positions.
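A minimal sketch of the pointer rewiring, assuming a hypothetical singly linked `Node` class: insertion after a known node touches only two pointers, while reading the list back still requires a full traversal.

```python
# Sketch: O(1) insertion after a node you already hold a reference to.
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def insert_after(node, value):
    """Rewire pointers; no shifting, no traversal."""
    node.next = Node(value, node.next)

head = Node('a', Node('c'))
insert_after(head, 'b')          # splice 'b' in between 'a' and 'c'

# Access is still sequential: follow pointers from the head.
values = []
cur = head
while cur:
    values.append(cur.data)
    cur = cur.next
print(values)  # ['a', 'b', 'c']
```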

Compare: Arrays vs. Linked Lists: both store ordered sequences, but arrays offer O(1) random access while linked lists offer O(1) insertion/deletion at known nodes. If an FRQ asks about implementing a playlist with frequent additions and removals, linked list is your answer; for binary search, you need an array.


LIFO and FIFO Abstractions

Stacks and queues restrict access to specific ends of the structure. This constraint is a feature, not a limitation. It guarantees constant-time operations and models real-world processes cleanly.

Stack Push and Pop Operations

A stack follows Last In, First Out (LIFO) order. Only the top element is accessible.

  • Push adds to the top, pop removes from the top. Both are O(1).
  • Function call management relies on the call stack. Every time a function is called, a new frame (holding return address, local variables, parameters) is pushed onto the stack. When the function returns, its frame is popped. This is how recursion works under the hood.
  • Expression evaluation uses stacks for parsing parentheses and converting between infix and postfix notation. For example, to evaluate 3 + 4 * 2 correctly, a stack-based algorithm can handle operator precedence without parentheses.
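One way to see the expression-evaluation use case is a postfix evaluator (a common stack exercise; the function name `eval_postfix` is just for this sketch). In postfix notation, 3 + 4 * 2 becomes 3 4 2 * +, and a single stack handles it with no precedence rules at all:

```python
# Sketch: evaluating a postfix expression with a stack.
def eval_postfix(tokens):
    stack = []
    for tok in tokens:
        if tok in ('+', '-', '*', '/'):
            b = stack.pop()              # pop order matters for - and /
            a = stack.pop()
            if tok == '+':
                stack.append(a + b)
            elif tok == '-':
                stack.append(a - b)
            elif tok == '*':
                stack.append(a * b)
            else:
                stack.append(a / b)
        else:
            stack.append(float(tok))     # operand: just push it
    return stack.pop()

print(eval_postfix(['3', '4', '2', '*', '+']))  # 11.0
```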

Queue Enqueue and Dequeue Operations

A queue follows First In, First Out (FIFO) order. Elements are processed in the order they arrive.

  • Enqueue adds to the back, dequeue removes from the front. Both are O(1).
  • Scheduling applications include CPU task scheduling, print queues, and breadth-first search traversal.
  • Circular array implementation avoids the shifting problem that a naive array-based queue would have. Instead of shifting all elements forward on each dequeue, you treat the array as a ring buffer with front and rear pointers that wrap around. When either pointer reaches the end of the array, it wraps back to index 0.
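The ring-buffer idea can be sketched in a few lines; this `CircularQueue` class is a hypothetical fixed-capacity version that derives the rear index from the front and the size, with modulo arithmetic doing the wrap-around:

```python
# Sketch: fixed-capacity circular queue; indices wrap via modulo, so no
# element is ever shifted.
class CircularQueue:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.front = 0
        self.size = 0

    def enqueue(self, value):
        if self.size == len(self.buf):
            raise OverflowError("queue full")
        rear = (self.front + self.size) % len(self.buf)
        self.buf[rear] = value           # rear wraps back to index 0
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("queue empty")
        value = self.buf[self.front]
        self.front = (self.front + 1) % len(self.buf)
        self.size -= 1
        return value

q = CircularQueue(3)
for x in (1, 2, 3):
    q.enqueue(x)
print(q.dequeue())  # 1
q.enqueue(4)        # reuses the slot that 1 vacated
print(q.dequeue())  # 2
```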

Compare: Stacks vs. Queues: both offer O(1) insertion and removal, but stacks reverse order (LIFO) while queues preserve it (FIFO). When asked about undo functionality, think stack; for processing requests fairly, think queue.


Hierarchical Structures

Trees and heaps organize data in parent-child relationships. The branching factor and balance constraints determine whether operations achieve logarithmic efficiency.

Binary Tree Insertion, Deletion, and Traversal

A binary search tree (BST) maintains the property that for every node, all values in its left subtree are smaller and all values in its right subtree are larger. This property is what enables efficient searching.

  • Balanced trees achieve O(log n) operations because each comparison eliminates roughly half the remaining nodes. But an unbalanced tree can degrade to O(n), essentially becoming a linked list. Imagine inserting 1, 2, 3, 4, 5 in order: you get a straight chain to the right. Self-balancing trees (AVL, red-black) prevent this by restructuring after insertions and deletions.
  • Traversal orders serve different purposes: in-order (left, root, right) gives sorted output for BSTs; pre-order (root, left, right) captures tree structure for copying or serialization; post-order (left, right, root) processes children before parents, enabling safe deletion of subtrees.
  • BSTs are the foundation for advanced structures including AVL trees, red-black trees, and B-trees used in databases.
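The BST property and the in-order-gives-sorted-output claim can be verified with a minimal sketch (no balancing here, so pathological insertion orders would still degrade to O(n)):

```python
# Sketch: unbalanced BST with insertion and in-order traversal.
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Walk left for smaller keys, right for larger; duplicates ignored."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root

def in_order(node, out):
    """Left, root, right -- yields keys in sorted order for a BST."""
    if node:
        in_order(node.left, out)
        out.append(node.key)
        in_order(node.right, out)
    return out

root = None
for k in (5, 2, 8, 1, 3):
    root = bst_insert(root, k)
print(in_order(root, []))  # [1, 2, 3, 5, 8]
```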

Heap Insertion and Extraction

A heap is a complete binary tree (every level is fully filled except possibly the last, which is filled left to right) that satisfies the heap property: in a min-heap, every parent is smaller than its children; in a max-heap, every parent is larger.

  • The min (or max) element is always at the root, making it perfect for priority queues where you repeatedly need the extreme value.
  • Insertion (bubble-up): Add the new element at the next available position (bottom-left), then repeatedly swap it with its parent until the heap property is restored. This takes at most O(log n) swaps, since the tree height is log n.
  • Extraction (heapify-down): Remove the root, move the last element to the root position, then repeatedly swap it with its smaller (or larger) child until the heap property holds. Also O(log n).

Because a heap is a complete binary tree, it can be stored efficiently in an array: for a node at index i, its left child is at 2i + 1, its right child at 2i + 2, and its parent at ⌊(i - 1)/2⌋.
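That index arithmetic makes bubble-up short to write out. A minimal min-heap insertion sketch (the helper name `heap_push` is ours; in practice Python's `heapq` module provides this):

```python
# Sketch: array-backed min-heap insertion (bubble-up).
# Parent of index i lives at (i - 1) // 2.
def heap_push(heap, value):
    heap.append(value)               # next bottom-left slot
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break                    # heap property restored
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent                   # keep bubbling up

h = []
for v in (7, 3, 9, 1):
    heap_push(h, v)
print(h[0])  # 1 -- the minimum is always at the root
```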

Compare: Binary Search Trees vs. Heaps: BSTs maintain full sorted order (in-order traversal gives all elements sorted) while heaps only guarantee the root is the min or max. Use a BST when you need range queries or ordered iteration; use a heap when you only need repeated access to the minimum or maximum.


Hash-Based Structures

Hash tables trade ordering for speed by using a hash function to compute storage locations directly. The O(1) average case depends entirely on minimizing collisions.

Hash Table Insertion, Deletion, and Lookup

A hash function takes a key and returns an index into the underlying array. If the function distributes keys uniformly, most slots hold at most one element, and you get direct access.

  • Average-case O(1) for all operations. No traversal or comparison chain needed. You compute the hash, go to that index, and you're done.
  • Collisions happen when two different keys hash to the same index. Two main resolution strategies:
    • Chaining: Each array slot holds a linked list (or other collection). Colliding keys are appended to the list at that index.
    • Open addressing: When a collision occurs, probe for the next empty slot using a defined sequence (linear probing, quadratic probing, or double hashing).
  • Load factor (number of elements divided by table size) directly affects performance. As the load factor increases, collisions become more frequent and operations degrade toward O(n). Most implementations resize the table (rehash) when the load factor exceeds a threshold, commonly around 0.75.
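A minimal chaining sketch ties these pieces together. This hypothetical `ChainedHashTable` uses a fixed table size for simplicity; a real implementation would rehash as the load factor grows:

```python
# Sketch: hash table with separate chaining; each slot holds a list of
# (key, value) pairs, so colliding keys share a bucket.
class ChainedHashTable:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite existing key
                return
        bucket.append((key, value))        # collision: extend the chain

    def get(self, key):
        for k, v in self._bucket(key):     # scan only this bucket
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable()
t.put('apple', 3)
t.put('pear', 5)
print(t.get('apple'))  # 3
```

With a uniform hash, each chain stays short and `get` does only a handful of comparisons; if every key landed in one bucket, lookups would degrade to a linear scan.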

Compare: Hash Tables vs. Binary Search Trees: hash tables offer O(1) average lookup but provide no ordering; BSTs guarantee O(log n) operations while maintaining sorted order. If you need to find all keys in a range, BST wins. For simple key-value lookups, hash tables dominate.


Graph Traversal Algorithms

Graphs model relationships without hierarchical constraints. BFS and DFS are the two fundamental traversal strategies, each suited to different problem types.

Breadth-First Search (BFS)

BFS explores a graph level by level, visiting all neighbors of the current node before moving deeper. It uses a queue to track which node to visit next.

  1. Start at the source node. Mark it as visited and enqueue it.
  2. Dequeue a node, then enqueue all of its unvisited neighbors (marking each as visited).
  3. Repeat until the queue is empty.
  • Time complexity: O(V + E), where V is the number of vertices and E the number of edges. Every node and edge is examined once.
  • Guarantees shortest path in unweighted graphs, since it discovers nodes in order of their distance from the source.
  • Applications: shortest path (unweighted), connected components, bipartiteness testing.
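The three steps above can be sketched directly with an adjacency-list graph; this hypothetical `bfs_distances` returns each reachable node's distance from the source, which doubles as a demonstration of the shortest-path guarantee:

```python
from collections import deque

# Sketch: BFS; the dist dictionary serves as the visited set, and
# distances come out in nondecreasing order because of the queue.
def bfs_distances(graph, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:         # unvisited
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(bfs_distances(graph, 'A'))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```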

Depth-First Search (DFS)

DFS explores as deep as possible along each branch before backtracking. It uses a stack (or recursion's implicit call stack).

  1. Start at the source node. Mark it as visited and push it onto the stack.
  2. Peek at the top of the stack. If it has an unvisited neighbor, mark that neighbor as visited and push it.
  3. If no unvisited neighbors remain, pop the node off the stack.
  4. Repeat until the stack is empty.
  • Same O(V + E) complexity as BFS, but the different traversal order yields different applications.
  • Applications: cycle detection, topological sorting (for DAGs), finding strongly connected components.
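The four steps above translate almost line for line into an explicit-stack sketch (a hypothetical `dfs_order` that returns nodes in the order they are first visited; a set would make the membership test faster, but a list keeps the visit order visible):

```python
# Sketch: iterative DFS with an explicit stack -- peek, push an unvisited
# neighbor if one exists, otherwise pop to backtrack.
def dfs_order(graph, source):
    visited = [source]
    stack = [source]
    while stack:
        node = stack[-1]                     # peek at the top
        unvisited = [n for n in graph[node] if n not in visited]
        if unvisited:
            nxt = unvisited[0]
            visited.append(nxt)
            stack.append(nxt)                # go deeper
        else:
            stack.pop()                      # backtrack
    return visited

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(dfs_order(graph, 'A'))  # ['A', 'B', 'D', 'C']
```

Note how DFS reaches D before C, while BFS on the same graph would visit both of A's neighbors first.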

Compare: BFS vs. DFS: both traverse all reachable nodes in O(V + E) time, but BFS finds shortest paths in unweighted graphs while DFS excels at detecting cycles and topological ordering. FRQs about "minimum number of steps" signal BFS; "is there a cycle?" signals DFS.


Sorting and Searching Algorithms

These algorithms transform and query data. Understanding their complexity guarantees helps you choose the right approach for different input characteristics.

Quicksort and Mergesort

Both are divide-and-conquer algorithms that split the input, sort the pieces, and combine results. They differ in where the real work happens.

  • Quicksort picks a pivot, partitions elements into "less than pivot" and "greater than pivot," then recursively sorts each partition. The work happens during partitioning. It averages O(n log n) but degrades to O(n²) on already-sorted input if the pivot is chosen poorly (e.g., always picking the first element). Randomized pivot selection avoids this in practice.
  • Mergesort splits the array in half, recursively sorts each half, then merges the sorted halves. The work happens during merging. It guarantees O(n log n) in all cases and is stable (preserves relative order of equal elements).
  • Space trade-off: Quicksort is in-place, using only O(log n) stack space for recursion. Mergesort requires O(n) auxiliary space for the merge step. This is why quicksort is often preferred for arrays in practice, while mergesort is natural for linked lists (where the merge step doesn't need extra space).
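A minimal mergesort sketch shows where the work and the extra space live: `merge` builds a new output list (the O(n) auxiliary space), and the `<=` comparison is what makes the sort stable:

```python
# Sketch: mergesort -- split, recurse, then do the real work in merge().
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))

def merge(left, right):
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps equal elements in order (stable)
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]   # append whichever half has leftovers

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```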

Binary search finds a target value in a sorted collection by repeatedly halving the search space.

  1. Compare the target to the middle element.
  2. If equal, you're done. If the target is smaller, search the left half. If larger, search the right half.
  3. Repeat until found or the search space is empty.
  • O(logโกn)O(\log n) time complexity. Each comparison eliminates half the remaining elements. Searching 1 million elements takes at most about 20 comparisons.
  • Requires sorted input. This is the most common mistake: applying binary search to an unsorted collection gives incorrect results.
  • Foundational technique appearing in BST lookups, database indexing, and countless optimization problems.
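The halving steps above can be sketched iteratively; this `binary_search` returns the index of the target, or -1 if it is absent:

```python
# Sketch: iterative binary search on a sorted list.
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif target < arr[mid]:
            hi = mid - 1             # discard the right half
        else:
            lo = mid + 1             # discard the left half
    return -1                        # search space exhausted

data = [2, 5, 8, 12, 16, 23]
print(binary_search(data, 12))  # 3
print(binary_search(data, 7))   # -1
```

Run it on an unsorted list and the halving logic discards the wrong half, which is exactly the "most common mistake" noted above.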

Compare: Quicksort vs. Mergesort: both achieve O(n log n) average case, but quicksort risks O(n²) worst case while mergesort guarantees O(n log n). When stability matters or you're sorting linked lists, choose mergesort; for in-place sorting of arrays, quicksort is typically faster in practice due to better cache behavior.


Quick Reference Table

  • O(1) access/operations: Array indexing, stack push/pop, queue enqueue/dequeue, hash table lookup (average)
  • O(log n) operations: Binary search, balanced BST operations, heap insert/extract
  • O(n) traversal required: Linked list access, array insertion/deletion, linear search
  • O(n log n) sorting: Mergesort (guaranteed), quicksort (average), heapsort
  • Ordering preserved: Arrays, linked lists, BSTs (in-order), queues
  • No ordering guarantee: Hash tables, heaps (beyond the root), sets
  • Best for priority queues: Heaps (min-heap or max-heap)
  • Best for key-value lookup: Hash tables (average O(1)), BSTs (ordered operations)

Self-Check Questions

  1. Which two structures offer O(1) insertion and deletion but differ in the order elements are removed? What real-world scenarios favor each?

  2. A hash table and a balanced BST both support insertion, deletion, and lookup. Compare their time complexities and explain when you would choose a BST despite its slower average case.

  3. You need to process customer service requests in the order they arrive. Which data structure should you use, and what are its core operations?

  4. Compare BFS and DFS: both have O(V + E) complexity, so what determines which algorithm to use for a given graph problem?

  5. An FRQ asks you to implement a system that always returns the smallest element efficiently. Compare using a sorted array, a BST, and a min-heap: which offers the best trade-off for repeated insertions and extractions?