Fiveable

๐Ÿ”Data Structures Unit 1 Review


1.4 Data Structure Selection and Trade-offs

Written by the Fiveable Content Team • Last updated August 2025
๐Ÿ”Data Structures
Unit & Topic Study Guides

Data Structure Selection Considerations

Choosing the right data structure is one of the most practical skills you'll develop in this course. Every data structure has strengths and weaknesses, and the goal is to match those traits to what your problem actually demands. A poor choice can turn an efficient algorithm into a slow one, even if the logic is correct.

This section covers the factors that guide your selection, the performance characteristics of common structures, and the trade-offs you'll need to weigh.

Factors in Data Structure Selection

Before picking a data structure, you need to understand your problem along several dimensions:

Problem requirements come first. What operations does your program need to perform? If you're mostly searching, that points you toward different structures than if you're mostly inserting and deleting. Think about how often each operation runs, too. An operation that happens once at startup matters less than one that runs inside a loop millions of times. Also consider hard constraints like memory limits or response-time requirements.

Data characteristics matter just as much. Ask yourself:

  • How large is the dataset? Ten items? Ten million?
  • Is the data static (loaded once, then only read) or dynamic (constantly changing)?
  • What type of data are you storing? Are there relationships between elements, like parent-child hierarchies?

Efficiency is where you get quantitative. Evaluate the time complexity of the operations you need most, and check the space complexity to make sure your structure fits in available memory.

Ease of implementation shouldn't be ignored. A theoretically optimal structure that's error-prone to implement can cost you more time debugging than a simpler structure that's "good enough." Readable, maintainable code has real value.

Performance of Data Structures

Here's how the most common structures perform and where they shine:

Arrays

  • O(1) access by index, which makes them ideal when you know where an element is
  • Great for sequential access and fixed-size collections
  • Insertion or deletion in the middle requires shifting elements, costing O(n)
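Python's built-in list is backed by a dynamic array, so it can illustrate these costs directly (a minimal sketch, not a benchmark; the example values are arbitrary):

```python
# Python's list is a dynamic array, so it shows array-like costs.
arr = [10, 20, 30, 40, 50]

# O(1): jump straight to any index.
third = arr[2]      # 30

# O(n): inserting in the middle shifts every later element right by one.
arr.insert(2, 25)   # arr is now [10, 20, 25, 30, 40, 50]

# O(n): deleting from the front or middle shifts later elements left.
del arr[0]          # arr is now [20, 25, 30, 40, 50]
```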

Linked Lists

  • O(1) insertion and deletion if you already have a reference to the node (finding that node first costs O(n))
  • Well-suited for dynamic data that changes size frequently
  • Each node stores an extra pointer, so they use more memory per element than arrays
  • Random access is O(n) because you must traverse from the head
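A linked list is easy to sketch in Python with a small node class (a hypothetical `Node` and `value_at` helper for illustration; real code would usually wrap these in a list class):

```python
class Node:
    """A singly linked list node: one value plus one 'next' pointer."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

# Build the list 1 -> 2 -> 4 by hand.
head = Node(1, Node(2, Node(4)))

# O(1) insertion once you hold a reference: splice a 3 in after the 2.
second = head.next                    # walking here from the head is the O(n) part
second.next = Node(3, second.next)    # list is now 1 -> 2 -> 3 -> 4

# O(n) random access: reaching index i means following i links from the head.
def value_at(head, i):
    node = head
    for _ in range(i):
        node = node.next
    return node.value
```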

Stacks and Queues

  • Both provide O(1) insertion and removal, but only at specific ends
  • A stack follows LIFO (Last-In-First-Out) order. Think function call management and undo operations.
  • A queue follows FIFO (First-In-First-Out) order. Think task scheduling and breadth-first search.
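In Python, a plain list works as a stack, while `collections.deque` gives an efficient queue (a small sketch; note that using a list as a queue via `pop(0)` would cost O(n) per removal):

```python
from collections import deque

# Stack (LIFO): a list's append/pop at the end are O(1) amortized.
stack = []
stack.append("a")
stack.append("b")
top = stack.pop()        # "b" comes off first: last in, first out

# Queue (FIFO): deque supports O(1) appends and pops at both ends.
queue = deque()
queue.append("a")
queue.append("b")
front = queue.popleft()  # "a" comes off first: first in, first out
```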

Trees

  • Balanced trees (like AVL trees and B-trees) give O(log n) search, insertion, and deletion
  • Naturally represent hierarchical data (file systems, organizational charts)
  • Require extra memory for child pointers
  • An unbalanced binary search tree can degrade to O(n) in the worst case, which is why balanced variants exist
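The degradation point is easy to demonstrate with a bare-bones unbalanced BST (a toy sketch; `BSTNode`, `insert`, and `height` are illustrative names, not a standard API):

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Unbalanced BST insert: O(log n) on average, O(n) on sorted input."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

# Sorted input degenerates into a chain: height equals n, so search is O(n).
chain = None
for k in [1, 2, 3, 4, 5]:
    chain = insert(chain, k)

# The same keys inserted in a mixed order stay much shallower.
bushy = None
for k in [3, 1, 4, 2, 5]:
    bushy = insert(bushy, k)
```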

Hash Tables

  • O(1) average-case access, insertion, and deletion
  • Excellent for fast lookups and key-value storage
  • Worst case can degrade to O(n) if many keys collide into the same bucket
  • Collisions are handled through techniques like chaining (storing a list at each bucket) or open addressing (probing for the next empty slot)
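Chaining can be sketched in a few lines of Python (a toy `ChainedHashTable` for illustration; it has a fixed bucket count and no resizing, unlike a production table):

```python
class ChainedHashTable:
    """Toy hash table with chaining: each bucket is a list of (key, value) pairs."""
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Hash the key, then map it onto one of the buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # O(1) on average; O(n) if all keys collide

    def get(self, key):
        for k, v in self._bucket(key):   # scan only the one colliding chain
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.put("apple", 3)
table.put("banana", 7)
table.put("apple", 4)   # overwrites the earlier value
```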

Trade-offs and Guidelines

Time vs. Space Complexity Trade-offs

There's almost always a tension between time and space. Faster access often costs more memory, and saving memory often means slower operations.

Concrete examples make this clearer:

  • Hash tables achieve O(1) average access by allocating extra space for buckets and handling collisions. You're trading memory for speed.
  • Linked lists use only as much memory as they need (no wasted slots), but you pay for that with O(n) random access since there's no index.
  • Arrays give you O(1) index-based access, but if your data is sparse (say, only 100 values spread across indices 0 to 1,000,000), most of that allocated space is wasted.
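The sparse-array point can be made concrete in Python, where a dict plays the hash-table role (the index spacing below is an arbitrary choice for the example):

```python
# Sparse data: 100 values scattered across indices 0..1,000,000.
indices = range(0, 1_000_000, 10_000)   # 100 evenly spaced indices

# Array approach: allocate a slot for every index, occupied or not.
dense = [None] * 1_000_001
for i in indices:
    dense[i] = i * 2

# Hash-table approach: store only the occupied slots.
sparse = {i: i * 2 for i in indices}

# Both answer lookups the same way, but the dict holds 100 entries
# while the list holds 1,000,001 slots.
```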

When making a selection, ask: Is time or space the bigger constraint for this problem? The answer steers you toward the right side of the trade-off.

Guidelines for Structure Suitability

Use this as a quick reference for matching your problem's needs to a structure:

Frequent access by index or key

  1. Arrays: O(1) access by index
  2. Hash tables: O(1) average-case access by key

Frequent insertion and deletion

  1. Linked lists: O(1) insertion/deletion when you have a reference to the position
  2. Balanced trees: O(log n) insertion and deletion while maintaining order

Hierarchical or recursive data

  • Trees naturally represent parent-child relationships (file directories, parse trees)
  • Recursive data structures fit problems that break into smaller subproblems of the same shape

Priority-based access

  • Heaps provide O(1) access to the min or max element, with O(log n) insertion
  • Priority queues (often implemented with heaps) maintain elements ordered by priority
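Python's standard-library `heapq` module implements a binary min-heap on top of a plain list, which makes these costs easy to see (the task names are invented for the example):

```python
import heapq

# heapq maintains the min-heap invariant inside an ordinary list.
tasks = []
heapq.heappush(tasks, (2, "write report"))   # (priority, task)
heapq.heappush(tasks, (1, "fix outage"))
heapq.heappush(tasks, (3, "refile tickets"))

lowest = tasks[0]              # O(1) peek: the minimum is always at index 0
first = heapq.heappop(tasks)   # O(log n) removal of the minimum
```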

Range queries or ordered data

  • Balanced search trees (BSTs, AVL trees, B-trees) support efficient range queries and keep elements sorted

Unique elements and fast membership testing

  • Sets store unique elements and provide fast membership checks
  • Hash tables offer O(1) average-case membership testing
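Python's built-in `set` is hash-based, so it shows both behaviors at once (the usernames are invented for the example):

```python
allowed_users = {"ada", "grace", "alan"}

# Average O(1) membership test, versus O(n) for scanning a list.
ok = "grace" in allowed_users
bad = "mallory" in allowed_users

# Sets also deduplicate automatically.
unique = set(["a", "b", "a", "c", "b"])
```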

Key-value pair storage

  • Hash tables (and the dictionaries/maps built on them) give efficient storage and retrieval by key

Stack or queue behavior

  • Stacks (LIFO): function call management, expression evaluation, backtracking algorithms
  • Queues (FIFO): task scheduling, breadth-first search, buffering
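As a concrete pairing of a queue with breadth-first search, here is a minimal sketch (the graph and the `bfs_order` helper are invented for illustration):

```python
from collections import deque

def bfs_order(graph, start):
    """Visit nodes in breadth-first order using a FIFO queue."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()      # FIFO: oldest discovered node first
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

# A small example graph as adjacency lists.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}
```

Swapping the deque for a stack (pop from the same end you push) would turn this same loop into depth-first search, which is why the choice of structure is the algorithm here.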

The pattern to notice: there's no single "best" data structure. The best choice depends entirely on which operations dominate your problem. Get clear on your requirements first, then let the performance characteristics guide your decision.