Hash Table Implementation
Hash tables map keys to array indices using hash functions, giving you average-case O(1) lookups, insertions, and deletions. That speed advantage makes them one of the most widely used data structures in practice. The tradeoff: you need to handle collisions (when two keys land on the same index) and manage the table's size as it fills up.
Hash table implementation with arrays
A hash table is backed by an array. A hash function takes a key and returns an index into that array. The quality of the hash function directly determines how evenly keys spread across the table.
Common hash functions:
- Modulo hash function: h(k) = k mod m, where m is the table size. Simple and fast. Choosing a prime number for m helps distribute keys more evenly.
- Multiplication hash function: h(k) = ⌊m(kA mod 1)⌋, where A is a constant between 0 and 1. Knuth suggests A = (√5 − 1)/2 ≈ 0.618 (the golden ratio conjugate). This method is less sensitive to the choice of m.
- Universal hashing: Randomly select a hash function from a designed family of functions at runtime. This gives provably low expected collisions regardless of the input distribution.
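The first two hash functions above can be sketched in a few lines. This is a minimal illustration for integer keys; the function names and the table size 97 are chosen for the example, not taken from any particular library.

```python
import math

def mod_hash(key: int, m: int) -> int:
    """Modulo hash: h(k) = k mod m."""
    return key % m

def mult_hash(key: int, m: int, A: float = (math.sqrt(5) - 1) / 2) -> int:
    """Multiplication hash: h(k) = floor(m * (k*A mod 1)).

    A defaults to Knuth's suggested constant, the golden ratio conjugate.
    """
    frac = (key * A) % 1.0        # fractional part of k*A, in [0, 1)
    return int(m * frac)          # scale up to an index in [0, m)

mod_hash(1234, 97)    # → 70  (1234 = 12*97 + 70)
mult_hash(1234, 97)   # some index in [0, 96]
```

Note that the multiplication method works reasonably well for any m, which is why it is often preferred when the table size is a power of two rather than a prime.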
Collision resolution is needed because two different keys can hash to the same index. There are two main strategies:
Separate chaining stores a linked list (or other collection) at each index. When a collision occurs, the new key is appended to the list at that index. Lookups walk the list to find the target key. The lists grow dynamically, so the table never technically "fills up," but long chains hurt performance.
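A minimal separate-chaining table might look like the sketch below. It uses Python lists as the chains and tuples as entries; the class and method names are illustrative, not a standard API.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, size: int = 8):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key) -> int:
        return hash(key) % self.size

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # key exists: update in place
                return
        bucket.append((key, value))        # collision or new key: extend chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:  # walk the chain
            if k == key:
                return v
        raise KeyError(key)
```

With a uniform hash and load factor α, the expected chain length is α, so lookups stay cheap as long as the table is resized before chains grow long.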
Open addressing stores all entries directly in the array. When a collision occurs, you probe for the next available slot:
- Linear probing: Check index h(k), then h(k) + 1, then h(k) + 2, and so on (wrapping around). Simple, but tends to create clusters of occupied slots, which slows future insertions and lookups.
- Quadratic probing: Check index h(k), then h(k) + 1², then h(k) + 2², etc. (all mod m). Reduces clustering compared to linear probing, but can fail to find an empty slot if the table is more than half full (depending on table size).
- Double hashing: Use a second hash function to determine the step size. Probe sequence: (h1(k) + i · h2(k)) mod m for i = 0, 1, 2, … This produces more uniform probe sequences and avoids clustering.
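Linear probing, the simplest of the three, can be sketched as follows. This is an illustrative fixed-size table for hashable keys; deletion is omitted because it requires tombstone markers, which are beyond this sketch.

```python
class LinearProbingTable:
    """Open addressing with linear probing; None marks an empty slot."""

    def __init__(self, size: int = 11):
        self.size = size
        self.slots = [None] * size          # each slot: None or (key, value)

    def put(self, key, value) -> None:
        i = hash(key) % self.size
        for _ in range(self.size):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % self.size         # probe the next slot, wrapping
        raise RuntimeError("table is full") # load factor hit 1

    def get(self, key):
        i = hash(key) % self.size
        for _ in range(self.size):
            if self.slots[i] is None:       # hit an empty slot: key absent
                raise KeyError(key)
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.size
        raise KeyError(key)
```

Swapping the `i = (i + 1) % self.size` step for `i = (start + step * h2(key)) % self.size` would turn this into double hashing.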
Time complexity of hash operations
Average case: O(1) for insertion, deletion, and search. This assumes a good hash function that distributes keys uniformly and a load factor that stays reasonable (typically α ≤ 0.7).
Worst case: O(n) for all three operations. This happens when every key hashes to the same index. With chaining, you'd be searching a single linked list of length n. With open addressing, you'd probe through nearly every slot.
The load factor α controls how full the table is:
α = n / m, where n is the number of stored keys and m is the table size. As α increases, collisions become more frequent. For separate chaining, α can exceed 1 (more keys than slots), but performance degrades. For open addressing, α must stay below 1, and performance drops sharply as it approaches 1. A common threshold is α = 0.7 for triggering a resize.

Hash Table Optimization and Applications
Rehashing and dynamic resizing
When the load factor crosses a threshold, the table needs to grow. This process is called rehashing:
- Allocate a new array, typically double the current size.
- Recompute the hash for every existing key using the new table size (since m changed, all indices change).
- Insert each key into the new array.
- Deallocate the old array.
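The steps above can be sketched for a chained table. This is an illustrative helper, assuming the bucket layout from separate chaining (a list of (key, value) lists); in Python the old array is deallocated by the garbage collector once nothing references it.

```python
def rehash(old_buckets: list, new_size: int) -> list:
    """Rebuild a chained table's buckets into a new, larger array."""
    new_buckets = [[] for _ in range(new_size)]    # 1. allocate new array
    for bucket in old_buckets:
        for key, value in bucket:
            i = hash(key) % new_size               # 2. recompute with new m
            new_buckets[i].append((key, value))    # 3. insert into new array
    return new_buckets                             # 4. old array gets GC'd

old = [[(0, "a"), (2, "c")], [(1, "b")]]   # size-2 table: 0 and 2 collide
new = rehash(old, 4)                       # doubled: keys spread out again
```

After doubling, keys 0 and 2 no longer share a slot (0 mod 4 = 0, 2 mod 4 = 2), which is exactly the point of rehashing.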
Rehashing is O(n) since you touch every key. But it happens infrequently enough that the amortized cost of insertion remains O(1). Think of it this way: you do n cheap insertions, then one O(n) rehash, which averages out to O(1) per insertion.
Some implementations also shrink the table (e.g., halve it when α drops below 0.25) to avoid wasting memory after many deletions.
Hash tables vs other data structures
|Operation|Hash Table|Array|Balanced BST (AVL, Red-Black)|
|---|---|---|---|
|Search|O(1) avg, O(n) worst|O(n) (unsorted), O(log n) (sorted)|O(log n) worst|
|Insert|O(1) avg, O(n) worst|O(n) (shifting)|O(log n) worst|
|Delete|O(1) avg, O(n) worst|O(n) (shifting)|O(log n) worst|
|Ordered iteration|Not supported|By index only|Yes (in-order traversal)|
|Range queries|Not efficient|Not efficient|Efficient|
The key takeaway: hash tables win on raw speed for individual key lookups. Balanced BSTs win when you need sorted order or range queries (e.g., "find all keys between 50 and 100"). Arrays are best when you're accessing elements by numeric index.
Applications of hash tables
Caching: Store recently or frequently accessed data in a hash table keyed by the request (URL, query, etc.). Instead of hitting a database or making a network call, you check the cache first. If the key exists, you return the cached result in O(1).
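The caching pattern can be sketched with a plain dict as the hash table. The `slow_lookup` function and URL keys below are stand-ins for whatever expensive call is being cached.

```python
def make_cached(fetch):
    """Wrap a slow fetch function with a hash-table cache."""
    cache = {}
    def cached_fetch(key):
        if key in cache:          # O(1) average: cache hit
            return cache[key]
        result = fetch(key)       # slow path: database / network call
        cache[key] = result       # remember for next time
        return result
    return cached_fetch

calls = []
def slow_lookup(url):
    calls.append(url)             # track how often the slow path runs
    return f"data for {url}"

fetch = make_cached(slow_lookup)
fetch("/users/1")                 # slow path: slow_lookup is called
fetch("/users/1")                 # cache hit: slow_lookup is NOT called again
```

Real caches add an eviction policy (e.g., LRU) so the table doesn't grow without bound; `functools.lru_cache` in the standard library packages exactly this pattern.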
Indexing: Build a lookup structure that maps search terms to their locations. For example, an inverted index maps each word in a document collection to the list of documents containing that word. Search engines rely heavily on this pattern.
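A toy inverted index is just a hash table from words to document ids. The two-document corpus below is made up for illustration; real indexes also handle tokenization, stemming, and ranking.

```python
from collections import defaultdict

def build_inverted_index(docs: dict) -> dict:
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():   # naive whitespace tokenizer
            index[word].add(doc_id)
    return index

docs = {1: "the quick brown fox", 2: "the lazy dog"}
index = build_inverted_index(docs)
index["the"]   # → {1, 2}: both documents contain "the"
index["fox"]   # → {1}
```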
Counting frequencies: Use a hash table where keys are elements and values are counts. To count word frequencies in a text:
- For each word, check if it exists in the table.
- If yes, increment its count.
- If no, insert it with a count of 1.
This runs in O(n) for n words, since each lookup and update is O(1) on average. The same pattern applies to detecting duplicates, computing histograms, and grouping data by category.
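The counting steps above translate directly to a dict (Python's built-in hash table); this sketch spells out the check-then-update logic explicitly rather than using `collections.Counter`, which packages the same idea.

```python
def word_counts(text: str) -> dict:
    """Count word frequencies using a hash table keyed by word."""
    counts = {}
    for word in text.lower().split():
        if word in counts:        # key exists: increment its count
            counts[word] += 1
        else:                     # new key: insert with a count of 1
            counts[word] = 1
    return counts

word_counts("the cat and the hat")   # → {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```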