Types of searching algorithms
Searching algorithms let you find specific items within a collection of data. Choosing the right one depends on how your data is organized and how large it is. This section covers the core algorithms you need to know, from the simplest to the most efficient.
Linear search
Linear search checks each element one at a time, from start to finish, until it finds a match or reaches the end.
- Works on unsorted data, no preprocessing needed
- Time complexity: O(n), where n is the number of elements
- Best case is O(1) (target is the first element); worst case is O(n) (target is last or absent)
- Practical for small datasets or when you expect the target to be near the beginning
- Becomes inefficient as the dataset grows, since on average you'll check half the elements
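The scan described above can be sketched in a few lines of illustrative Python (the function name is mine, not from the text):

```python
def linear_search(items, target):
    """Scan items left to right; return the index of target, or -1."""
    for i, value in enumerate(items):
        if value == target:
            return i  # found: stop at the first match
    return -1  # reached the end without a match
```

Note that no assumptions are made about `items` beyond iterability, which is exactly why linear search works on unsorted data.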
Binary search
Binary search works only on sorted data. It repeatedly cuts the search space in half, which makes it dramatically faster than linear search for large datasets.
Here's how it works, step by step:
- Compare the target value to the middle element of the current range
- If they match, you're done
- If the target is smaller, narrow the search to the left half
- If the target is larger, narrow the search to the right half
- Repeat until the target is found or the range is empty
- Time complexity: O(log n), since each step halves the remaining elements
- This is a classic example of the divide-and-conquer strategy
- For a sorted array of 1,000,000 elements, binary search needs at most about 20 comparisons, while linear search could need all 1,000,000
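The halving loop above translates directly into a short Python sketch (names are illustrative):

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:  # range not yet empty
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif target < sorted_items[mid]:
            hi = mid - 1  # narrow to the left half
        else:
            lo = mid + 1  # narrow to the right half
    return -1
```

Each iteration discards half the remaining range, which is where the O(log n) bound comes from.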
Hash-based search
Hash-based search uses a hash function to convert a key directly into an array index, allowing near-instant lookups.
- Average time complexity: O(1) for search, insert, and delete
- The trade-off is space: hash tables typically use more memory than arrays or lists
- Performance depends heavily on the quality of the hash function. A poor one causes many collisions (multiple keys mapping to the same index)
- Two main collision resolution strategies:
- Chaining: each index stores a linked list of entries that hashed to that location
- Open addressing (probing): when a collision occurs, the algorithm searches for the next available slot
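A toy chaining table makes the first strategy concrete; this is a minimal sketch for illustration (class and method names are mine), not a replacement for Python's built-in `dict`:

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions by chaining:
    each bucket holds a list of (key, value) pairs."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Hash the key directly to a bucket index
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key (or collision): chain it

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default
```

With only 8 buckets, collisions are frequent, which shows why a short chain (a good hash function and enough buckets) is what keeps lookups near O(1).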
Complexity analysis
Complexity analysis gives you a way to compare algorithms without running them on actual hardware. It describes how an algorithm's resource usage (time or memory) grows as the input gets larger.
Time complexity
Time complexity measures how the number of operations scales with input size n. It's expressed using Big O notation, which captures the upper bound on growth rate.
Common time complexities, from fastest to slowest:
| Notation | Name | Example |
|---|---|---|
| O(1) | Constant | Hash table lookup |
| O(log n) | Logarithmic | Binary search |
| O(n) | Linear | Linear search |
| O(n log n) | Linearithmic | Merge sort |
| O(n²) | Quadratic | Nested loop comparison |
When analyzing complexity, focus on the dominant term. If an algorithm takes n² + 3n + 5 steps, the Big O is O(n²) because the n² term dominates as n grows large.
Space complexity
Space complexity measures how much memory an algorithm uses relative to input size, also expressed in Big O notation.
- Includes both the input space and any auxiliary space (extra memory the algorithm allocates)
- A common trade-off exists: you can often make an algorithm faster by using more memory, or save memory at the cost of speed
- For example, hash tables achieve O(1) search time but require extra memory for the table itself
Best vs. worst case
The same algorithm can perform differently depending on the specific input it receives.
- Best case: the most favorable input (e.g., linear search where the target is the first element)
- Worst case: the most unfavorable input (e.g., linear search where the target isn't in the list at all)
- Average case: expected performance across typical inputs
Worst-case analysis is usually emphasized because it gives you a guaranteed upper bound. If your algorithm's worst case is O(n log n), you know it will never take longer than that, regardless of input.
Algorithm implementation
How you implement an algorithm matters just as much as which algorithm you choose. The same algorithm can behave very differently depending on whether you use loops or recursion, and which data structure backs it.
Iterative vs. recursive approaches
- Iterative approaches use loops (for, while) to repeat operations. They're generally more memory-efficient because they don't add frames to the call stack.
- Recursive approaches solve a problem by calling themselves on smaller subproblems. This can produce cleaner, more readable code for naturally recursive problems like tree traversal or quicksort.
- Recursion carries overhead: each recursive call adds to the call stack, which can cause stack overflow on deep recursions.
- Some recursive algorithms can be rewritten iteratively, or optimized with tail recursion (where the recursive call is the last operation in the function).
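The two styles can be compared side by side on binary search; this is an illustrative sketch (and note that CPython does not actually optimize tail calls, so the recursive version still grows the stack):

```python
def bsearch_recursive(a, target, lo=0, hi=None):
    """Recursive form: closer to the mathematical definition,
    but each call adds a frame to the call stack."""
    if hi is None:
        hi = len(a) - 1
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if a[mid] == target:
        return mid
    if target < a[mid]:
        return bsearch_recursive(a, target, lo, mid - 1)  # tail call
    return bsearch_recursive(a, target, mid + 1, hi)      # tail call

def bsearch_iterative(a, target):
    """Same algorithm as a loop: constant stack space."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        lo, hi = (mid + 1, hi) if target > a[mid] else (lo, mid - 1)
    return -1
```

Both return identical results; the difference is purely in memory behavior on deep inputs.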
Data structure considerations
The data structure you store your elements in shapes which search algorithms are even possible.
| Data Structure | Search Time | Insert/Delete | Notes |
|---|---|---|---|
| Sorted array | O(log n) (binary search) | O(n) | Great for static data you search often |
| Unsorted array | O(n) | O(1) append | Simple, but search is slow |
| Linked list | O(n) | O(1) at head | Can't do binary search efficiently |
| Binary search tree | O(log n) average | O(log n) average | Can degrade to O(n) if unbalanced |
| Hash table | O(1) average | O(1) average | Needs good hash function |
Choosing the right structure depends on whether you search more often than you insert, how large the dataset is, and whether the data changes frequently.
Performance optimization
Beyond choosing the right algorithm, you can squeeze out more performance through indexing and caching. These techniques exploit patterns in how data is accessed.
Indexing techniques
An index is an auxiliary data structure that speeds up search at the cost of extra storage and maintenance.
- B-trees and B+ trees are widely used in databases to keep large datasets searchable on disk
- Inverted indexes map each word to the documents containing it, which is how full-text search engines work
- Spatial indexes (R-trees, quadtrees) organize geometric or geographic data so that queries like "find all points within this region" run efficiently
- The trade-off: indexes speed up reads but slow down writes, since the index must be updated whenever data changes
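An inverted index, the second technique above, is simple enough to sketch in a few lines of illustrative Python:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = {1: "binary search tree", 2: "hash table search"}
index = build_inverted_index(docs)
```

A query for "search" now reads one set from the index instead of scanning every document, and the write-side trade-off is visible too: every new document updates one set per word.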
Caching strategies
Caching stores frequently accessed results so you don't recompute or re-fetch them.
- LRU (Least Recently Used): evicts the item that hasn't been accessed for the longest time
- LFU (Least Frequently Used): evicts the item accessed the fewest times overall
- Modern systems use multi-level caching (CPU cache → main memory → disk), where each level is larger but slower
- The key balance: a bigger cache means more hits, but also more memory consumed. Diminishing returns set in once the hit rate is already high.
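The LRU policy above can be sketched with Python's `OrderedDict`, which remembers insertion order and lets you move entries to the end in O(1) (the class name is illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

Production caches (like `functools.lru_cache`) add locking and statistics, but the eviction logic is the same.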
Search algorithm applications
Search algorithms show up everywhere, not just in textbook exercises. Two major application areas are databases and information retrieval.
Database queries
- Databases use B-tree indexes and hash indexes to avoid scanning entire tables for every query
- A query optimizer analyzes your query and picks the most efficient execution plan (which indexes to use, what order to join tables)
- When combining data from multiple tables, the database uses join algorithms like nested loop join, hash join, or merge join, each suited to different data sizes and sort orders
Information retrieval systems
Search engines and document retrieval systems combine several techniques:
- Inverted indexes map every word to the documents that contain it, enabling fast full-text search
- TF-IDF (Term Frequency–Inverse Document Frequency) ranks results by relevance. A word that appears often in one document but rarely across all documents gets a high score.
- Stemming and lemmatization reduce words to their root forms (e.g., "running" → "run") so searches match different word variations
- PageRank-style algorithms evaluate document importance based on link structure, not just keyword matches
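One common variant of the TF-IDF scoring described above can be sketched as follows (this is a simplified formulation for illustration; real systems smooth the IDF term and normalize differently):

```python
import math

def tf_idf(term, doc, docs):
    """Score = (term frequency in doc) * log(N / number of docs with term)."""
    words = doc.lower().split()
    tf = words.count(term) / len(words)
    containing = sum(1 for d in docs if term in d.lower().split())
    idf = math.log(len(docs) / containing) if containing else 0.0
    return tf * idf
```

A term concentrated in one document scores higher there than a term spread evenly across the corpus, which is exactly the relevance intuition in the bullet above.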
Probabilistic search methods
Probabilistic algorithms use randomness as a deliberate tool. They can sometimes solve problems faster than any deterministic approach, though they may sacrifice guaranteed correctness or guaranteed running time.
Monte Carlo algorithms
Monte Carlo methods use random sampling to produce approximate answers. The more samples you take, the more accurate the result becomes.
- Used in numerical integration, optimization, and physical simulations
- Monte Carlo tree search powers game AI for chess and Go by randomly simulating many possible game outcomes to evaluate moves
- Randomized quicksort picks a random pivot, which gives an expected time complexity of O(n log n) and avoids the worst-case behavior of always picking a bad pivot
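The classic introductory Monte Carlo example is estimating π by random sampling, which shows the sample-count/accuracy trade-off directly (the function is an illustrative sketch):

```python
import random

def estimate_pi(samples=100_000, seed=0):
    """Monte Carlo estimate of pi: throw random points at the unit
    square and count how many land inside the quarter circle."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / samples  # inside/samples approximates pi/4
```

Doubling the sample count roughly halves the variance of the estimate, but never guarantees an exact answer: that is the Monte Carlo trade-off.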
Las Vegas algorithms
Unlike Monte Carlo methods, Las Vegas algorithms always produce the correct answer. The randomness only affects how long they take.
- They're guaranteed to terminate (with probability 1), though the worst-case time can be unpredictable
- Randomized quicksort is a classic Las Vegas example: it always sorts correctly, but its running time varies randomly
- The Rabin-Karp string matching algorithm uses hashing with randomization to search for patterns efficiently
- Las Vegas algorithms often have simpler implementations than their fully deterministic equivalents
Parallel search techniques
When datasets get very large, a single processor isn't enough. Parallel search distributes the work across multiple processors, cores, or machines.
Distributed searching
- The search space is divided among multiple nodes in a network
- Load balancing ensures no single node gets overwhelmed while others sit idle
- The MapReduce paradigm splits a large search into many small tasks (Map), then combines results (Reduce)
- Distributed hash tables spread key-value pairs across nodes so any node can look up any key efficiently
- Network latency and communication overhead become real bottlenecks, so minimizing data transfer between nodes is critical
GPU-accelerated search
GPUs contain thousands of small cores designed for parallel work, making them useful for search problems that can be broken into many independent tasks.
- Parallel prefix sum (scan) operations are a fundamental building block for many GPU search algorithms
- GPU-optimized sorting algorithms like radix sort can serve as a preprocessing step for search
- The main cost is memory transfer: moving data between CPU and GPU memory takes time, so GPU acceleration only pays off when the computation is large enough to justify the transfer overhead
Advanced search algorithms
These algorithms improve on basic approaches by exploiting specific properties of the data.
Interpolation search
Interpolation search improves on binary search when data is uniformly distributed. Instead of always checking the middle, it estimates where the target is likely to be based on its value.
Think of it like looking up a name in a phone book: if the name starts with "W," you'd open near the back, not the middle.
- Average time complexity: O(log log n) for uniformly distributed data
- Worst case: O(n), which happens when the distribution is highly non-uniform
- Each step requires more computation than binary search (calculating the estimated position), so it's only worth using when the data distribution is known to be roughly uniform
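The position estimate is a linear interpolation between the endpoints of the current range; a sketch for numeric arrays (illustrative names, and note the guard against dividing by zero when the endpoints are equal):

```python
def interpolation_search(a, target):
    """Search a sorted numeric array, estimating the probe position
    from the target's value. Best on roughly uniform data."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= target <= a[hi]:
        if a[hi] == a[lo]:
            pos = lo  # all values in range are equal
        else:
            # Linear estimate of where target should sit in [lo, hi]
            pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == target:
            return pos
        if a[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

On evenly spaced data the first probe often lands exactly on the target, which is the "open the phone book near W" effect.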
Exponential search
Exponential search is designed for sorted, unbounded (or very large) lists where you don't know the size in advance.
- Start at position 1 and keep doubling: check positions 1, 2, 4, 8, 16, ...
- Once you find a position where the value exceeds the target, you've bounded the range
- Run binary search within that bounded range
- Time complexity: O(log i), where i is the position of the target element
- Particularly useful when the target is near the beginning of a large or infinite sequence
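The doubling-then-binary-search procedure above, sketched for a plain Python list (in a truly unbounded stream you would catch index errors instead of checking `len`):

```python
def exponential_search(a, target):
    """Double the probe index until we pass the target, then binary
    search inside the bounded range."""
    if not a:
        return -1
    if a[0] == target:
        return 0
    bound = 1
    while bound < len(a) and a[bound] < target:
        bound *= 2  # probe positions 1, 2, 4, 8, ...
    lo, hi = bound // 2, min(bound, len(a) - 1)
    while lo <= hi:  # standard binary search in [lo, hi]
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

If the target sits at position i, the doubling phase takes about log i probes and leaves a range of size at most i for the binary search, giving the O(log i) bound.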
Search in specialized structures
Different data structures call for different search strategies. Trees and graphs each have their own traversal and search algorithms.
Tree-based searches
- Depth-first search (DFS) explores as deep as possible along each branch before backtracking. Uses a stack (or recursion).
- Breadth-first search (BFS) explores all nodes at the current depth before moving deeper. Uses a queue.
- Binary search trees let you search in O(log n) by comparing the target to each node and going left or right. Balanced variants like AVL trees and Red-Black trees guarantee this stays O(log n) even after insertions and deletions.
- Tries store strings character by character, making prefix-based searches (like autocomplete) very fast.
- Alpha-beta pruning reduces the search space in game trees by skipping branches that can't possibly affect the final decision.
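The stack-vs-queue distinction between DFS and BFS shows up clearly in code; a sketch on an adjacency-list graph (illustrative names):

```python
from collections import deque

def bfs_order(graph, start):
    """Breadth-first: visit every node at the current depth before
    going deeper. The frontier is a FIFO queue."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

def dfs_order(graph, start, seen=None):
    """Depth-first: follow each branch as far as possible before
    backtracking. The call stack plays the role of the explicit stack."""
    if seen is None:
        seen = set()
    seen.add(start)
    order = [start]
    for nxt in graph.get(start, []):
        if nxt not in seen:
            order.extend(dfs_order(graph, nxt, seen))
    return order
```

On the same graph the two visit the same nodes in different orders, which is why BFS finds shortest paths (by edge count) while DFS is the natural fit for backtracking problems.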
Graph search algorithms
- Dijkstra's algorithm finds the shortest path from a source node to all other nodes in a graph with non-negative edge weights
- A* search extends Dijkstra's by adding a heuristic estimate of remaining distance, making it faster for pathfinding in games and robotics
- Floyd-Warshall computes shortest paths between all pairs of nodes, useful for dense graphs where you need the full distance table
- Topological sort orders nodes in a directed acyclic graph (DAG) so that every edge points forward, useful for dependency resolution
- Strongly connected components algorithms (like Tarjan's) identify clusters of nodes where every node can reach every other node
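Dijkstra's algorithm, the first item above, fits in a short sketch using a binary heap as the priority queue (a common textbook formulation; the graph encoding is an assumption of this example):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source over non-negative edge weights.
    graph: {node: [(neighbor, weight), ...]}"""
    dist = {source: 0}
    heap = [(0, source)]  # min-heap ordered by tentative distance
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist
```

Swapping the plain distance ordering for distance-plus-heuristic turns this into A*; the rest of the loop is unchanged.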
Theoretical foundations
These concepts connect searching to the broader theory of computation and help you understand what's fundamentally possible and impossible.
Search problem complexity classes
Complexity classes group problems by how hard they are to solve.
- P: problems solvable in polynomial time (e.g., sorting, searching a sorted array). These are considered "efficiently solvable."
- NP: problems where a proposed solution can be verified in polynomial time, even if finding the solution might be hard. Every problem in P is also in NP.
- NP-complete: the hardest problems in NP. If you could solve any one of them in polynomial time, you could solve all NP problems in polynomial time. The Boolean satisfiability problem (SAT) was the first proven NP-complete problem.
When you encounter an NP-complete problem in practice, that's a signal to look for approximate or heuristic solutions rather than exact ones.
Lower bounds for searching
Lower bounds tell you the best any algorithm can possibly do for a given problem, no matter how clever it is.
- For comparison-based searching in a sorted array, the lower bound is Ω(log n). Binary search achieves this, so it's optimal among comparison-based methods.
- For searching an unsorted array with no additional information, the lower bound is Ω(n). You might have to check every element.
- These bounds are proven using decision tree models (every comparison is a branch, and the tree height gives the minimum comparisons) and adversary arguments (an adversary forces the algorithm to do maximum work).
Understanding lower bounds tells you when to stop trying to find a faster algorithm and when there's still room for improvement.