Types of searching algorithms
Searching algorithms let you find specific items within a collection of data. Choosing the right one depends on how your data is organized and how large it is. This section covers the core algorithms you need to know, from the simplest to the most efficient.
Linear search
Linear search checks each element one at a time, from start to finish, until it finds a match or reaches the end.
- Works on unsorted data, no preprocessing needed
- Time complexity: O(n), where n is the number of elements
- Best case is O(1) (target is the first element); worst case is O(n) (target is last or absent)
- Practical for small datasets or when you expect the target to be near the beginning
- Becomes inefficient as the dataset grows, since on average you'll check half the elements
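The scan described above can be sketched in a few lines of illustrative Python (the function name is mine, not from the text):

```python
def linear_search(items, target):
    """Scan items left to right; return the index of target, or -1."""
    for i, value in enumerate(items):
        if value == target:
            return i  # found: stop at the first match
    return -1  # reached the end without a match
```

Note that no assumptions are made about `items` beyond iterability, which is exactly why linear search works on unsorted data.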
Binary search
Binary search works only on sorted data. It repeatedly cuts the search space in half, which makes it dramatically faster than linear search for large datasets.
Here's how it works, step by step:
- Compare the target value to the middle element of the current range
- If they match, you're done
- If the target is smaller, narrow the search to the left half
- If the target is larger, narrow the search to the right half
- Repeat until the target is found or the range is empty
- Time complexity: O(log n), since each step halves the remaining elements
- This is a classic example of the divide-and-conquer strategy
- For a sorted array of 1,000,000 elements, binary search needs at most about 20 comparisons, while linear search could need all 1,000,000
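The halving loop above translates directly into a short Python sketch (names are illustrative):

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:  # range not yet empty
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif target < sorted_items[mid]:
            hi = mid - 1  # narrow to the left half
        else:
            lo = mid + 1  # narrow to the right half
    return -1
```

Each iteration discards half the remaining range, which is where the O(log n) bound comes from.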
Hash-based search
Hash-based search uses a hash function to convert a key directly into an array index, allowing near-instant lookups.
- Average time complexity: O(1) for search, insert, and delete
- The trade-off is space: hash tables typically use more memory than arrays or lists
- Performance depends heavily on the quality of the hash function. A poor one causes many collisions (multiple keys mapping to the same index)
- Two main collision resolution strategies:
- Chaining: each index stores a linked list of entries that hashed to that location
- Open addressing (probing): when a collision occurs, the algorithm searches for the next available slot
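A toy chaining table makes the first strategy concrete; this is a minimal sketch for illustration (class and method names are mine), not a replacement for Python's built-in `dict`:

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions by chaining:
    each bucket holds a list of (key, value) pairs."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Hash the key directly to a bucket index
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key (or collision): chain it

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default
```

With only 8 buckets, collisions are frequent, which shows why a short chain (a good hash function and enough buckets) is what keeps lookups near O(1).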
Complexity analysis
Complexity analysis gives you a way to compare algorithms without running them on actual hardware. It describes how an algorithm's resource usage (time or memory) grows as the input gets larger.
Time complexity
Time complexity measures how the number of operations scales with input size n. It's expressed using Big O notation, which captures the upper bound on growth rate.
Common time complexities, from fastest to slowest:
| Notation | Name | Example |
|---|---|---|
| O(1) | Constant | Hash table lookup |
| O(log n) | Logarithmic | Binary search |
| O(n) | Linear | Linear search |
| O(n log n) | Linearithmic | Merge sort |
| O(n²) | Quadratic | Nested loop comparison |
When analyzing complexity, focus on the dominant term. If an algorithm takes n² + 3n + 5 steps, the Big O is O(n²) because the n² term dominates as n grows large.
Space complexity
Space complexity measures how much memory an algorithm uses relative to input size, also expressed in Big O notation.
- Includes both the input space and any auxiliary space (extra memory the algorithm allocates)
- A common trade-off exists: you can often make an algorithm faster by using more memory, or save memory at the cost of speed
- For example, hash tables achieve O(1) search time but require extra memory for the table itself
Best vs. worst case
The same algorithm can perform differently depending on the specific input it receives.
- Best case: the most favorable input (e.g., linear search where the target is the first element)
- Worst case: the most unfavorable input (e.g., linear search where the target isn't in the list at all)
- Average case: expected performance across typical inputs
Worst-case analysis is usually emphasized because it gives you a guaranteed upper bound. If your algorithm's worst case is O(n log n), you know it will never take longer than that, regardless of input.
Algorithm implementation
How you implement an algorithm matters just as much as which algorithm you choose. The same algorithm can behave very differently depending on whether you use loops or recursion, and which data structure backs it.
Iterative vs. recursive approaches
- Iterative approaches use loops (for, while) to repeat operations. They're generally more memory-efficient because they don't add frames to the call stack.
- Recursive approaches solve a problem by calling themselves on smaller subproblems. This can produce cleaner, more readable code for naturally recursive problems like tree traversal or quicksort.
- Recursion carries overhead: each recursive call adds to the call stack, which can cause stack overflow on deep recursions.
- Some recursive algorithms can be rewritten iteratively, or optimized with tail recursion (where the recursive call is the last operation in the function).
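The two styles can be compared side by side on binary search; this is an illustrative sketch (and note that CPython does not actually optimize tail calls, so the recursive version still grows the stack):

```python
def bsearch_recursive(a, target, lo=0, hi=None):
    """Recursive form: closer to the mathematical definition,
    but each call adds a frame to the call stack."""
    if hi is None:
        hi = len(a) - 1
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if a[mid] == target:
        return mid
    if target < a[mid]:
        return bsearch_recursive(a, target, lo, mid - 1)  # tail call
    return bsearch_recursive(a, target, mid + 1, hi)      # tail call

def bsearch_iterative(a, target):
    """Same algorithm as a loop: constant stack space."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        lo, hi = (mid + 1, hi) if target > a[mid] else (lo, mid - 1)
    return -1
```

Both return identical results; the difference is purely in memory behavior on deep inputs.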
Data structure considerations
The data structure you store your elements in shapes which search algorithms are even possible.
| Data Structure | Search Time | Insert/Delete | Notes |
|---|---|---|---|
| Sorted array | O(log n) (binary search) | O(n) | Great for static data you search often |
| Unsorted array | O(n) | O(1) append | Simple, but search is slow |
| Linked list | O(n) | O(1) at head | Can't do binary search efficiently |
| Binary search tree | O(log n) average | O(log n) average | Can degrade to O(n) if unbalanced |
| Hash table | O(1) average | O(1) average | Needs good hash function |
Choosing the right structure depends on whether you search more often than you insert, how large the dataset is, and whether the data changes frequently.
Performance optimization
Beyond choosing the right algorithm, you can squeeze out more performance through indexing and caching. These techniques exploit patterns in how data is accessed.
Indexing techniques
An index is an auxiliary data structure that speeds up search at the cost of extra storage and maintenance.
- B-trees and B+ trees are widely used in databases to keep large datasets searchable on disk
- Inverted indexes map each word to the documents containing it, which is how full-text search engines work
- Spatial indexes (R-trees, quadtrees) organize geometric or geographic data so that queries like "find all points within this region" run efficiently
- The trade-off: indexes speed up reads but slow down writes, since the index must be updated whenever data changes
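An inverted index, the second technique above, is simple enough to sketch in a few lines of illustrative Python:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = {1: "binary search tree", 2: "hash table search"}
index = build_inverted_index(docs)
```

A query for "search" now reads one set from the index instead of scanning every document, and the write-side trade-off is visible too: every new document updates one set per word.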
Caching strategies
Caching stores frequently accessed results so you don't recompute or re-fetch them.
- LRU (Least Recently Used): evicts the item that hasn't been accessed for the longest time
- LFU (Least Frequently Used): evicts the item accessed the fewest times overall
- Modern systems use multi-level caching (CPU cache → main memory → disk), where each level is larger but slower
- The key balance: a bigger cache means more hits, but also more memory consumed. Diminishing returns set in once the hit rate is already high.
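The LRU policy above can be sketched with Python's `OrderedDict`, which remembers insertion order and lets you move entries to the end in O(1) (the class name is illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

Production caches (like `functools.lru_cache`) add locking and statistics, but the eviction logic is the same.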
Search algorithm applications
Search algorithms show up everywhere, not just in textbook exercises. Two major application areas are databases and information retrieval.
Database queries
- Databases use B-tree indexes and hash indexes to avoid scanning entire tables for every query
- A query optimizer analyzes your query and picks the most efficient execution plan (which indexes to use, what order to join tables)
- When combining data from multiple tables, the database uses join algorithms like nested loop join, hash join, or merge join, each suited to different data sizes and sort orders
Information retrieval systems
Search engines and document retrieval systems combine several techniques:
- Inverted indexes map every word to the documents that contain it, enabling fast full-text search
- TF-IDF (Term Frequency–Inverse Document Frequency) ranks results by relevance. A word that appears often in one document but rarely across all documents gets a high score.
- Stemming and lemmatization reduce words to their root forms (e.g., "running" → "run") so searches match different word variations
- PageRank-style algorithms evaluate document importance based on link structure, not just keyword matches
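One common variant of the TF-IDF scoring described above can be sketched as follows (this is a simplified formulation for illustration; real systems smooth the IDF term and normalize differently):

```python
import math

def tf_idf(term, doc, docs):
    """Score = (term frequency in doc) * log(N / number of docs with term)."""
    words = doc.lower().split()
    tf = words.count(term) / len(words)
    containing = sum(1 for d in docs if term in d.lower().split())
    idf = math.log(len(docs) / containing) if containing else 0.0
    return tf * idf
```

A term concentrated in one document scores higher there than a term spread evenly across the corpus, which is exactly the relevance intuition in the bullet above.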
Probabilistic search methods
Probabilistic algorithms use randomness as a deliberate tool. They can sometimes solve problems faster than any deterministic approach, though they may sacrifice guaranteed correctness or guaranteed running time.
Monte Carlo algorithms
Monte Carlo methods use random sampling to produce approximate answers. The more samples you take, the more accurate the result becomes.
- Used in numerical integration, optimization, and physical simulations
- Monte Carlo tree search powers game AI for chess and Go by randomly simulating many possible game outcomes to evaluate moves
- Randomized quicksort picks a random pivot, which gives an expected time complexity of O(n log n) and avoids the worst-case behavior of always picking a bad pivot
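The classic introductory Monte Carlo example is estimating π by random sampling, which shows the sample-count/accuracy trade-off directly (the function is an illustrative sketch):

```python
import random

def estimate_pi(samples=100_000, seed=0):
    """Monte Carlo estimate of pi: throw random points at the unit
    square and count how many land inside the quarter circle."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / samples  # inside/samples approximates pi/4
```

Doubling the sample count roughly halves the variance of the estimate, but never guarantees an exact answer: that is the Monte Carlo trade-off.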
Las Vegas algorithms
Unlike Monte Carlo methods, Las Vegas algorithms always produce the correct answer. The randomness only affects how long they take.
- They're guaranteed to terminate (with probability 1), though the worst-case time can be unpredictable
- Randomized quicksort is a classic Las Vegas example: it always sorts correctly, but its running time varies randomly
- The Rabin-Karp string matching algorithm uses hashing with randomization to search for patterns efficiently
- Las Vegas algorithms often have simpler implementations than their fully deterministic equivalents
Parallel search techniques
When datasets get very large, a single processor isn't enough. Parallel search distributes the work across multiple processors, cores, or machines.
Distributed searching
- The search space is divided among multiple nodes in a network
- Load balancing ensures no single node gets overwhelmed while others sit idle
- The MapReduce paradigm splits a large search into many small tasks (Map), then combines results (Reduce)
- Distributed hash tables spread key-value pairs across nodes so any node can look up any key efficiently
- Network latency and communication overhead become real bottlenecks, so minimizing data transfer between nodes is critical
GPU-accelerated search
GPUs contain thousands of small cores designed for parallel work, making them useful for search problems that can be broken into many independent tasks.
- Parallel prefix sum (scan) operations are a fundamental building block for many GPU search algorithms
- GPU-optimized sorting algorithms like radix sort can serve as a preprocessing step for search
- The main cost is memory transfer: moving data between CPU and GPU memory takes time, so GPU acceleration only pays off when the computation is large enough to justify the transfer overhead
Advanced search algorithms
These algorithms improve on basic approaches by exploiting specific properties of the data.
Interpolation search
Interpolation search improves on binary search when data is uniformly distributed. Instead of always checking the middle, it estimates where the target is likely to be based on its value.
Think of it like looking up a name in a phone book: if the name starts with "W," you'd open near the back, not the middle.
- Average time complexity: O(log log n) for uniformly distributed data
- Worst case: O(n), which happens when the distribution is highly non-uniform
- Each step requires more computation than binary search (calculating the estimated position), so it's only worth using when the data distribution is known to be roughly uniform
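The position estimate is a linear interpolation between the endpoints of the current range; a sketch for numeric arrays (illustrative names, and note the guard against dividing by zero when the endpoints are equal):

```python
def interpolation_search(a, target):
    """Search a sorted numeric array, estimating the probe position
    from the target's value. Best on roughly uniform data."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= target <= a[hi]:
        if a[hi] == a[lo]:
            pos = lo  # all values in range are equal
        else:
            # Linear estimate of where target should sit in [lo, hi]
            pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == target:
            return pos
        if a[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

On evenly spaced data the first probe often lands exactly on the target, which is the "open the phone book near W" effect.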
Exponential search
Exponential search is designed for sorted, unbounded (or very large) lists where you don't know the size in advance.
- Start at position 1 and keep doubling: check positions 1, 2, 4, 8, 16, ...
- Once you find a position where the value exceeds the target, you've bounded the range
- Run binary search within that bounded range
- Time complexity: O(log i), where i is the position of the target element
- Particularly useful when the target is near the beginning of a large or infinite sequence
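The doubling-then-binary-search procedure above, sketched for a plain Python list (in a truly unbounded stream you would catch index errors instead of checking `len`):

```python
def exponential_search(a, target):
    """Double the probe index until we pass the target, then binary
    search inside the bounded range."""
    if not a:
        return -1
    if a[0] == target:
        return 0
    bound = 1
    while bound < len(a) and a[bound] < target:
        bound *= 2  # probe positions 1, 2, 4, 8, ...
    lo, hi = bound // 2, min(bound, len(a) - 1)
    while lo <= hi:  # standard binary search in [lo, hi]
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

If the target sits at position i, the doubling phase takes about log i probes and leaves a range of size at most i for the binary search, giving the O(log i) bound.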
Search in specialized structures
Different data structures call for different search strategies. Trees and graphs each have their own traversal and search algorithms.
Tree-based searches
- Depth-first search (DFS) explores as deep as possible along each branch before backtracking. Uses a stack (or recursion).
- Breadth-first search (BFS) explores all nodes at the current depth before moving deeper. Uses a queue.
- Binary search trees let you search in O(log n) by comparing the target to each node and going left or right. Balanced variants like AVL trees and Red-Black trees guarantee this stays O(log n) even after insertions and deletions.
- Tries store strings character by character, making prefix-based searches (like autocomplete) very fast.
- Alpha-beta pruning reduces the search space in game trees by skipping branches that can't possibly affect the final decision.
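The stack-vs-queue distinction between DFS and BFS shows up clearly in code; a sketch on an adjacency-list graph (illustrative names):

```python
from collections import deque

def bfs_order(graph, start):
    """Breadth-first: visit every node at the current depth before
    going deeper. The frontier is a FIFO queue."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

def dfs_order(graph, start, seen=None):
    """Depth-first: follow each branch as far as possible before
    backtracking. The call stack plays the role of the explicit stack."""
    if seen is None:
        seen = set()
    seen.add(start)
    order = [start]
    for nxt in graph.get(start, []):
        if nxt not in seen:
            order.extend(dfs_order(graph, nxt, seen))
    return order
```

On the same graph the two visit the same nodes in different orders, which is why BFS finds shortest paths (by edge count) while DFS is the natural fit for backtracking problems.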
Graph search algorithms
- Dijkstra's algorithm finds the shortest path from a source node to all other nodes in a graph with non-negative edge weights
- A* search extends Dijkstra's by adding a heuristic estimate of remaining distance, making it faster for pathfinding in games and robotics
- Floyd-Warshall computes shortest paths between all pairs of nodes, useful for dense graphs where you need the full distance table
- Topological sort orders nodes in a directed acyclic graph (DAG) so that every edge points forward, useful for dependency resolution
- Strongly connected components algorithms (like Tarjan's) identify clusters of nodes where every node can reach every other node
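Dijkstra's algorithm, the first item above, fits in a short sketch using a binary heap as the priority queue (a common textbook formulation; the graph encoding is an assumption of this example):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source over non-negative edge weights.
    graph: {node: [(neighbor, weight), ...]}"""
    dist = {source: 0}
    heap = [(0, source)]  # min-heap ordered by tentative distance
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist
```

Swapping the plain distance ordering for distance-plus-heuristic turns this into A*; the rest of the loop is unchanged.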
Theoretical foundations
These concepts connect searching to the broader theory of computation and help you understand what's fundamentally possible and impossible.
Search problem complexity classes
Complexity classes group problems by how hard they are to solve.
- P: problems solvable in polynomial time (e.g., sorting, searching a sorted array). These are considered "efficiently solvable."
- NP: problems where a proposed solution can be verified in polynomial time, even if finding the solution might be hard. Every problem in P is also in NP.
- NP-complete: the hardest problems in NP. If you could solve any one of them in polynomial time, you could solve all NP problems in polynomial time. The Boolean satisfiability problem (SAT) was the first proven NP-complete problem.
When you encounter an NP-complete problem in practice, that's a signal to look for approximate or heuristic solutions rather than exact ones.
Lower bounds for searching
Lower bounds tell you the best any algorithm can possibly do for a given problem, no matter how clever it is.
- For comparison-based searching in a sorted array, the lower bound is Ω(log n). Binary search achieves this, so it's optimal among comparison-based methods.
- For searching an unsorted array with no additional information, the lower bound is Ω(n). You might have to check every element.
- These bounds are proven using decision tree models (every comparison is a branch, and the tree height gives the minimum comparisons) and adversary arguments (an adversary forces the algorithm to do maximum work).
Understanding lower bounds tells you when to stop trying to find a faster algorithm and when there's still room for improvement.