
📡Advanced Signal Processing Unit 8 Review


8.5 Sparse recovery algorithms

Written by the Fiveable Content Team • Last updated August 2025

Sparse signal models

Sparse recovery algorithms reconstruct signals from far fewer measurements than traditional sampling requires. They work because many real-world signals are sparse: only a small number of coefficients carry meaningful information when the signal is expressed in the right basis. This section covers the core signal models, the compressed sensing framework, the major algorithm families, and their theoretical guarantees.

Sparsity in signal processing

A signal is sparse if most of its coefficients are zero (or near-zero) when represented in some basis or dictionary. For example, a natural image may have millions of pixels, but its wavelet coefficients are dominated by just a few large values. Audio signals behave similarly in the Fourier domain.

This property is what makes efficient compression, denoising, and reconstruction possible. If you know a signal is $k$-sparse (only $k$ out of $N$ coefficients are nonzero), you don't need all $N$ measurements to reconstruct it.

Sparse representation of signals

Sparse representation means expressing a signal $x$ as a linear combination of a few atoms from a dictionary $D$:

$$x = D\alpha$$

where $\alpha$ has only a few nonzero entries. The dictionary can be:

  • Fixed: Fourier, wavelet, DCT (chosen based on known signal properties)
  • Learned: adapted from training data (K-SVD, online dictionary learning)

Finding the sparsest $\alpha$ is the core computational challenge. Algorithms like matching pursuit and basis pursuit solve different relaxations of this problem.

Compressed sensing framework

Compressed sensing (CS) acquires and reconstructs sparse signals from far fewer linear measurements than the Nyquist rate demands. It combines sensing and compression into a single step.

Signal acquisition and compression

In CS, you don't sample the signal directly. Instead, you collect $m$ linear measurements:

$$y = Ax + n$$

  • $y \in \mathbb{R}^m$: measurement vector
  • $A \in \mathbb{R}^{m \times N}$: measurement matrix ($m \ll N$)
  • $x \in \mathbb{R}^N$: the sparse signal you want to recover
  • $n$: measurement noise

Because $m \ll N$, this system is heavily underdetermined. Without the sparsity assumption, recovery would be impossible. With it, you can recover $x$ exactly (or near-exactly in the noisy case) using the algorithms described below.
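The acquisition model above can be sketched in a few lines of NumPy. The dimensions, sparsity level, and noise scale here are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, m = 256, 8, 64                  # ambient dimension, sparsity, measurements

# k-sparse signal: k nonzero coefficients at random locations
x = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x[support] = rng.standard_normal(k)

# Gaussian measurement matrix (scaled so columns have unit expected norm)
A = rng.standard_normal((m, N)) / np.sqrt(m)

# m << N noisy linear measurements: y = Ax + n
y = A @ x + 0.01 * rng.standard_normal(m)

print(y.shape)               # (64,)
print(np.count_nonzero(x))   # 8
```

Every recovery algorithm below takes this pair $(A, y)$ and tries to reconstruct $x$.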

Measurement matrix design

Not just any $A$ will work. The measurement matrix must preserve enough information about sparse signals to allow reconstruction. Two key properties matter:

  • Incoherence with the sparsifying basis: the rows of $A$ should be "spread out" relative to the sparsity basis so that each measurement captures global information about the signal.
  • Restricted Isometry Property (RIP): described in detail below.

Random matrices (Gaussian i.i.d., Bernoulli $\pm 1$, subsampled Fourier) satisfy these properties with high probability when $m = O(k \log(N/k))$, which is why they're widely used.

Restricted isometry property (RIP)

The RIP is the central sufficient condition for stable recovery. A matrix $A$ satisfies the RIP of order $k$ with constant $\delta_k \in (0,1)$ if, for every $k$-sparse vector $x$:

$$(1 - \delta_k) \|x\|_2^2 \leq \|Ax\|_2^2 \leq (1 + \delta_k) \|x\|_2^2$$

What this means geometrically: $A$ approximately preserves the $\ell_2$-norm of every $k$-sparse vector. Distances between sparse signals are nearly maintained after measurement, so you can "invert" the process without catastrophic information loss. A smaller $\delta_k$ means better preservation and stronger recovery guarantees.

Verifying RIP for a specific matrix is NP-hard in general, which is why probabilistic guarantees for random matrices are so important in practice.
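Although certifying the RIP would require checking all $\binom{N}{k}$ supports, you can probe it empirically: sample random $k$-sparse vectors and see how far $\|Ax\|_2^2 / \|x\|_2^2$ strays from 1. This only yields a lower bound on $\delta_k$; the dimensions here are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, k = 64, 256, 8
A = rng.standard_normal((m, N)) / np.sqrt(m)

# Sample random k-sparse vectors and record how well A preserves their norm.
ratios = []
for _ in range(2000):
    x = np.zeros(N)
    x[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2)

# Empirical lower bound on the RIP constant delta_k
delta_est = max(max(ratios) - 1.0, 1.0 - min(ratios))
print(f"empirical delta_{k} >= {delta_est:.2f}")
```

For Gaussian $A$ at these dimensions the ratios concentrate around 1, consistent with the probabilistic guarantees above.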

Convex optimization algorithms

These methods reformulate sparse recovery as a convex program, replacing the intractable $\ell_0$-minimization (counting nonzeros) with a tractable $\ell_1$-minimization (sum of absolute values). The $\ell_1$-norm is the tightest convex relaxation of $\ell_0$, and under RIP conditions it recovers the same solution.

Basis pursuit (BP)

Basis pursuit solves the noiseless recovery problem:

$$\min_x \|x\|_1 \quad \text{subject to} \quad Ax = y$$

This is a linear program and can be solved with standard LP solvers (simplex, interior point). Under sufficient RIP conditions (roughly $\delta_{2k} < \sqrt{2} - 1$), BP recovers any $k$-sparse signal exactly.

BP is the conceptual foundation for convex sparse recovery, but it assumes noiseless measurements. For noisy settings, you need the variants below.
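The LP reformulation is mechanical: split $x = u - v$ with $u, v \geq 0$, so that $\|x\|_1 = \mathbf{1}^T(u + v)$ and $Ax = A(u - v)$. A minimal sketch using SciPy's `linprog` (the problem sizes in the demo are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """min ||x||_1 s.t. Ax = y, as an LP over x = u - v with u, v >= 0."""
    m, N = A.shape
    c = np.ones(2 * N)                 # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])          # equality constraint: A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

# Noiseless recovery of a 4-sparse x in R^80 from m = 40 measurements
rng = np.random.default_rng(0)
m, N, k = 40, 80, 4
A = rng.standard_normal((m, N)) / np.sqrt(m)
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x_hat = basis_pursuit(A, A @ x_true)
```

With these dimensions, recovery should be exact up to solver tolerance with high probability, consistent with the guarantee above.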

Least absolute shrinkage and selection operator (LASSO)

LASSO handles noise by balancing data fidelity against sparsity:

$$\min_x \frac{1}{2} \|Ax - y\|_2^2 + \lambda \|x\|_1$$

  • The first term keeps the solution consistent with the measurements.
  • The second term (the $\ell_1$ penalty) promotes sparsity.
  • $\lambda > 0$ controls the trade-off: a larger $\lambda$ yields sparser solutions but may sacrifice fidelity.

Choosing $\lambda$ well is critical. Common strategies include cross-validation and setting $\lambda$ proportional to the noise level $\sigma$. LASSO can be solved efficiently with coordinate descent, ADMM, or proximal gradient methods (ISTA/FISTA).

Dantzig selector

The Dantzig selector takes a different approach to noise robustness:

$$\min_x \|x\|_1 \quad \text{subject to} \quad \|A^T(Ax - y)\|_\infty \leq \epsilon$$

The constraint bounds the maximum correlation between the residual and any column of $A$. The parameter $\epsilon$ is typically set based on the noise level. This is also a linear program, so it scales well.

Recovery guarantees for the Dantzig selector are comparable to LASSO. The choice between them often comes down to implementation convenience and the specific problem structure.

Greedy pursuit algorithms

Greedy algorithms build the sparse estimate incrementally, selecting one or more atoms per iteration based on correlation with the current residual. They're faster per iteration than convex methods and often competitive in recovery quality.

Orthogonal matching pursuit (OMP)

OMP is the most straightforward greedy algorithm. Here's how it works:

  1. Initialize the residual $r^0 = y$ and the support set $S^0 = \emptyset$.
  2. Select: Find the column of $A$ most correlated with the residual: $i = \arg\max_j |\langle r^{k-1}, a_j \rangle|$.
  3. Update support: Add index $i$ to the support set $S^k$.
  4. Project: Solve the least-squares problem $\min_x \|y - A_{S^k} x\|_2$ restricted to the current support.
  5. Update residual: $r^k = y - A_{S^k} \hat{x}$.
  6. Repeat for $k$ iterations (if the sparsity is known) or until the residual is small enough.

OMP recovers $k$-sparse signals exactly when $A$ satisfies a mutual coherence condition or a sufficiently strong RIP. Its main limitation: once an atom is selected, it's never removed, so an early wrong choice can't be corrected.
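The steps above translate almost line-for-line into NumPy. A minimal sketch, assuming the sparsity $k$ is known and omitting a residual-based stopping rule:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily build a k-element support."""
    support = []
    r = y.copy()                                  # residual r^0 = y
    for _ in range(k):
        # Select: column most correlated with the current residual
        i = int(np.argmax(np.abs(A.T @ r)))
        if i not in support:
            support.append(i)
        # Project: least squares restricted to the current support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        # Update residual
        r = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```

In the noiseless, well-conditioned regime this recovers the support exactly; the least-squares projection then makes the residual orthogonal to all selected atoms.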

Compressive sampling matching pursuit (CoSaMP)

CoSaMP addresses OMP's limitations by adding a pruning step:

  1. Identify: Find the $2k$ columns of $A$ most correlated with the current residual.
  2. Merge: Combine these with the current support (up to $3k$ indices total).
  3. Estimate: Solve the least-squares problem on the merged support.
  4. Prune: Keep only the $k$ largest coefficients in the estimate.
  5. Update residual and repeat.

The pruning step means CoSaMP can correct mistakes from earlier iterations. It provides uniform recovery guarantees under RIP and is stable under noise, with reconstruction error bounded by the noise level.
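A compact sketch of the iteration, assuming $k$ is known and using a fixed iteration count instead of a residual tolerance:

```python
import numpy as np

def cosamp(A, y, k, iters=30):
    """CoSaMP: identify 2k candidates, merge, least-squares, prune to k."""
    N = A.shape[1]
    x = np.zeros(N)
    for _ in range(iters):
        r = y - A @ x
        # Identify: 2k columns most correlated with the residual
        omega = np.argsort(np.abs(A.T @ r))[-2 * k:]
        # Merge: union with the current support (at most 3k indices)
        T = np.union1d(omega, np.flatnonzero(x)).astype(int)
        # Estimate: least squares on the merged support
        b, *_ = np.linalg.lstsq(A[:, T], y, rcond=None)
        # Prune: keep only the k largest coefficients
        full = np.zeros(N)
        full[T] = b
        keep = np.argsort(np.abs(full))[-k:]
        x = np.zeros(N)
        x[keep] = full[keep]
    return x
```

Note how the prune step can discard an index chosen earlier, which is exactly the self-correction OMP lacks.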

Subspace pursuit (SP)

Subspace pursuit is similar in spirit to CoSaMP but manages the support more tightly:

  1. Identify: Select the $k$ columns most correlated with the residual.
  2. Merge: Combine with the current $k$-element support ($2k$ total).
  3. Estimate: Least-squares on the merged support.
  4. Prune: Retain the $k$ largest coefficients.
  5. Update residual and repeat.

SP uses $k$ new candidates per iteration (vs. $2k$ for CoSaMP), giving it lower per-iteration cost. Recovery guarantees are similar to CoSaMP under comparable RIP conditions.

Iterative thresholding algorithms

These algorithms alternate between a gradient step (moving toward data consistency) and a thresholding step (enforcing sparsity). They're simple to implement and scale well to high-dimensional problems.


Iterative hard thresholding (IHT)

IHT repeats a two-step update:

$$x^{k+1} = H_k\!\left(x^k + \mu A^T(y - Ax^k)\right)$$

  • The term $\mu A^T(y - Ax^k)$ is a gradient descent step on the data fidelity $\|y - Ax\|_2^2$.
  • $H_k(\cdot)$ is the hard thresholding operator: keep the $k$ largest-magnitude entries, set the rest to zero.
  • The step size $\mu$ is typically set to 1 when $A$ has normalized columns, or tuned via line search.

IHT converges under RIP conditions (roughly $\delta_{3k} < 1/\sqrt{3}$). It's extremely simple but can be slow to converge compared to CoSaMP or SP.

Iterative soft thresholding (IST)

IST replaces hard thresholding with soft thresholding:

$$x^{k+1} = S_\lambda\!\left(x^k + \mu A^T(y - Ax^k)\right)$$

The soft thresholding operator $S_\lambda(z)_i = \text{sign}(z_i) \max(|z_i| - \lambda, 0)$ shrinks all coefficients toward zero by $\lambda$, rather than setting small ones exactly to zero.

IST is equivalent to the proximal gradient method (ISTA) applied to the LASSO objective. Its accelerated variant, FISTA, achieves an $O(1/k^2)$ convergence rate vs. $O(1/k)$ for plain IST, and is widely used in practice.
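A minimal ISTA sketch for the LASSO objective, with step size $1/L$ where $L = \|A\|_2^2$ is the Lipschitz constant of the smooth term's gradient:

```python
import numpy as np

def soft_threshold(z, t):
    """S_t(z): shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, y, lam, iters=2000):
    """Proximal gradient (ISTA) for min 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)            # gradient of the data-fidelity term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

FISTA keeps the same per-iteration cost and simply adds a momentum extrapolation between iterates to achieve the faster rate.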

Approximate message passing (AMP)

AMP adds a correction term (the "Onsager" term) to the standard iterative thresholding update:

$$x^{t+1} = \eta_t\!\left(x^t + A^T r^t\right), \quad r^t = y - Ax^t + \frac{N}{m}\, r^{t-1} \langle \eta'_{t-1} \rangle$$

The Onsager correction accounts for correlations introduced by the measurement matrix and dramatically improves convergence. For i.i.d. Gaussian $A$, AMP's behavior is exactly characterized by a scalar recursion called state evolution, which predicts the MSE at each iteration.

AMP achieves Bayes-optimal performance for Gaussian matrices and can incorporate arbitrary denoising functions (not just thresholding) as the nonlinearity $\eta_t$, leading to the "denoising-based AMP" (D-AMP) framework. However, AMP can diverge for non-Gaussian or structured measurement matrices; variants like VAMP (Vector AMP) address this.

Performance guarantees and analysis

Recovery conditions and bounds

Two main conditions guarantee recovery:

  • Null Space Property (NSP): $A$ satisfies the NSP of order $k$ if every nonzero vector $h$ in the null space of $A$ satisfies $\|h_S\|_1 < \|h_{S^c}\|_1$ for every index set $S$ with $|S| \leq k$ — no null-space vector may concentrate its $\ell_1$ mass on $k$ coordinates. This is necessary and sufficient for $\ell_1$-minimization to recover all $k$-sparse signals.
  • Restricted Isometry Property (RIP): A stronger, more practical condition (described above) that also yields quantitative error bounds.

Typical recovery bounds take the form:

$$\|\hat{x} - x\|_2 \leq C_1 \frac{\sigma_k(x)_1}{\sqrt{k}} + C_2 \epsilon$$

where $\sigma_k(x)_1$ is the $\ell_1$-norm of the tail (entries outside the $k$ largest) and $\epsilon$ bounds the noise. The first term captures compressibility; the second captures noise sensitivity.

Noise and stability considerations

Real measurements are always noisy, so stability matters as much as exact recovery. The instance optimality or stable recovery property guarantees that reconstruction error degrades gracefully:

  • Error scales linearly with noise magnitude.
  • For approximately sparse (compressible) signals, error also scales with the best $k$-term approximation error.

LASSO and the Dantzig selector are explicitly designed for this regime. Greedy algorithms like CoSaMP and SP also provide stable recovery bounds under RIP.

Phase transitions in sparse recovery

Phase transitions describe the sharp boundary between success and failure in the $(k/N, m/N)$ plane (sparsity ratio vs. measurement ratio).

  • Below the phase transition curve: recovery succeeds with high probability.
  • Above it: recovery fails.
  • The transition is remarkably sharp, especially for large $N$.

For $\ell_1$-minimization with Gaussian matrices, the phase transition is precisely characterized by the Donoho-Tanner formula. For example, at sparsity ratio $k/N = 0.1$, you need roughly $m/N \approx 0.35$ measurements for reliable recovery. These curves are essential for system design: they tell you the minimum number of measurements needed for a given sparsity level.

Applications of sparse recovery

Compressive imaging and sensing

CS principles enable imaging systems that acquire far fewer samples than conventional approaches:

  • Single-pixel camera: Uses a single photodetector with random spatial light modulation patterns, reconstructing the image via sparse recovery. Useful in spectral ranges where detector arrays are expensive (infrared, terahertz).
  • Compressive radar: Reduces the number of pulses or bandwidth needed for target detection.
  • MRI acceleration: Subsamples k-space (Fourier measurements) and reconstructs via sparsity in the wavelet domain, cutting scan times significantly.

Sparse channel estimation

Wireless channels are often sparse in the delay domain: only a few multipath components carry significant energy. By modeling the channel impulse response as a $k$-sparse vector, you can estimate it from far fewer pilot symbols than conventional least-squares estimation requires.

This is particularly valuable in wideband systems (OFDM, mmWave) where the channel dimension is large but sparsity is high. Sparse estimation reduces pilot overhead, freeing bandwidth for data transmission.

Sparse signal denoising and inpainting

If a signal is sparse in some domain (wavelets for images, DCT for audio), sparse recovery algorithms can:

  • Denoise: Solve a LASSO-type problem where the "measurements" are the noisy observations. The $\ell_1$ penalty suppresses noise while preserving signal structure.
  • Inpaint: Treat missing samples as unmeasured entries. The known samples form the measurement vector, and sparse recovery fills in the gaps.

These techniques are standard in image restoration, audio enhancement, and video error concealment.
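For a signal sparse in an orthonormal basis, the LASSO denoiser has a closed form: transform, soft-threshold, transform back. A small sketch using SciPy's DCT, where the signal, noise level, and threshold are illustrative choices:

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
N = 512

# Clean signal: 3-sparse in the DCT domain (three cosine components)
coefs = np.zeros(N)
coefs[[5, 40, 90]] = [4.0, 3.0, 2.0]
clean = idct(coefs, norm="ortho")

noisy = clean + 0.05 * rng.standard_normal(N)

# Denoise: soft-threshold the DCT coefficients (threshold ~ 3x noise std)
lam = 0.15
c = dct(noisy, norm="ortho")
c = np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
denoised = idct(c, norm="ortho")

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
```

Because the DCT is orthonormal, the noise stays i.i.d. Gaussian in the transform domain, so a threshold of a few noise standard deviations removes almost all noise coefficients while only slightly shrinking the three large ones.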

Variations and extensions

Structured sparsity models

Standard sparsity treats all support patterns as equally likely. In practice, nonzero coefficients often cluster in predictable ways:

  • Block/group sparsity: Nonzeros occur in contiguous blocks. Group LASSO penalizes the $\ell_{2,1}$-norm (sum of $\ell_2$-norms of groups) instead of the plain $\ell_1$-norm.
  • Tree sparsity: In wavelet decompositions, significant coefficients tend to follow parent-child relationships across scales.
  • Graph sparsity: Nonzero support follows the structure of an underlying graph.

Exploiting structure reduces the number of measurements needed for recovery, sometimes by a logarithmic factor compared to unstructured sparsity.

Dictionary learning for sparse representations

Fixed dictionaries (Fourier, wavelet) work well for generic signals but aren't optimal for specific signal classes. Dictionary learning adapts the dictionary $D$ to training data by alternating between:

  1. Sparse coding: Fix $D$, find sparse coefficients for each training signal.
  2. Dictionary update: Fix the coefficients, update $D$ to minimize reconstruction error.

K-SVD is the most widely used algorithm, updating one atom at a time via SVD. Online dictionary learning processes training samples sequentially and scales to large datasets. Learned dictionaries consistently outperform fixed ones for tasks like image denoising and classification.

Sparse recovery with prior information

When you know something about the signal beyond sparsity, you can tighten recovery:

  • Partially known support: If some nonzero locations are known a priori, modified BP or weighted $\ell_1$ minimization can exploit this, requiring fewer measurements.
  • Non-negativity: Many physical signals (spectra, concentrations) are non-negative. Adding this constraint improves recovery.
  • Bayesian compressed sensing: Places priors (e.g., spike-and-slab, Laplacian) on the signal coefficients and performs MAP or MMSE estimation. This framework naturally handles uncertainty quantification and hyperparameter learning.

Incorporating prior information consistently reduces the measurement burden and improves reconstruction quality, especially in low-SNR regimes.