Sparse signal models
Sparse recovery algorithms reconstruct signals from far fewer measurements than traditional sampling requires. They work because many real-world signals are sparse: only a small number of coefficients carry meaningful information when the signal is expressed in the right basis. This section covers the core signal models, the compressed sensing framework, the major algorithm families, and their theoretical guarantees.
Sparsity in signal processing
A signal is sparse if most of its coefficients are zero (or near-zero) when represented in some basis or dictionary. For example, a natural image may have millions of pixels, but its wavelet coefficients are dominated by just a few large values. Audio signals behave similarly in the Fourier domain.
This property is what makes efficient compression, denoising, and reconstruction possible. If you know a signal is $K$-sparse (only $K$ out of $N$ coefficients are nonzero), you don't need all $N$ measurements to reconstruct it.
Sparse representation of signals
Sparse representation means expressing a signal $x$ as a linear combination of a few atoms from a dictionary $D$:

$$x = D\alpha$$

where $\alpha$ has only a few nonzero entries. The dictionary $D$ can be:
- Fixed: Fourier, wavelet, DCT (chosen based on known signal properties)
- Learned: adapted from training data (K-SVD, online dictionary learning)
Finding the sparsest $\alpha$ is the core computational challenge. Algorithms like matching pursuit and basis pursuit solve different relaxations of this problem.
Compressed sensing framework
Compressed sensing (CS) acquires and reconstructs sparse signals from far fewer linear measurements than the Nyquist rate demands. It combines sensing and compression into a single step.
Signal acquisition and compression
In CS, you don't sample the signal directly. Instead, you collect $M$ linear measurements:

$$y = \Phi x + e$$

- $y \in \mathbb{R}^M$: measurement vector
- $\Phi \in \mathbb{R}^{M \times N}$: measurement matrix ($M \ll N$)
- $x \in \mathbb{R}^N$: the sparse signal you want to recover
- $e \in \mathbb{R}^M$: measurement noise

Because $M \ll N$, this system is heavily underdetermined. Without the sparsity assumption, recovery would be impossible. With it, you can recover $x$ exactly (or near-exactly in the noisy case) using the algorithms described below.
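To make this concrete, here is a minimal numerical sketch (Python/NumPy assumed) with illustrative dimensions and a Gaussian measurement matrix; the specific values are not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 512, 128, 10            # signal length, measurements, sparsity (illustrative)

# K-sparse signal: K random locations with random amplitudes
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.standard_normal(K)

# Gaussian measurement matrix, scaled so columns have roughly unit norm
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Noisy linear measurements y = Phi x + e
sigma = 0.01
y = Phi @ x + sigma * rng.standard_normal(M)
```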
Measurement matrix design
Not just any $\Phi$ will work. The measurement matrix must preserve enough information about sparse signals to allow reconstruction. Two key properties matter:
- Incoherence with the sparsifying basis: the rows of $\Phi$ should be "spread out" relative to the sparsity basis so that each measurement captures global information about the signal.
- Restricted Isometry Property (RIP): described in detail below.
Random matrices (Gaussian i.i.d., Bernoulli $\pm 1$, subsampled Fourier) satisfy these properties with high probability when $M \gtrsim K \log(N/K)$, which is why they're widely used.
Restricted isometry property (RIP)
The RIP is the central sufficient condition for stable recovery. A matrix $\Phi$ satisfies the RIP of order $K$ with constant $\delta_K \in (0, 1)$ if, for every $K$-sparse vector $x$:

$$(1 - \delta_K)\,\|x\|_2^2 \;\le\; \|\Phi x\|_2^2 \;\le\; (1 + \delta_K)\,\|x\|_2^2$$

What this means geometrically: $\Phi$ approximately preserves the $\ell_2$-norm of every $K$-sparse vector. Distances between sparse signals are nearly maintained after measurement, so you can "invert" the process without catastrophic information loss. A smaller $\delta_K$ means better preservation and stronger recovery guarantees.
Verifying RIP for a specific matrix is NP-hard in general, which is why probabilistic guarantees for random matrices are so important in practice.
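What you can do cheaply is an empirical sanity check: sample random $K$-sparse vectors and see how tightly $\|\Phi x\|_2^2 / \|x\|_2^2$ concentrates around 1. A sketch (this samples the RIP inequality; it is not a certificate):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 512, 128, 10
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Ratio ||Phi x||^2 / ||x||^2 for many random K-sparse vectors
ratios = []
for _ in range(2000):
    x = np.zeros(N)
    idx = rng.choice(N, size=K, replace=False)
    x[idx] = rng.standard_normal(K)
    ratios.append(np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2)

# If the RIP of order K holds with constant delta_K, these stay within [1 - delta_K, 1 + delta_K]
print(f"min = {min(ratios):.3f}, max = {max(ratios):.3f}")
```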
Convex optimization algorithms
These methods reformulate sparse recovery as a convex program, replacing the intractable $\ell_0$-minimization (counting nonzeros) with a tractable $\ell_1$-minimization (sum of absolute values). The $\ell_1$-norm is the tightest convex relaxation of the $\ell_0$ "norm", and under RIP conditions it recovers the same solution.
Basis pursuit (BP)
Basis pursuit solves the noiseless recovery problem:

$$\min_x \;\|x\|_1 \quad \text{subject to} \quad \Phi x = y$$

This is a linear program and can be solved with standard LP solvers (simplex, interior point). Under sufficient RIP conditions (roughly $\delta_{2K} < \sqrt{2} - 1$), BP recovers any $K$-sparse signal exactly.
BP is the conceptual foundation for convex sparse recovery, but it assumes noiseless measurements. For noisy settings, you need the variants below.
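To illustrate the LP reformulation, here is a sketch using SciPy's linprog with the standard splitting $x = u - v$, $u, v \ge 0$ (assuming SciPy is available; this shows the reduction, it is not a tuned solver):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||x||_1 subject to Phi x = y via the textbook LP reformulation."""
    M, N = Phi.shape
    c = np.ones(2 * N)                    # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([Phi, -Phi])         # equality constraint: Phi (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v
```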
Least absolute shrinkage and selection operator (LASSO)
LASSO handles noise by balancing data fidelity against sparsity:

$$\min_x \; \tfrac{1}{2}\,\|y - \Phi x\|_2^2 + \lambda\,\|x\|_1$$

- The first term keeps the solution consistent with the measurements.
- The second term (the $\ell_1$ penalty) promotes sparsity.
- $\lambda$ controls the trade-off: larger $\lambda$ yields sparser solutions but may sacrifice fidelity.
Choosing $\lambda$ well is critical. Common strategies include cross-validation and setting $\lambda$ proportional to the noise level $\sigma$. LASSO can be solved efficiently with coordinate descent, ADMM, or proximal gradient methods (ISTA/FISTA).
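A minimal sketch using scikit-learn's Lasso (assuming scikit-learn is available; the noise-based choice of the regularization weight below is one common heuristic, not the only one):

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_recover(Phi, y, sigma):
    """Recover a sparse x from y = Phi x + e by l1-penalized least squares.
    Note: sklearn's Lasso minimizes (1/(2M)) ||y - Phi x||^2 + alpha ||x||_1,
    so alpha below corresponds to lambda / M in the formulation above."""
    M, N = Phi.shape
    alpha = sigma * np.sqrt(2.0 * np.log(N)) / M     # noise-based heuristic
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    model.fit(Phi, y)
    return model.coef_
```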
Dantzig selector
The Dantzig selector takes a different approach to noise robustness:

$$\min_x \;\|x\|_1 \quad \text{subject to} \quad \|\Phi^T (y - \Phi x)\|_\infty \le \tau$$

The constraint bounds the maximum correlation between the residual and any column of $\Phi$. The parameter $\tau$ is typically set based on the noise level. This is also a linear program, so it scales well.
Recovery guarantees for the Dantzig selector are comparable to LASSO. The choice between them often comes down to implementation convenience and the specific problem structure.
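Because the constraint is a pair of element-wise linear inequalities in $x$, the Dantzig selector reduces to an LP just as BP does; a sketch with the same splitting $x = u - v$ (again via SciPy, and again a reformulation sketch rather than a tuned solver):

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(Phi, y, tau):
    """Solve min ||x||_1 subject to ||Phi^T (y - Phi x)||_inf <= tau as an LP."""
    M, N = Phi.shape
    G = Phi.T @ Phi                          # Gram matrix (N x N)
    b = Phi.T @ y
    c = np.ones(2 * N)
    A_ub = np.vstack([np.hstack([-G, G]),    #  Phi^T (y - Phi x) <= tau
                      np.hstack([G, -G])])   # -Phi^T (y - Phi x) <= tau
    b_ub = np.concatenate([tau - b, tau + b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v
```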
Greedy pursuit algorithms
Greedy algorithms build the sparse estimate incrementally, selecting one or more atoms per iteration based on correlation with the current residual. They're faster per iteration than convex methods and often competitive in recovery quality.
Orthogonal matching pursuit (OMP)
OMP is the most straightforward greedy algorithm. Here's how it works:
- Initialize the residual $r^0 = y$ and the support set $S^0 = \emptyset$.
- Select: Find the column of $\Phi$ most correlated with the residual: $j = \arg\max_i \,|\langle \phi_i, r^{t-1} \rangle|$.
- Update support: Add index $j$ to the support set: $S^t = S^{t-1} \cup \{j\}$.
- Project: Solve the least-squares problem restricted to the current support: $\hat{x}_{S^t} = \arg\min_z \|y - \Phi_{S^t} z\|_2$.
- Update residual: $r^t = y - \Phi_{S^t}\hat{x}_{S^t}$.
- Repeat until $K$ iterations (if the sparsity $K$ is known) or until the residual is small enough.
OMP recovers $K$-sparse signals exactly when $\Phi$ satisfies a mutual coherence condition or a sufficiently strong RIP. Its main limitation: once an atom is selected, it's never removed, so an early wrong choice can't be corrected.
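A direct NumPy transcription of the steps above (a minimal sketch without the refinements a production implementation would add):

```python
import numpy as np

def omp(Phi, y, K, tol=1e-8):
    """Orthogonal matching pursuit for y ~ Phi x with x K-sparse."""
    N = Phi.shape[1]
    r = y.copy()                 # residual
    support = []                 # selected atom indices
    x_hat = np.zeros(N)
    for _ in range(K):
        # Select: column most correlated with the current residual
        j = int(np.argmax(np.abs(Phi.T @ r)))
        if j not in support:
            support.append(j)
        # Project: least squares restricted to the current support
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        # Update residual
        r = y - Phi[:, support] @ coef
        if np.linalg.norm(r) < tol:
            break
    x_hat[support] = coef
    return x_hat
```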
Compressive sampling matching pursuit (CoSaMP)
CoSaMP addresses OMP's limitations by adding a pruning step:
- Identify: Find the $2K$ columns of $\Phi$ most correlated with the current residual.
- Merge: Combine these with the current support (up to $3K$ indices total).
- Estimate: Solve the least-squares problem on the merged support.
- Prune: Keep only the $K$ largest coefficients in the estimate.
- Update residual and repeat.
The pruning step means CoSaMP can correct mistakes from earlier iterations. It provides uniform recovery guarantees under RIP and is stable under noise, with reconstruction error bounded by the noise level.
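A compact NumPy sketch of the CoSaMP loop (fixed iteration budget, no convergence refinements):

```python
import numpy as np

def cosamp(Phi, y, K, n_iter=30):
    """CoSaMP: identify 2K candidates, merge, estimate by least squares, prune to K."""
    N = Phi.shape[1]
    x_hat = np.zeros(N)
    r = y.copy()
    for _ in range(n_iter):
        # Identify: 2K columns most correlated with the residual
        omega = np.argsort(np.abs(Phi.T @ r))[-2 * K:]
        # Merge with the current support (at most 3K indices)
        T = np.union1d(omega, np.flatnonzero(x_hat))
        # Estimate: least squares on the merged support
        sol, *_ = np.linalg.lstsq(Phi[:, T], y, rcond=None)
        b = np.zeros(N)
        b[T] = sol
        # Prune: keep only the K largest coefficients
        x_hat = np.zeros(N)
        keep = np.argsort(np.abs(b))[-K:]
        x_hat[keep] = b[keep]
        # Update residual
        r = y - Phi @ x_hat
        if np.linalg.norm(r) < 1e-8:
            break
    return x_hat
```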
Subspace pursuit (SP)
Subspace pursuit is similar in spirit to CoSaMP but manages the support more tightly:
- Identify: Select the $K$ columns most correlated with the residual.
- Merge: Combine with the current $K$-element support ($2K$ indices total).
- Estimate: Least-squares on the merged support.
- Prune: Retain the $K$ largest coefficients.
- Update residual and repeat.
SP uses $K$ new candidates per iteration (vs. $2K$ for CoSaMP), giving it lower per-iteration cost. Recovery guarantees are similar to CoSaMP under comparable RIP conditions.
Iterative thresholding algorithms
These algorithms alternate between a gradient step (moving toward data consistency) and a thresholding step (enforcing sparsity). They're simple to implement and scale well to high-dimensional problems.

Iterative hard thresholding (IHT)
IHT repeats a two-step update:

$$x^{t+1} = H_K\big(x^t + \mu\,\Phi^T (y - \Phi x^t)\big)$$

- The term $x^t + \mu\,\Phi^T (y - \Phi x^t)$ is a gradient descent step on the data fidelity $\tfrac{1}{2}\|y - \Phi x\|_2^2$.
- $H_K(\cdot)$ is the hard thresholding operator: keep the $K$ largest-magnitude entries, set the rest to zero.
- The step size $\mu$ is typically set to 1 when $\Phi$ has normalized columns, or tuned via line search.
IHT converges under RIP conditions (roughly $\delta_{3K} < 1/\sqrt{32}$ in the original analysis). It's extremely simple but can be slow to converge compared to CoSaMP or SP.
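A minimal NumPy sketch of the IHT update (fixed step size and iteration count, assuming roughly normalized columns):

```python
import numpy as np

def hard_threshold(v, K):
    """Keep the K largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-K:]
    out[keep] = v[keep]
    return out

def iht(Phi, y, K, mu=1.0, n_iter=200):
    """Iterative hard thresholding: gradient step on the data fidelity, then keep K terms."""
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + mu * Phi.T @ (y - Phi @ x), K)
    return x
```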
Iterative soft thresholding (IST)
IST replaces hard thresholding with soft thresholding:

$$x^{t+1} = S_\lambda\big(x^t + \mu\,\Phi^T (y - \Phi x^t)\big), \qquad S_\lambda(u)_i = \operatorname{sign}(u_i)\,\max(|u_i| - \lambda,\, 0)$$
The soft thresholding operator shrinks every coefficient toward zero by $\lambda$ (entries with magnitude below $\lambda$ become exactly zero), rather than keeping a fixed number of the largest entries unchanged as hard thresholding does.
IST is equivalent to the proximal gradient method (ISTA) applied to the LASSO objective. Its accelerated variant, FISTA, achieves an $O(1/t^2)$ convergence rate vs. $O(1/t)$ for plain IST, and is widely used in practice.
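A sketch of FISTA applied to the LASSO objective, using the spectral norm of $\Phi$ to set the step size (a standard choice for the Lipschitz constant of the gradient):

```python
import numpy as np

def soft_threshold(u, lam):
    """Shrink every entry toward zero by lam; entries below lam in magnitude become zero."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def fista(Phi, y, lam, n_iter=500):
    """FISTA for 0.5 * ||y - Phi x||^2 + lam * ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(n_iter):
        x_new = soft_threshold(z + Phi.T @ (y - Phi @ z) / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```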
Approximate message passing (AMP)
AMP adds a correction term (the "Onsager" term) to the standard iterative thresholding update:

$$x^{t+1} = \eta_t\big(x^t + \Phi^T r^t\big), \qquad r^t = y - \Phi x^t + \frac{N}{M}\, r^{t-1}\, \big\langle \eta'_{t-1}(x^{t-1} + \Phi^T r^{t-1}) \big\rangle$$

where $\eta_t$ is the (thresholding) nonlinearity and $\langle \cdot \rangle$ denotes the average over the entries.
The Onsager correction accounts for correlations introduced by the measurement matrix and dramatically improves convergence. For i.i.d. Gaussian $\Phi$, AMP's behavior is exactly characterized by a scalar recursion called state evolution, which predicts the MSE at each iteration.
AMP achieves Bayes-optimal performance for Gaussian matrices and can incorporate arbitrary denoising functions (not just thresholding) as the nonlinearity $\eta_t$, leading to the "denoising-based AMP" (D-AMP) framework. However, AMP can diverge for non-Gaussian or structured measurement matrices; variants like VAMP (Vector AMP) address this.
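A minimal AMP sketch with a soft-thresholding denoiser; the threshold schedule below (scaled to the estimated residual noise level) is one common heuristic and not the only choice:

```python
import numpy as np

def amp(Phi, y, n_iter=30, alpha=1.5):
    """AMP with soft thresholding; the last term in the residual update is the Onsager correction."""
    M, N = Phi.shape
    x = np.zeros(N)
    z = y.copy()
    for _ in range(n_iter):
        theta = alpha * np.linalg.norm(z) / np.sqrt(M)   # effective noise level from the residual
        u = x + Phi.T @ z                                # pseudo-data
        x_new = np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)
        # Onsager term: (N/M) * average derivative of the denoiser = (# nonzeros) / M
        z = y - Phi @ x_new + z * np.count_nonzero(x_new) / M
        x = x_new
    return x
```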
Performance guarantees and analysis
Recovery conditions and bounds
Two main conditions guarantee recovery:
- Null Space Property (NSP): $\Phi$ satisfies the NSP of order $K$ if every nonzero vector $h$ in the null space of $\Phi$ has less than half of its $\ell_1$-norm concentrated on any $K$ entries ($\|h_S\|_1 < \|h_{S^c}\|_1$ for all $|S| \le K$). This is necessary and sufficient for $\ell_1$-minimization to recover all $K$-sparse signals.
- Restricted Isometry Property (RIP): A stronger, more practical condition (described above) that also yields quantitative error bounds.
Typical recovery bounds take the form:

$$\|\hat{x} - x\|_2 \;\le\; C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}} + C_1\,\epsilon$$

where $\sigma_K(x)_1 = \|x - x_K\|_1$ is the $\ell_1$-norm of the tail (entries outside the $K$ largest) and $\epsilon \ge \|e\|_2$ bounds the noise. The first term captures compressibility; the second captures noise sensitivity.
Noise and stability considerations
Real measurements are always noisy, so stability matters as much as exact recovery. The instance optimality or stable recovery property guarantees that reconstruction error degrades gracefully:
- Error scales linearly with noise magnitude.
- For approximately sparse (compressible) signals, error also scales with the best $K$-term approximation error.
LASSO and the Dantzig selector are explicitly designed for this regime. Greedy algorithms like CoSaMP and SP also provide stable recovery bounds under RIP.
Phase transitions in sparse recovery
Phase transitions describe the sharp boundary between success and failure in the $(\delta, \rho)$ plane, where $\delta = M/N$ is the measurement (undersampling) ratio and $\rho = K/M$ is the sparsity ratio.
- Below the phase transition curve: recovery succeeds with high probability.
- Above it: recovery fails.
- The transition is remarkably sharp, especially for large $N$.
For $\ell_1$-minimization with Gaussian matrices, the phase transition is precisely characterized by the Donoho-Tanner curve. These curves are essential for system design: reading off the curve tells you the minimum number of measurements needed for reliable recovery at a given sparsity level.
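One way to trace such a diagram empirically is a Monte Carlo sweep over $(M, K)$. The sketch below uses scikit-learn's OMP as the recovery algorithm for speed, so it traces OMP's empirical transition rather than the $\ell_1$ curve; any of the solvers above could be substituted:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def success_rate(N, M, K, trials=50, seed=0):
    """Empirical recovery probability at one (M/N, K/M) point."""
    rng = np.random.default_rng(seed)
    ok = 0
    for _ in range(trials):
        x = np.zeros(N)
        x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
        Phi = rng.standard_normal((M, N)) / np.sqrt(M)
        y = Phi @ x                                   # noiseless measurements
        solver = OrthogonalMatchingPursuit(n_nonzero_coefs=K, fit_intercept=False)
        x_hat = solver.fit(Phi, y).coef_
        ok += np.linalg.norm(x_hat - x) < 1e-6 * np.linalg.norm(x)
    return ok / trials

# Sweeping M and K over a grid and plotting success_rate traces the transition,
# e.g. compare success_rate(N=256, M=80, K=10) with success_rate(N=256, M=80, K=40)
```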
Applications of sparse recovery
Compressive imaging and sensing
CS principles enable imaging systems that acquire far fewer samples than conventional approaches:
- Single-pixel camera: Uses a single photodetector with random spatial light modulation patterns, reconstructing the image via sparse recovery. Useful in spectral ranges where detector arrays are expensive (infrared, terahertz).
- Compressive radar: Reduces the number of pulses or bandwidth needed for target detection.
- MRI acceleration: Subsamples k-space (Fourier measurements) and reconstructs via sparsity in the wavelet domain, cutting scan times significantly.
Sparse channel estimation
Wireless channels are often sparse in the delay domain: only a few multipath components carry significant energy. By modeling the channel impulse response as a $K$-sparse vector, you can estimate it from far fewer pilot symbols than conventional least-squares estimation requires.
This is particularly valuable in wideband systems (OFDM, mmWave) where the channel dimension is large but sparsity is high. Sparse estimation reduces pilot overhead, freeing bandwidth for data transmission.
Sparse signal denoising and inpainting
If a signal is sparse in some domain (wavelets for images, DCT for audio), sparse recovery algorithms can:
- Denoise: Solve a LASSO-type problem where the "measurements" are the noisy observations. The $\ell_1$ penalty suppresses noise while preserving signal structure.
- Inpaint: Treat missing samples as unmeasured entries. The known samples form the measurement vector, and sparse recovery fills in the gaps.
These techniques are standard in image restoration, audio enhancement, and video error concealment.
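As an illustration of the inpainting idea, here is a simple iterative-shrinkage sketch for a 1-D signal assumed sparse in the DCT domain: alternate soft thresholding of the DCT coefficients with re-imposing the known samples (a heuristic sketch, not a tuned restoration method; the threshold `lam` would need adjusting per signal):

```python
import numpy as np
from scipy.fft import dct, idct

def inpaint_dct(y_obs, mask, lam=0.05, n_iter=200):
    """Fill in missing samples of a signal assumed sparse under the DCT.
    y_obs holds the observed samples; mask is True where a sample is known."""
    x = np.where(mask, y_obs, 0.0)
    for _ in range(n_iter):
        # Sparsify: soft-threshold the DCT coefficients
        c = dct(x, norm="ortho")
        c = np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
        x = idct(c, norm="ortho")
        # Data consistency: re-impose the known samples
        x[mask] = y_obs[mask]
    return x
```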
Variations and extensions
Structured sparsity models
Standard sparsity treats all support patterns as equally likely. In practice, nonzero coefficients often cluster in predictable ways:
- Block/group sparsity: Nonzeros occur in contiguous blocks. Group LASSO penalizes the $\ell_{2,1}$-norm (the sum of the $\ell_2$-norms of the groups) instead of the plain $\ell_1$-norm; see the sketch below.
- Tree sparsity: In wavelet decompositions, significant coefficients tend to follow parent-child relationships across scales.
- Graph sparsity: Nonzero support follows the structure of an underlying graph.
Exploiting structure reduces the number of measurements needed for recovery, sometimes by a logarithmic factor compared to unstructured sparsity.
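The computational building block behind group LASSO is the group soft-thresholding (proximal) operator, which shrinks or zeroes each group as a unit; a minimal sketch, where `groups` is assumed to be a list of index arrays partitioning the coefficients:

```python
import numpy as np

def group_soft_threshold(x, groups, lam):
    """Proximal operator of lam * sum_g ||x_g||_2: each group survives or dies together."""
    out = np.zeros_like(x)
    for g in groups:
        norm_g = np.linalg.norm(x[g])
        if norm_g > lam:
            out[g] = (1.0 - lam / norm_g) * x[g]   # shrink the whole group toward zero
    return out

# Example grouping: contiguous blocks of 4 coefficients
# groups = [np.arange(i, i + 4) for i in range(0, N, 4)]
```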
Dictionary learning for sparse representations
Fixed dictionaries (Fourier, wavelet) work well for generic signals but aren't optimal for specific signal classes. Dictionary learning adapts the dictionary to training data by alternating between:
- Sparse coding: Fix $D$, find sparse coefficients for each training signal.
- Dictionary update: Fix the coefficients, update $D$ to minimize reconstruction error.
K-SVD is the most widely used algorithm, updating one atom at a time via SVD. Online dictionary learning processes training samples sequentially and scales to large datasets. Learned dictionaries consistently outperform fixed ones for tasks like image denoising and classification.
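A minimal alternating sketch of the idea; for brevity the dictionary update below is a MOD-style least-squares step rather than K-SVD's atom-by-atom SVD update, and scikit-learn's OMP handles the sparse-coding step (both are assumptions of this sketch, not a description of K-SVD itself):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def learn_dictionary(X, n_atoms, K, n_iter=20, seed=0):
    """Alternate K-sparse coding (OMP) with a least-squares dictionary update.
    X holds one training signal per column (dim x n_signals)."""
    rng = np.random.default_rng(seed)
    dim, _ = X.shape
    D = rng.standard_normal((dim, n_atoms))
    D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms
    for _ in range(n_iter):
        # Sparse coding: K-sparse code for every training signal
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=K, fit_intercept=False)
        A = omp.fit(D, X).coef_.T                     # n_atoms x n_signals
        # Dictionary update (MOD-style): least-squares fit, then renormalize atoms
        D = X @ np.linalg.pinv(A)
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D, A
```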
Sparse recovery with prior information
When you know something about the signal beyond sparsity, you can tighten recovery:
- Partially known support: If some nonzero locations are known a priori, modified BP or weighted $\ell_1$ minimization can exploit this, requiring fewer measurements.
- Non-negativity: Many physical signals (spectra, concentrations) are non-negative. Adding this constraint improves recovery.
- Bayesian compressed sensing: Places priors (e.g., spike-and-slab, Laplacian) on the signal coefficients and performs MAP or MMSE estimation. This framework naturally handles uncertainty quantification and hyperparameter learning.
Incorporating prior information consistently reduces the measurement burden and improves reconstruction quality, especially in low-SNR regimes.