
📡Advanced Signal Processing Unit 8 Review


8.5 Sparse recovery algorithms

Written by the Fiveable Content Team • Last updated August 2025

Sparse signal models

Sparse recovery algorithms reconstruct signals from far fewer measurements than traditional sampling requires. They work because many real-world signals are sparse: only a small number of coefficients carry meaningful information when the signal is expressed in the right basis. This section covers the core signal models, the compressed sensing framework, the major algorithm families, and their theoretical guarantees.

Sparsity in signal processing

A signal is sparse if most of its coefficients are zero (or near-zero) when represented in some basis or dictionary. For example, a natural image may have millions of pixels, but its wavelet coefficients are dominated by just a few large values. Audio signals behave similarly in the Fourier domain.

This property is what makes efficient compression, denoising, and reconstruction possible. If you know a signal is $k$-sparse (only $k$ out of $N$ coefficients are nonzero), you don't need all $N$ measurements to reconstruct it.

Sparse representation of signals

Sparse representation means expressing a signal $x$ as a linear combination of a few atoms from a dictionary $D$:

$$x = D\alpha$$

where $\alpha$ has only a few nonzero entries. The dictionary can be:

  • Fixed: Fourier, wavelet, DCT (chosen based on known signal properties)
  • Learned: adapted from training data (K-SVD, online dictionary learning)

Finding the sparsest $\alpha$ is the core computational challenge. Algorithms like matching pursuit and basis pursuit solve different relaxations of this problem.

Compressed sensing framework

Compressed sensing (CS) acquires and reconstructs sparse signals from far fewer linear measurements than the Nyquist rate demands. It combines sensing and compression into a single step.

Signal acquisition and compression

In CS, you don't sample the signal directly. Instead, you collect $m$ linear measurements:

$$y = Ax + n$$

  • $y \in \mathbb{R}^m$: measurement vector
  • $A \in \mathbb{R}^{m \times N}$: measurement matrix ($m \ll N$)
  • $x \in \mathbb{R}^N$: the sparse signal you want to recover
  • $n$: measurement noise

Because $m \ll N$, this system is heavily underdetermined. Without the sparsity assumption, recovery would be impossible. With it, you can recover $x$ exactly (or near-exactly in the noisy case) using the algorithms described below.
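The acquisition model above can be sketched in a few lines of NumPy. The dimensions, sparsity level, and noise scale here are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, m = 256, 8, 64                  # ambient dimension, sparsity, measurements

# k-sparse signal: k nonzero coefficients at random locations
x = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x[support] = rng.standard_normal(k)

# Gaussian measurement matrix (scaled so columns have unit expected norm)
A = rng.standard_normal((m, N)) / np.sqrt(m)

# m << N noisy linear measurements: y = Ax + n
y = A @ x + 0.01 * rng.standard_normal(m)

print(y.shape)               # (64,)
print(np.count_nonzero(x))   # 8
```

Every recovery algorithm below takes this pair $(A, y)$ and tries to reconstruct $x$.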

Measurement matrix design

Not just any $A$ will work. The measurement matrix must preserve enough information about sparse signals to allow reconstruction. Two key properties matter:

  • Incoherence with the sparsifying basis: the rows of $A$ should be "spread out" relative to the sparsity basis so that each measurement captures global information about the signal.
  • Restricted Isometry Property (RIP): described in detail below.

Random matrices (Gaussian i.i.d., Bernoulli $\pm 1$, subsampled Fourier) satisfy these properties with high probability when $m = O(k \log(N/k))$, which is why they're widely used.

Restricted isometry property (RIP)

The RIP is the central sufficient condition for stable recovery. A matrix $A$ satisfies the RIP of order $k$ with constant $\delta_k \in (0,1)$ if, for every $k$-sparse vector $x$:

$$(1 - \delta_k) \|x\|_2^2 \leq \|Ax\|_2^2 \leq (1 + \delta_k) \|x\|_2^2$$

What this means geometrically: $A$ approximately preserves the $\ell_2$-norm of every $k$-sparse vector. Distances between sparse signals are nearly maintained after measurement, so you can "invert" the process without catastrophic information loss. A smaller $\delta_k$ means better preservation and stronger recovery guarantees.

Verifying RIP for a specific matrix is NP-hard in general, which is why probabilistic guarantees for random matrices are so important in practice.
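Although certifying the RIP would require checking all $\binom{N}{k}$ supports, you can probe it empirically: sample random $k$-sparse vectors and see how far $\|Ax\|_2^2 / \|x\|_2^2$ strays from 1. This only yields a lower bound on $\delta_k$; the dimensions here are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, k = 64, 256, 8
A = rng.standard_normal((m, N)) / np.sqrt(m)

# Sample random k-sparse vectors and record how well A preserves their norm.
ratios = []
for _ in range(2000):
    x = np.zeros(N)
    x[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2)

# Empirical lower bound on the RIP constant delta_k
delta_est = max(max(ratios) - 1.0, 1.0 - min(ratios))
print(f"empirical delta_{k} >= {delta_est:.2f}")
```

For Gaussian $A$ at these dimensions the ratios concentrate around 1, consistent with the probabilistic guarantees above.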

Convex optimization algorithms

These methods reformulate sparse recovery as a convex program, replacing the intractable $\ell_0$-minimization (counting nonzeros) with a tractable $\ell_1$-minimization (sum of absolute values). The $\ell_1$-norm is the tightest convex relaxation of $\ell_0$, and under RIP conditions it recovers the same solution.

Basis pursuit (BP)

Basis pursuit solves the noiseless recovery problem:

$$\min_x \|x\|_1 \quad \text{subject to} \quad Ax = y$$

This is a linear program and can be solved with standard LP solvers (simplex, interior point). Under sufficient RIP conditions (roughly $\delta_{2k} < \sqrt{2} - 1$), BP recovers any $k$-sparse signal exactly.

BP is the conceptual foundation for convex sparse recovery, but it assumes noiseless measurements. For noisy settings, you need the variants below.
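The LP reformulation is mechanical: split $x = u - v$ with $u, v \geq 0$, so that $\|x\|_1 = \mathbf{1}^T(u + v)$ and $Ax = A(u - v)$. A minimal sketch using SciPy's `linprog` (the problem sizes in the demo are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """min ||x||_1 s.t. Ax = y, as an LP over x = u - v with u, v >= 0."""
    m, N = A.shape
    c = np.ones(2 * N)                 # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])          # equality constraint: A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

# Noiseless recovery of a 4-sparse x in R^80 from m = 40 measurements
rng = np.random.default_rng(0)
m, N, k = 40, 80, 4
A = rng.standard_normal((m, N)) / np.sqrt(m)
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x_hat = basis_pursuit(A, A @ x_true)
```

With these dimensions, recovery should be exact up to solver tolerance with high probability, consistent with the guarantee above.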

Least absolute shrinkage and selection operator (LASSO)

LASSO handles noise by balancing data fidelity against sparsity:

$$\min_x \frac{1}{2} \|Ax - y\|_2^2 + \lambda \|x\|_1$$

  • The first term keeps the solution consistent with the measurements.
  • The second term (the $\ell_1$ penalty) promotes sparsity.
  • $\lambda > 0$ controls the trade-off: a larger $\lambda$ yields sparser solutions but may sacrifice fidelity.

Choosing $\lambda$ well is critical. Common strategies include cross-validation and setting $\lambda$ proportional to the noise level $\sigma$. LASSO can be solved efficiently with coordinate descent, ADMM, or proximal gradient methods (ISTA/FISTA).

Dantzig selector

The Dantzig selector takes a different approach to noise robustness:

$$\min_x \|x\|_1 \quad \text{subject to} \quad \|A^T(Ax - y)\|_\infty \leq \epsilon$$

The constraint bounds the maximum correlation between the residual and any column of $A$. The parameter $\epsilon$ is typically set based on the noise level. This is also a linear program, so it scales well.

Recovery guarantees for the Dantzig selector are comparable to LASSO. The choice between them often comes down to implementation convenience and the specific problem structure.

Greedy pursuit algorithms

Greedy algorithms build the sparse estimate incrementally, selecting one or more atoms per iteration based on correlation with the current residual. They're faster per iteration than convex methods and often competitive in recovery quality.

Orthogonal matching pursuit (OMP)

OMP is the most straightforward greedy algorithm. Here's how it works:

  1. Initialize the residual $r^0 = y$ and the support set $S^0 = \emptyset$.
  2. Select: Find the column of $A$ most correlated with the residual: $i = \arg\max_j |\langle r^{k-1}, a_j \rangle|$.
  3. Update support: Add index $i$ to the support set $S^k$.
  4. Project: Solve the least-squares problem $\min_x \|y - A_{S^k} x\|_2$ restricted to the current support.
  5. Update residual: $r^k = y - A_{S^k} \hat{x}$.
  6. Repeat for $k$ iterations (if the sparsity is known) or until the residual is small enough.

OMP recovers $k$-sparse signals exactly when $A$ satisfies a mutual coherence condition or a sufficiently strong RIP. Its main limitation: once an atom is selected, it's never removed, so an early wrong choice can't be corrected.
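The steps above translate almost line-for-line into NumPy. A minimal sketch, assuming the sparsity $k$ is known and omitting a residual-based stopping rule:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily build a k-element support."""
    support = []
    r = y.copy()                                  # residual r^0 = y
    for _ in range(k):
        # Select: column most correlated with the current residual
        i = int(np.argmax(np.abs(A.T @ r)))
        if i not in support:
            support.append(i)
        # Project: least squares restricted to the current support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        # Update residual
        r = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```

In the noiseless, well-conditioned regime this recovers the support exactly; the least-squares projection then makes the residual orthogonal to all selected atoms.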

Compressive sampling matching pursuit (CoSaMP)

CoSaMP addresses OMP's limitations by adding a pruning step:

  1. Identify: Find the $2k$ columns of $A$ most correlated with the current residual.
  2. Merge: Combine these with the current support (up to $3k$ indices total).
  3. Estimate: Solve the least-squares problem on the merged support.
  4. Prune: Keep only the $k$ largest coefficients in the estimate.
  5. Update residual and repeat.

The pruning step means CoSaMP can correct mistakes from earlier iterations. It provides uniform recovery guarantees under RIP and is stable under noise, with reconstruction error bounded by the noise level.
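A compact sketch of the iteration, assuming $k$ is known and using a fixed iteration count instead of a residual tolerance:

```python
import numpy as np

def cosamp(A, y, k, iters=30):
    """CoSaMP: identify 2k candidates, merge, least-squares, prune to k."""
    N = A.shape[1]
    x = np.zeros(N)
    for _ in range(iters):
        r = y - A @ x
        # Identify: 2k columns most correlated with the residual
        omega = np.argsort(np.abs(A.T @ r))[-2 * k:]
        # Merge: union with the current support (at most 3k indices)
        T = np.union1d(omega, np.flatnonzero(x)).astype(int)
        # Estimate: least squares on the merged support
        b, *_ = np.linalg.lstsq(A[:, T], y, rcond=None)
        # Prune: keep only the k largest coefficients
        full = np.zeros(N)
        full[T] = b
        keep = np.argsort(np.abs(full))[-k:]
        x = np.zeros(N)
        x[keep] = full[keep]
    return x
```

Note how the prune step can discard an index chosen earlier, which is exactly the self-correction OMP lacks.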

Subspace pursuit (SP)

Subspace pursuit is similar in spirit to CoSaMP but manages the support more tightly:

  1. Identify: Select the $k$ columns most correlated with the residual.
  2. Merge: Combine with the current $k$-element support ($2k$ total).
  3. Estimate: Least-squares on the merged support.
  4. Prune: Retain the $k$ largest coefficients.
  5. Update residual and repeat.

SP uses $k$ new candidates per iteration (vs. $2k$ for CoSaMP), giving it lower per-iteration cost. Recovery guarantees are similar to CoSaMP under comparable RIP conditions.

Iterative thresholding algorithms

These algorithms alternate between a gradient step (moving toward data consistency) and a thresholding step (enforcing sparsity). They're simple to implement and scale well to high-dimensional problems.


Iterative hard thresholding (IHT)

IHT repeats a two-step update:

$$x^{k+1} = H_k\!\left(x^k + \mu A^T(y - Ax^k)\right)$$

  • The term $\mu A^T(y - Ax^k)$ is a gradient descent step on the data fidelity $\|y - Ax\|_2^2$.
  • $H_k(\cdot)$ is the hard thresholding operator: keep the $k$ largest-magnitude entries, set the rest to zero.
  • The step size $\mu$ is typically set to 1 when $A$ has normalized columns, or tuned via line search.

IHT converges under RIP conditions (roughly $\delta_{3k} < 1/\sqrt{3}$). It's extremely simple but can be slow to converge compared to CoSaMP or SP.

Iterative soft thresholding (IST)

IST replaces hard thresholding with soft thresholding:

$$x^{k+1} = S_\lambda\!\left(x^k + \mu A^T(y - Ax^k)\right)$$

The soft thresholding operator $S_\lambda(z)_i = \text{sign}(z_i) \max(|z_i| - \lambda, 0)$ shrinks all coefficients toward zero by $\lambda$, rather than setting small ones exactly to zero.

IST is equivalent to the proximal gradient method (ISTA) applied to the LASSO objective. Its accelerated variant, FISTA, achieves an $O(1/k^2)$ convergence rate vs. $O(1/k)$ for plain IST, and is widely used in practice.
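A minimal ISTA sketch for the LASSO objective, with step size $1/L$ where $L = \|A\|_2^2$ is the Lipschitz constant of the smooth term's gradient:

```python
import numpy as np

def soft_threshold(z, t):
    """S_t(z): shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, y, lam, iters=2000):
    """Proximal gradient (ISTA) for min 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)            # gradient of the data-fidelity term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

FISTA keeps the same per-iteration cost and simply adds a momentum extrapolation between iterates to achieve the faster rate.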

Approximate message passing (AMP)

AMP adds a correction term (the "Onsager" term) to the standard iterative thresholding update:

$$x^{t+1} = \eta_t\!\left(x^t + A^T r^t\right), \quad r^t = y - Ax^t + \frac{N}{m}\, r^{t-1} \langle \eta'_{t-1} \rangle$$

The Onsager correction accounts for correlations introduced by the measurement matrix and dramatically improves convergence. For i.i.d. Gaussian $A$, AMP's behavior is exactly characterized by a scalar recursion called state evolution, which predicts the MSE at each iteration.

AMP achieves Bayes-optimal performance for Gaussian matrices and can incorporate arbitrary denoising functions (not just thresholding) as the nonlinearity $\eta_t$, leading to the "denoising-based AMP" (D-AMP) framework. However, AMP can diverge for non-Gaussian or structured measurement matrices; variants like VAMP (Vector AMP) address this.

Performance guarantees and analysis

Recovery conditions and bounds

Two main conditions guarantee recovery:

  • Null Space Property (NSP): $A$ satisfies the NSP of order $k$ if every nonzero vector $h$ in the null space of $A$ satisfies $\|h_S\|_1 < \|h_{S^c}\|_1$ for every index set $S$ with $|S| \leq k$ — no null-space vector may concentrate its $\ell_1$ mass on $k$ coordinates. This is necessary and sufficient for $\ell_1$-minimization to recover all $k$-sparse signals.
  • Restricted Isometry Property (RIP): A stronger, more practical condition (described above) that also yields quantitative error bounds.

Typical recovery bounds take the form:

$$\|\hat{x} - x\|_2 \leq C_1 \frac{\sigma_k(x)_1}{\sqrt{k}} + C_2 \epsilon$$

where $\sigma_k(x)_1$ is the $\ell_1$-norm of the tail (entries outside the $k$ largest) and $\epsilon$ bounds the noise. The first term captures compressibility; the second captures noise sensitivity.

Noise and stability considerations

Real measurements are always noisy, so stability matters as much as exact recovery. The instance optimality or stable recovery property guarantees that reconstruction error degrades gracefully:

  • Error scales linearly with noise magnitude.
  • For approximately sparse (compressible) signals, error also scales with the best $k$-term approximation error.

LASSO and the Dantzig selector are explicitly designed for this regime. Greedy algorithms like CoSaMP and SP also provide stable recovery bounds under RIP.

Phase transitions in sparse recovery

Phase transitions describe the sharp boundary between success and failure in the $(k/N, m/N)$ plane (sparsity ratio vs. measurement ratio).

  • Below the phase transition curve: recovery succeeds with high probability.
  • Above it: recovery fails.
  • The transition is remarkably sharp, especially for large $N$.

For $\ell_1$-minimization with Gaussian matrices, the phase transition is precisely characterized by the Donoho-Tanner formula. For example, at sparsity ratio $k/N = 0.1$, you need roughly $m/N \approx 0.35$ measurements for reliable recovery. These curves are essential for system design: they tell you the minimum number of measurements needed for a given sparsity level.

Applications of sparse recovery

Compressive imaging and sensing

CS principles enable imaging systems that acquire far fewer samples than conventional approaches:

  • Single-pixel camera: Uses a single photodetector with random spatial light modulation patterns, reconstructing the image via sparse recovery. Useful in spectral ranges where detector arrays are expensive (infrared, terahertz).
  • Compressive radar: Reduces the number of pulses or bandwidth needed for target detection.
  • MRI acceleration: Subsamples k-space (Fourier measurements) and reconstructs via sparsity in the wavelet domain, cutting scan times significantly.

Sparse channel estimation

Wireless channels are often sparse in the delay domain: only a few multipath components carry significant energy. By modeling the channel impulse response as a $k$-sparse vector, you can estimate it from far fewer pilot symbols than conventional least-squares estimation requires.

This is particularly valuable in wideband systems (OFDM, mmWave) where the channel dimension is large but sparsity is high. Sparse estimation reduces pilot overhead, freeing bandwidth for data transmission.

Sparse signal denoising and inpainting

If a signal is sparse in some domain (wavelets for images, DCT for audio), sparse recovery algorithms can:

  • Denoise: Solve a LASSO-type problem where the "measurements" are the noisy observations. The $\ell_1$ penalty suppresses noise while preserving signal structure.
  • Inpaint: Treat missing samples as unmeasured entries. The known samples form the measurement vector, and sparse recovery fills in the gaps.

These techniques are standard in image restoration, audio enhancement, and video error concealment.
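For a signal sparse in an orthonormal basis, the LASSO denoiser has a closed form: transform, soft-threshold, transform back. A small sketch using SciPy's DCT, where the signal, noise level, and threshold are illustrative choices:

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
N = 512

# Clean signal: 3-sparse in the DCT domain (three cosine components)
coefs = np.zeros(N)
coefs[[5, 40, 90]] = [4.0, 3.0, 2.0]
clean = idct(coefs, norm="ortho")

noisy = clean + 0.05 * rng.standard_normal(N)

# Denoise: soft-threshold the DCT coefficients (threshold ~ 3x noise std)
lam = 0.15
c = dct(noisy, norm="ortho")
c = np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
denoised = idct(c, norm="ortho")

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
```

Because the DCT is orthonormal, the noise stays i.i.d. Gaussian in the transform domain, so a threshold of a few noise standard deviations removes almost all noise coefficients while only slightly shrinking the three large ones.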

Variations and extensions

Structured sparsity models

Standard sparsity treats all support patterns as equally likely. In practice, nonzero coefficients often cluster in predictable ways:

  • Block/group sparsity: Nonzeros occur in contiguous blocks. Group LASSO penalizes the $\ell_{2,1}$-norm (sum of $\ell_2$-norms of groups) instead of the plain $\ell_1$-norm.
  • Tree sparsity: In wavelet decompositions, significant coefficients tend to follow parent-child relationships across scales.
  • Graph sparsity: Nonzero support follows the structure of an underlying graph.

Exploiting structure reduces the number of measurements needed for recovery, sometimes by a logarithmic factor compared to unstructured sparsity.

Dictionary learning for sparse representations

Fixed dictionaries (Fourier, wavelet) work well for generic signals but aren't optimal for specific signal classes. Dictionary learning adapts the dictionary $D$ to training data by alternating between:

  1. Sparse coding: Fix $D$, find sparse coefficients for each training signal.
  2. Dictionary update: Fix the coefficients, update $D$ to minimize reconstruction error.

K-SVD is the most widely used algorithm, updating one atom at a time via SVD. Online dictionary learning processes training samples sequentially and scales to large datasets. Learned dictionaries consistently outperform fixed ones for tasks like image denoising and classification.

Sparse recovery with prior information

When you know something about the signal beyond sparsity, you can tighten recovery:

  • Partially known support: If some nonzero locations are known a priori, modified BP or weighted $\ell_1$ minimization can exploit this, requiring fewer measurements.
  • Non-negativity: Many physical signals (spectra, concentrations) are non-negative. Adding this constraint improves recovery.
  • Bayesian compressed sensing: Places priors (e.g., spike-and-slab, Laplacian) on the signal coefficients and performs MAP or MMSE estimation. This framework naturally handles uncertainty quantification and hyperparameter learning.

Incorporating prior information consistently reduces the measurement burden and improves reconstruction quality, especially in low-SNR regimes.