📡Advanced Signal Processing Unit 6 Review

6.4 Wavelet transform

Written by the Fiveable Content Team • Last updated August 2025

Wavelet Transform Basics

The wavelet transform decomposes a signal into a set of basis functions called wavelets that are localized in both time and frequency. This dual localization is what makes it so effective for non-stationary signals, where frequency content changes over time. Unlike the Fourier transform, which gives you global frequency information, the wavelet transform tells you what frequencies are present and when they occur.

Wavelet Definition and Properties

Wavelets are oscillatory, finite-duration functions with zero mean and varying frequency content. Three properties govern how well a wavelet performs for a given task:

  • Admissibility condition: The wavelet must satisfy $C_\psi = \int_0^\infty \frac{|\hat{\psi}(\omega)|^2}{\omega}\, d\omega < \infty$, which guarantees that the transform is invertible and the original signal can be perfectly reconstructed.
  • Regularity: Smoother wavelets produce better frequency localization and sparser signal representations. Regularity is tied to the number of continuous derivatives the wavelet possesses.
  • Vanishing moments: A wavelet with $N$ vanishing moments satisfies $\int t^k \psi(t)\, dt = 0$ for $k = 0, 1, \ldots, N-1$. More vanishing moments mean the wavelet is "blind" to polynomial trends up to degree $N-1$, which improves compression and denoising performance.

Common wavelet choices include Haar, Daubechies, Symlets, and Coiflets, each offering different trade-offs among these properties.
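To make the vanishing-moment property concrete, here is a minimal pure-Python sketch (the filter taps are the standard Haar values; the helper name is mine, not from any library). Haar has one vanishing moment, so its high-pass filter produces all-zero detail coefficients on a constant signal:

```python
import math

# Haar high-pass (detail) filter taps: one vanishing moment,
# so the filter output is zero on any constant (degree-0) signal.
g = [1 / math.sqrt(2), -1 / math.sqrt(2)]

def detail_coeffs(x, g):
    # Inner product of the filter with successive segments,
    # stepping by 2 (filter-then-downsample).
    return [sum(g[i] * x[2 * k + i] for i in range(len(g)))
            for k in range((len(x) - len(g)) // 2 + 1)]

constant = [3.0] * 8          # a degree-0 polynomial trend
d = detail_coeffs(constant, g)
print(d)  # [0.0, 0.0, 0.0, 0.0]
```

A wavelet with more vanishing moments (e.g. db2) would also zero out ramps and higher-degree trends in the same way.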

Continuous vs. Discrete Wavelets

  • Continuous wavelets are defined over a continuous range of scales $a$ and translations $b$, producing a highly redundant representation. This redundancy is useful for detailed signal analysis and feature extraction, but it comes at a computational cost.
  • Discrete wavelets sample the scale and translation parameters on a dyadic grid ($a = 2^j$, $b = k \cdot 2^j$), forming an orthonormal basis for $L^2(\mathbb{R})$. This eliminates redundancy, enables efficient computation via filter banks, and guarantees perfect reconstruction. Discrete wavelets are the standard choice for compression and denoising.

Wavelet Families and Types

Wavelet families group wavelets that share structural properties such as support size, symmetry, and vanishing moments.

  • Daubechies (dbN): Orthogonal wavelets with compact support and the maximum number of vanishing moments for a given support length. They are asymmetric, which can matter in some applications.
  • Biorthogonal: Use separate analysis and synthesis wavelet/scaling function pairs that form biorthogonal bases. This allows symmetric wavelets (useful for avoiding phase distortion) while still achieving perfect reconstruction.
  • Gaussian wavelets (e.g., Mexican hat): Derived from derivatives of the Gaussian function. They provide optimal time-frequency localization in the Heisenberg sense but lack compact support, so they are used primarily in the CWT rather than the DWT.

Wavelets can be further classified by orthogonality (orthogonal vs. biorthogonal), symmetry (symmetric vs. asymmetric), and regularity.

Continuous Wavelet Transform (CWT)

The CWT provides a continuous-time, multi-resolution representation of a signal by correlating it with scaled and shifted copies of a mother wavelet. It's the natural tool for exploratory time-frequency analysis of non-stationary signals.

CWT Definition and Formula

The CWT of a signal $x(t)$ with respect to a mother wavelet $\psi(t)$ is:

$$W_x(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t) \, \psi^*\!\left(\frac{t-b}{a}\right) dt$$

  • $a$ is the scale parameter. It stretches or compresses the wavelet: smaller $a$ compresses the wavelet (capturing higher frequencies); larger $a$ stretches it (capturing lower frequencies). Scale is inversely related to frequency: $f \approx f_c / a$, where $f_c$ is the wavelet's center frequency.
  • $b$ is the translation parameter, sliding the wavelet along the time axis.
  • $\psi^*(t)$ is the complex conjugate of the mother wavelet.
  • The $1/\sqrt{|a|}$ factor normalizes the wavelet's energy across scales.
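The integral above can be approximated directly as a Riemann sum over the samples. The sketch below (function names `mexican_hat` and `cwt_point` are mine, and the sum is a crude approximation, not a production CWT) evaluates one CWT coefficient of a Gaussian bump using the Mexican hat wavelet, showing that the response is strong when the wavelet is translated onto the bump and weak far away:

```python
import math

def mexican_hat(t):
    # Ricker ("Mexican hat") wavelet: normalized negative second
    # derivative of a Gaussian.
    c = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return c * (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt_point(x, ts, a, b):
    # W_x(a, b) approximated as a Riemann sum over the samples;
    # the wavelet is real here, so conjugation is a no-op.
    dt = ts[1] - ts[0]
    total = sum(x[i] * mexican_hat((ts[i] - b) / a) for i in range(len(x)))
    return total * dt / math.sqrt(abs(a))

# A Gaussian bump centered at t = 0: the CWT responds strongly
# at b = 0 and weakly at a distant translation b = 4.
ts = [0.01 * i - 5.0 for i in range(1001)]
x = [math.exp(-t * t) for t in ts]
w_center = cwt_point(x, ts, a=1.0, b=0.0)
w_far = cwt_point(x, ts, a=1.0, b=4.0)
print(w_center > abs(w_far))  # True: energy is localized near b = 0
```

Scanning `cwt_point` over a grid of `(a, b)` values is exactly what produces the scalogram discussed next.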

CWT Scalogram and Interpretation

The scalogram displays $|W_x(a,b)|^2$ (the squared magnitude of the CWT coefficients) as a 2D map with translation on the horizontal axis and scale (or equivalent frequency) on the vertical axis.

Reading a scalogram:

  • Bright regions indicate high energy at that particular scale and time location.
  • Vertical structures point to transient events (e.g., impulses, edges) that are well-localized in time.
  • Horizontal structures indicate sustained oscillatory components at a particular frequency.
  • At fine scales (high frequency), you get good time resolution but poor frequency resolution. At coarse scales (low frequency), the reverse holds. This is the Heisenberg trade-off inherent to the CWT.

CWT vs. Fourier Transform

| Property | Fourier Transform | CWT |
| --- | --- | --- |
| Representation | Global frequency only | Local time-frequency |
| Stationarity assumption | Assumes stationarity | No stationarity assumption |
| Time resolution | None (infinite window) | Varies with scale |
| Frequency resolution | Uniform | Varies with scale |
| Best suited for | Stationary signals, spectral analysis | Non-stationary signals, transient detection |

The CWT's variable resolution is its defining advantage: narrow windows at high frequencies give precise time localization of fast events, while wide windows at low frequencies give precise frequency localization of slow oscillations.

Discrete Wavelet Transform (DWT)

The DWT provides a non-redundant, computationally efficient decomposition of a signal into approximation and detail coefficients at discrete dyadic scales. It's the workhorse behind practical applications like compression, denoising, and feature extraction.


DWT Definition and Formula

The DWT is implemented through an iterated filter bank. At each decomposition level, the signal passes through a low-pass filter $h[n]$ and a high-pass filter $g[n]$, followed by downsampling by 2.

The decomposition formulas for level $j$ are:

$$a_j[k] = \sum_{n} a_{j-1}[n] \, h[2k - n]$$

$$d_j[k] = \sum_{n} a_{j-1}[n] \, g[2k - n]$$

where $a_0[n] = x[n]$ is the original signal, $a_j[k]$ are the approximation coefficients (low-frequency content), and $d_j[k]$ are the detail coefficients (high-frequency content) at level $j$.

Reconstruction recovers the signal by upsampling and filtering with synthesis filters $\tilde{h}[n]$ and $\tilde{g}[n]$:

$$x[n] = \sum_{k} a_J[k] \, \tilde{h}_J[n - 2k] + \sum_{j=1}^{J} \sum_{k} d_j[k] \, \tilde{g}_j[n - 2k]$$

where $J$ is the total number of decomposition levels. Perfect reconstruction requires that the analysis and synthesis filters satisfy specific algebraic conditions (orthogonality or biorthogonality).
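The level-1 analysis and synthesis equations can be checked numerically. The sketch below (function names are mine; boundaries are handled by circular extension, one common convention) uses the Haar taps in an orthogonal bank, where the synthesis filters are the time-reversed analysis filters:

```python
import math

s = 1 / math.sqrt(2)
h = {0: s, 1: s}     # Haar analysis low-pass taps h[n]
g = {0: s, 1: -s}    # Haar analysis high-pass taps g[n]

def analyze(x, f):
    # a[k] = sum_n x[n] f[2k - n], with circular (periodic) extension
    N = len(x)
    return [sum(c * x[(2 * k - m) % N] for m, c in f.items())
            for k in range(N // 2)]

def synthesize(a, d, N):
    # x[n] = sum_k a[k] h[2k - n] + sum_k d[k] g[2k - n]
    # (orthogonal bank: synthesis filters are the time-reversed
    #  analysis filters, which is what this indexing implements)
    x = [0.0] * N
    for k in range(N // 2):
        for m, c in h.items():
            x[(2 * k - m) % N] += c * a[k]
        for m, c in g.items():
            x[(2 * k - m) % N] += c * d[k]
    return x

x0 = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a1 = analyze(x0, h)   # approximation coefficients
d1 = analyze(x0, g)   # detail coefficients
xr = synthesize(a1, d1, len(x0))
print(max(abs(u - v) for u, v in zip(x0, xr)))  # ~0: perfect reconstruction
```

Swapping in longer orthogonal taps (e.g. db2) leaves the same code structure intact; only the filter dictionaries change.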

Multiresolution Analysis with DWT

Multiresolution analysis (MRA) is the mathematical framework underpinning the DWT. It constructs a nested sequence of approximation subspaces $\{V_j\}$ such that:

$$\{0\} \subset \cdots \subset V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset \cdots \subset L^2(\mathbb{R})$$

Two functions define the MRA:

  • Scaling function $\phi(t)$: Spans the approximation subspace $V_j$ at scale $j$. It acts as a low-pass function capturing coarse structure.
  • Wavelet function $\psi(t)$: Spans the detail (or "wavelet") subspace $W_j$, which is the orthogonal complement of $V_j$ in $V_{j-1}$. It captures the fine-scale information lost when moving from one resolution to the next.

The key MRA property is that $V_{j-1} = V_j \oplus W_j$. The DWT implements this decomposition iteratively: at each level, it splits the current approximation space into a coarser approximation plus detail, producing a tree-like structure of coefficients.

DWT Decomposition and Reconstruction

Decomposition (analysis) proceeds level by level:

  1. Start with the original signal $x[n]$ as the level-0 approximation.
  2. Filter with $h[n]$ (low-pass) and downsample by 2 to get approximation coefficients $a_1[k]$.
  3. Filter with $g[n]$ (high-pass) and downsample by 2 to get detail coefficients $d_1[k]$.
  4. Repeat steps 2-3 on $a_1[k]$ to get $a_2[k]$ and $d_2[k]$, and so on up to level $J$.

The result is one set of approximation coefficients $a_J$ and $J$ sets of detail coefficients $\{d_1, d_2, \ldots, d_J\}$.

Reconstruction (synthesis) reverses the process:

  1. Start from the coarsest level $J$.
  2. Upsample $a_J[k]$ and $d_J[k]$ by 2, filter with synthesis filters $\tilde{h}[n]$ and $\tilde{g}[n]$, and sum to recover $a_{J-1}[k]$.
  3. Repeat, combining each recovered approximation with the next finer detail level, until you reach the original signal.

Perfect reconstruction holds when the filter bank satisfies the alias cancellation and no-distortion conditions.
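The full decompose-then-reconstruct loop can be sketched in a few lines of pure Python for the Haar case (helper names like `wavedec`/`waverec` echo common library naming but the code below is my own illustrative version, restricted to even-length signals):

```python
import math

s = 1 / math.sqrt(2)

def step_down(a):
    # One analysis step (Haar): coarser approximation + detail
    return ([s * (a[2 * k] + a[2 * k + 1]) for k in range(len(a) // 2)],
            [s * (a[2 * k] - a[2 * k + 1]) for k in range(len(a) // 2)])

def step_up(a, d):
    # One synthesis step: merge approximation + detail back
    out = []
    for ak, dk in zip(a, d):
        out += [s * (ak + dk), s * (ak - dk)]
    return out

def wavedec(x, J):
    a, details = list(x), []
    for _ in range(J):
        a, d = step_down(a)
        details.append(d)        # details[j-1] holds the level-j coefficients
    return a, details

def waverec(aJ, details):
    a = aJ
    for d in reversed(details):  # coarsest detail level first
        a = step_up(a, d)
    return a

x = [float(n % 5) for n in range(16)]
aJ, dets = wavedec(x, J=3)
print([len(d) for d in dets], len(aJ))  # [8, 4, 2] 2
xr = waverec(aJ, dets)
print(max(abs(u - v) for u, v in zip(x, xr)))  # ~0: perfect reconstruction
```

Note how the coefficient counts halve at each level, so the total number of coefficients equals the signal length: the DWT is non-redundant.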

Wavelet Filter Banks and Coefficients

The filter bank is the computational engine of the DWT. Its properties directly determine the quality of the decomposition:

  • Orthogonality or biorthogonality: Ensures perfect reconstruction and energy preservation (Parseval's relation holds for orthogonal filter banks).
  • Finite impulse response (FIR): All practical wavelet filters are FIR, guaranteeing stability and (for symmetric filters) linear phase.
  • Vanishing moments: The number of vanishing moments of the wavelet equals the number of zeros the high-pass filter $G(z)$ has at $z = 1$. More zeros mean the filter better suppresses polynomial trends, yielding sparser detail coefficients.

The filter length determines the trade-off between time and frequency localization. Shorter filters (e.g., Haar with length 2) give sharp time localization but poor frequency selectivity. Longer filters (e.g., db10 with length 20) give smoother frequency partitioning at the cost of time smearing.
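The zeros-at-$z=1$ condition is easy to verify numerically. This sketch builds the db2 filters from their standard closed-form taps (an assumption worth double-checking against a reference table) and confirms the two vanishing moments, which make the high-pass branch annihilate a linear ramp away from the boundaries:

```python
import math

r3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)
# Daubechies db2 low-pass taps (standard closed form)
h = [(1 + r3) / norm, (3 + r3) / norm, (3 - r3) / norm, (1 - r3) / norm]
# Quadrature-mirror high-pass: g[n] = (-1)^n h[L-1-n]
g = [(-1) ** n * h[len(h) - 1 - n] for n in range(len(h))]

# Two vanishing moments <=> G(z) has a double zero at z = 1,
# i.e. sum g[n] = 0 and sum n * g[n] = 0.
m0 = sum(g)
m1 = sum(n * c for n, c in enumerate(g))

# Consequently the high-pass branch annihilates a linear ramp
# (interior coefficients only; boundaries are not wrapped here).
ramp = [float(n) for n in range(12)]
d = [sum(g[m] * ramp[2 * k + m] for m in range(4)) for k in range(4)]
print(max(abs(m0), abs(m1), max(abs(v) for v in d)))  # ~0
```

The same check with Haar (`h = [s, s]`) would pass only the `m0` test: one vanishing moment kills constants but not ramps.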

Wavelet Packet Transform (WPT)

The WPT generalizes the DWT by decomposing both approximation and detail coefficients at every level, rather than only the approximation branch. This produces a full binary tree of subbands, giving you much finer control over the time-frequency tiling.

WPT vs. DWT

In the standard DWT, only the low-frequency (approximation) branch is further decomposed at each level. This creates a logarithmic frequency partition: fine resolution at low frequencies, coarse resolution at high frequencies. That's great when most signal energy sits at low frequencies, but it's limiting otherwise.

The WPT decomposes every node, producing a complete binary tree. You can then choose which nodes to keep, effectively selecting an arbitrary tiling of the time-frequency plane. This makes the WPT far more adaptive:

  • For signals with significant high-frequency structure, the WPT can provide finer frequency resolution in those bands.
  • For classification tasks, you can select the decomposition that best separates signal classes.

The trade-off is increased computation and the need for a basis selection step.

WPT Decomposition Tree

The WPT decomposition tree is a complete binary tree:

  • The root node is the original signal.
  • At each level, every node splits into two children via low-pass and high-pass filtering plus downsampling.
  • Leaf nodes at depth $J$ represent the finest frequency partition, with $2^J$ equal-width subbands.
  • Each node is indexed by $(j, n)$, where $j$ is the depth (scale) and $n$ is the frequency band index.

The full tree is highly redundant (it contains far more coefficients than the original signal). In practice, you prune the tree to select a non-redundant basis, which brings us to best basis selection.

Best Basis Selection in WPT

Best basis selection finds the optimal subtree (i.e., the set of non-overlapping nodes that covers the entire frequency axis) that minimizes a given cost function. This is what makes the WPT adaptive.

Common cost functions:

  • Shannon entropy: $H = -\sum_i |c_i|^2 \log |c_i|^2$. Minimizing entropy yields the sparsest representation.
  • Log-energy entropy: $E = \sum_i \log(|c_i|^2)$. Concentrates energy into fewer coefficients, useful for compression.
  • Discriminant measures: Maximize class separability for pattern recognition tasks.

The Coifman-Wickerhauser algorithm solves this efficiently via dynamic programming:

  1. Compute the cost for every node in the full tree.
  2. Starting from the leaves, compare each parent's cost to the sum of its children's costs.
  3. If the parent's cost is lower (or equal), prune the children and keep the parent. Otherwise, keep the children.
  4. The surviving leaf nodes of the pruned tree form the best basis.

This runs in $O(N \log N)$ time, making it practical for real-time applications.
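The pruning recursion above fits in a few lines. This sketch (my own toy version: Haar splits, the additive Shannon entropy cost from earlier, no normalization) grows the full tree implicitly and keeps a parent node whenever it is at least as cheap as the best basis of its subtrees:

```python
import math

s = 1 / math.sqrt(2)

def split(c):
    # One Haar filter-bank step: low-pass / high-pass children
    low = [s * (c[2 * k] + c[2 * k + 1]) for k in range(len(c) // 2)]
    high = [s * (c[2 * k] - c[2 * k + 1]) for k in range(len(c) // 2)]
    return low, high

def entropy(c):
    # Additive Shannon entropy cost H = -sum |c_i|^2 log |c_i|^2
    return -sum(v * v * math.log(v * v) for v in c if v != 0.0)

def best_basis(c, depth):
    # Coifman-Wickerhauser pruning: keep the parent node whenever its
    # cost is <= the best total cost achievable by its two subtrees.
    if depth == 0 or len(c) < 2:
        return entropy(c), [c]
    low, high = split(c)
    cost_l, basis_l = best_basis(low, depth - 1)
    cost_h, basis_h = best_basis(high, depth - 1)
    parent = entropy(c)
    if parent <= cost_l + cost_h:
        return parent, [c]
    return cost_l + cost_h, basis_l + basis_h

x = [1.0, 1.0, 1.0, 1.0, 1.0, -1.0, 1.0, -1.0]
cost, basis = best_basis(x, depth=3)
# The chosen basis always covers the signal exactly once,
# and never costs more than keeping the root untouched.
print(sum(len(node) for node in basis), cost <= entropy(x))  # 8 True
```

Because the cost is additive over coefficients, the bottom-up comparison of parent vs. children is exactly the dynamic program described above.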


Wavelet Thresholding and Denoising

Wavelet denoising rests on a simple but powerful observation: when you transform a noisy signal into the wavelet domain, the signal energy concentrates in a few large coefficients, while additive white noise spreads roughly uniformly across all coefficients. By zeroing out the small coefficients (which are mostly noise) and keeping the large ones (which are mostly signal), you can recover a clean estimate of the original signal.

Soft vs. Hard Thresholding

Given a threshold $\lambda$:

  • Hard thresholding keeps coefficients above the threshold unchanged and zeros out the rest:

$$\hat{x}_H = x \cdot \mathbf{1}(|x| > \lambda)$$

This preserves coefficient magnitudes but creates discontinuities at $\pm\lambda$, which can introduce Gibbs-like artifacts in the reconstructed signal.

  • Soft thresholding shrinks all coefficients toward zero by $\lambda$:

$$\hat{x}_S = \mathrm{sign}(x) \cdot \max(0, |x| - \lambda)$$

The shrinkage produces a continuous mapping, yielding smoother reconstructions. Soft thresholding also has better theoretical properties: Donoho and Johnstone showed it is near-minimax optimal for a broad class of function spaces.

In practice, soft thresholding is the default choice unless you have a specific reason to preserve exact coefficient magnitudes.
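Both rules are one-liners; this sketch (function names are mine) shows the difference on a small coefficient vector:

```python
def hard_threshold(coeffs, lam):
    # Keep-or-kill: coefficients above lam pass through unchanged.
    return [v if abs(v) > lam else 0.0 for v in coeffs]

def soft_threshold(coeffs, lam):
    # Shrink magnitudes by lam; anything at or below lam goes to zero.
    return [0.0 if abs(v) <= lam else (abs(v) - lam) * (1 if v > 0 else -1)
            for v in coeffs]

c = [4.0, -0.3, 1.2, 0.05, -2.5]
print(hard_threshold(c, 1.0))  # [4.0, 0.0, 1.2, 0.0, -2.5]
print(soft_threshold(c, 1.0))  # surviving magnitudes reduced by 1.0
```

Note the bias soft thresholding introduces: the large coefficient 4.0 comes back as 3.0, which is the price paid for the continuous, artifact-free mapping.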

VisuShrink and SureShrink Methods

These are the two classical approaches to choosing the threshold $\lambda$:

VisuShrink (Donoho & Johnstone, 1994) uses a universal threshold:

$$\lambda_{\text{visu}} = \sigma \sqrt{2 \log N}$$

where $\sigma$ is the noise standard deviation (often estimated from the finest-scale detail coefficients using the MAD estimator: $\hat{\sigma} = \mathrm{median}(|d_1|) / 0.6745$) and $N$ is the signal length. This threshold is conservative: it's chosen so that, with high probability, all pure-noise coefficients fall below it. The downside is over-smoothing, since some signal coefficients also get killed.
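The MAD estimate and the universal threshold take two lines each. A quick sanity check on synthetic noise (standing in for finest-scale detail coefficients; the variable names are mine) shows that almost nothing survives the universal threshold when the input is pure noise:

```python
import math
import random
import statistics

random.seed(0)
N = 1024
sigma_true = 0.5

# Stand-in for finest-scale detail coefficients of a pure-noise signal
d1 = [random.gauss(0.0, sigma_true) for _ in range(N)]

# MAD-based noise estimate: median(|d1|) / 0.6745
sigma_hat = statistics.median(abs(v) for v in d1) / 0.6745

# Universal (VisuShrink) threshold
lam = sigma_hat * math.sqrt(2.0 * math.log(N))

# With high probability, essentially every pure-noise coefficient
# falls below the universal threshold.
survivors = sum(1 for v in d1 if abs(v) > lam)
print(sigma_hat, lam, survivors)
```

For $N = 1024$ the threshold sits at roughly $3.7\hat{\sigma}$, which is exactly why VisuShrink tends to over-smooth: moderate signal coefficients get caught too.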

SureShrink (Donoho & Johnstone, 1995) adapts the threshold to each subband by minimizing Stein's Unbiased Risk Estimate (SURE), which provides an unbiased estimate of the MSE without knowing the true signal. For each wavelet subband:

  1. Compute the SURE criterion as a function of $\lambda$.
  2. Choose the $\lambda$ that minimizes SURE.
  3. If the subband's energy is very low (suggesting it's mostly noise), fall back to the universal VisuShrink threshold.

SureShrink generally outperforms VisuShrink because it adapts to the local signal-to-noise ratio in each subband.
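A sketch of the per-subband search, using the standard closed form of SURE for soft thresholding with unit noise variance (the toy coefficient values and function names are mine; a real implementation would first rescale by $\hat{\sigma}$):

```python
# Toy coefficient vector: two large "signal" values among small
# "noise" values, assumed already scaled to unit noise variance.
noisy = [5.0, -4.0, 0.1, -0.2, 0.3, 0.05, -0.15, 0.25]

def sure_soft(coeffs, t):
    # Stein's unbiased risk estimate of the MSE of soft thresholding
    # at t, for unit-variance Gaussian noise:
    # SURE(t) = N - 2*#{|x_i| <= t} + sum_i min(x_i^2, t^2)
    n = len(coeffs)
    below = sum(1 for v in coeffs if abs(v) <= t)
    return n - 2 * below + sum(min(v * v, t * t) for v in coeffs)

def sure_threshold(coeffs):
    # The minimizer can be searched over the |coefficient| values.
    candidates = sorted(abs(v) for v in coeffs)
    return min(candidates, key=lambda t: sure_soft(coeffs, t))

t_opt = sure_threshold(noisy)
print(t_opt)  # small threshold: both large coefficients survive
```

Unlike the universal threshold (about 2.04 here for $N = 8$), the SURE choice adapts to the data and stays near the noise floor, preserving the two signal coefficients.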

Wavelet-Based Noise Reduction Applications

Wavelet denoising is widely used across domains:

  • Image denoising: Removing Gaussian, Poisson, or speckle noise while preserving edges and textures. Wavelet methods handle edge preservation well because edges produce large detail coefficients that survive thresholding.
  • Audio and speech enhancement: Suppressing background noise, hum, or recording artifacts from speech signals.
  • Biomedical signals: Cleaning ECG, EEG, and fMRI data to improve diagnostic accuracy. For example, wavelet denoising of ECG signals can remove baseline wander and muscle artifact while preserving the QRS complex morphology.
  • Seismic data processing: Enhancing signal-to-noise ratio in seismic traces for better subsurface imaging.
  • Financial time series: Separating trend and cyclical components from market noise for improved forecasting.

The common thread is that wavelet denoising works well whenever the signal of interest has a sparse wavelet representation and the noise does not.

Wavelet-Based Signal Compression

Wavelet compression exploits the same sparsity that makes denoising work. After transforming a signal into the wavelet domain, most of the energy is packed into a small number of large coefficients. You can discard or coarsely quantize the remaining small coefficients with minimal perceptual or reconstruction error, achieving high compression ratios.
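This keep-the-largest-coefficients idea can be demonstrated end to end with a Haar transform (the helpers below are my own minimal versions; a piecewise-constant signal is the best case for Haar):

```python
import math

s = 1 / math.sqrt(2)

def down(a):
    # One Haar analysis step: approximation + detail
    return ([s * (a[2 * k] + a[2 * k + 1]) for k in range(len(a) // 2)],
            [s * (a[2 * k] - a[2 * k + 1]) for k in range(len(a) // 2)])

def up(a, d):
    out = []
    for ak, dk in zip(a, d):
        out += [s * (ak + dk), s * (ak - dk)]
    return out

def dec(x, J):
    a, ds = list(x), []
    for _ in range(J):
        a, d = down(a)
        ds.append(d)
    return a, ds

def rec(a, ds):
    for d in reversed(ds):
        a = up(a, d)
    return a

x = [1.0] * 8 + [5.0] * 8        # piecewise-constant: very sparse in Haar
a, ds = dec(x, 3)
flat = a + [v for d in ds for v in d]
# Keep only the 2 largest-magnitude coefficients out of 16
lam = sorted((abs(v) for v in flat), reverse=True)[1]
keep = lambda c: [v if abs(v) >= lam else 0.0 for v in c]
xr = rec(keep(a), [keep(d) for d in ds])
err = max(abs(u - v) for u, v in zip(x, xr))
print(err)  # ~0: 2 of 16 coefficients reconstruct this signal exactly
```

An 8:1 coefficient reduction with essentially zero error is of course a best-case toy, but the same keep-largest mechanism, combined with quantization and entropy coding, is what drives the codecs below.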

Wavelet Transform in JPEG2000

JPEG2000 replaced the DCT-based JPEG with a DWT-based pipeline, and the improvement is substantial. The compression pipeline works as follows:

  1. Wavelet decomposition: The image undergoes a dyadic DWT, typically 5-6 levels deep. JPEG2000 uses the CDF 9/7 biorthogonal wavelet for lossy compression and the CDF 5/3 (Le Gall) wavelet for lossless compression.
  2. Quantization: Wavelet coefficients are scalar-quantized. For lossy compression, a uniform dead-zone quantizer reduces bit depth. For lossless mode, this step is skipped.
  3. Entropy coding: Quantized coefficients are encoded using EBCOT (Embedded Block Coding with Optimized Truncation), which applies context-adaptive binary arithmetic coding to each code-block independently.
  4. Rate-distortion optimization: The bitstream is organized so it can be truncated at any point to yield the best possible quality at that rate.

JPEG2000's advantages over JPEG include better compression at low bitrates (no blocking artifacts), quality and resolution scalability, and region-of-interest coding.

Embedded Zerotree Wavelet (EZW) Coding

EZW (Shapiro, 1993) was one of the first algorithms to exploit the cross-scale structure of wavelet coefficients for image compression.

The core insight is that if a wavelet coefficient at a coarse scale is insignificant (below threshold), its descendants at finer scales are very likely also insignificant. This parent-child relationship across scales forms a zerotree, which can be encoded with a single symbol instead of coding each zero individually.

EZW encoding steps:

  1. Compute the DWT of the image.
  2. Set the initial threshold $T_0 = 2^{\lfloor \log_2(\max |c|) \rfloor}$.
  3. Dominant pass: Scan coefficients in a predefined order (typically low-frequency to high-frequency). For each coefficient, encode one of four symbols: positive significant, negative significant, isolated zero, or zerotree root.
  4. Subordinate pass: Refine previously significant coefficients by encoding the next most significant bit.
  5. Halve the threshold and repeat from step 3 until the target bitrate is reached.

The bitstream is naturally embedded: you can stop encoding at any point and still have a valid, progressively refined reconstruction.
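The threshold schedule and significance test at the heart of EZW are easy to sketch (the coefficient values below are toy numbers in the style of Shapiro's classic worked example, and the loop shows only the significance classification, not the four-symbol coding or the tree structure):

```python
import math

# Toy wavelet-coefficient values (illustrative only)
coeffs = [63.0, -34.0, 49.0, 10.0, 7.0, 13.0, -12.0, 7.0]

# Initial threshold: largest power of two not exceeding max |c|
T0 = 2 ** math.floor(math.log2(max(abs(c) for c in coeffs)))
print(T0)  # 32, since max |c| = 63

T = T0
for _ in range(3):
    # Significance test applied in each dominant pass
    significant = [c for c in coeffs if abs(c) >= T]
    print(T, significant)
    T //= 2
```

Each halving of `T` admits a new batch of coefficients into the significant set, which is precisely what makes the bitstream progressive: truncating after any pass leaves a coarser but valid approximation.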

Set Partitioning in Hierarchical Trees (SPIHT)

SPIHT (Said & Pearlman, 1996) builds on EZW's ideas but uses a more efficient set partitioning strategy that avoids explicit zerotree symbols.

SPIHT maintains three lists:

  • LIS (List of Insignificant Sets): Sets of coefficients grouped by spatial orientation trees that haven't yet been found significant.
  • LIP (List of Insignificant Pixels): Individual coefficients not yet significant.
  • LSP (List of Significant Pixels): Coefficients already identified as significant.

The algorithm proceeds bitplane by bitplane, from the most significant bit to the least:

  1. Sorting pass: Test each entry in LIP and LIS against the current threshold. When a set in LIS becomes significant, partition it into smaller subsets or individual coefficients. Move newly significant coefficients to LSP.
  2. Refinement pass: Output the next bit of each coefficient already in LSP.
  3. Halve the threshold and repeat.

SPIHT consistently outperforms EZW at the same bitrate and produces an embedded bitstream with excellent rate-distortion performance. It requires no explicit entropy coding (the output is already near-optimal), though adding arithmetic coding can squeeze out a few more tenths of a dB.