The Continuous Wavelet Transform (CWT) decomposes a signal into scaled and shifted versions of a single prototype function called the mother wavelet. This gives you a time-frequency representation that adapts its resolution depending on the frequency being analyzed, making CWT especially well-suited for non-stationary signals whose frequency content changes over time.

Comparison vs. Discrete Wavelet Transform

CWT and the Discrete Wavelet Transform (DWT) differ primarily in how they sample the scale-translation plane:

CWT computes coefficients at every scale and translation, producing a continuous (and highly redundant) map of time-frequency content.
DWT restricts itself to dyadic scales ( $a = 2^j$ ) and corresponding translations, yielding a compact, non-redundant representation.

The practical trade-off: CWT gives you finer detail and more flexibility in choosing analysis scales, but at significantly higher computational cost. DWT is cheap enough for real-time use and, with an orthogonal or biorthogonal wavelet, gives perfect reconstruction from a non-redundant set of coefficients, but its rigid dyadic grid can miss features that fall between scales.

Mathematical Representation

The CWT of a signal $x(t)$ with respect to a mother wavelet $\psi(t)$ is defined as:

$CWT_x(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t) \, \psi^* \left(\frac{t-b}{a}\right) dt$

where:

$a$ is the scale parameter — it dilates ( $|a|>1$ ) or compresses ( $|a|<1$ ) the wavelet. Larger $|a|$ stretches the wavelet to capture lower-frequency content.
$b$ is the translation parameter — it slides the wavelet along the time axis to localize features.
$\psi^*(t)$ is the complex conjugate of the mother wavelet.
The $1/\sqrt{|a|}$ prefactor normalizes energy so that the scaled wavelet has the same energy as the mother wavelet at every scale (unit energy if $\psi$ itself has unit energy).
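
As a minimal sketch of the definition above, the integral can be approximated by a Riemann sum for a single pair $(a, b)$. The helper names `mexican_hat` and `cwt_point`, the unit-energy Mexican hat wavelet, and the Gaussian test signal are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat (Ricker) wavelet, normalized to unit energy."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def cwt_point(x, t, a, b, wavelet=mexican_hat):
    """Approximate CWT_x(a, b) by a Riemann sum on the uniform time grid t."""
    dt = t[1] - t[0]
    psi = wavelet((t - b) / a)              # wavelet shifted by b, scaled by a
    return np.sum(x * np.conj(psi)) * dt / np.sqrt(abs(a))

t = np.linspace(-10.0, 10.0, 4001)
x = np.exp(-t ** 2 / 2.0)                   # Gaussian bump centered at t = 0

w_center = cwt_point(x, t, a=1.0, b=0.0)    # wavelet sits on the bump
w_far = cwt_point(x, t, a=1.0, b=8.0)       # wavelet sits far from the bump
print(w_center, w_far)
```

The coefficient is large when the wavelet is aligned with the bump and close to zero when it is translated away from it, which is the localization behavior that $b$ is meant to provide.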

Mother Wavelet Function

The mother wavelet $\psi(t)$ is the prototype from which all analysis wavelets are generated through scaling and translation. Choosing the right mother wavelet matters because it determines what kinds of signal features the transform is most sensitive to.

Common choices include:

Morlet wavelet — a complex sinusoid modulated by a Gaussian envelope. Excellent for time-frequency analysis because it provides good joint localization and yields complex-valued coefficients (giving you both amplitude and phase).
Mexican hat (Ricker) wavelet — the negative normalized second derivative of a Gaussian. Real-valued and well-suited for detecting sharp transients and singularities.
Paul wavelet — complex-valued with good time localization, often used in geophysics.

The mother wavelet must satisfy the admissibility condition:

$C_\psi = \int_0^{\infty} \frac{|\hat{\psi}(\omega)|^2}{\omega} \, d\omega < \infty$

where $\hat{\psi}(\omega)$ is the Fourier transform of $\psi(t)$. For the integral to converge, $\hat{\psi}(0) = 0$ is required, meaning the wavelet has zero mean. Admissibility guarantees that the inverse CWT exists and that the transform preserves energy up to the constant factor $C_\psi$.
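
The zero-mean requirement is easy to check numerically. The sketch below assumes a unit-energy Mexican hat and the standard Morlet wavelet with center parameter $\omega_0 = 6$ (both function names and the parameter value are illustrative choices). The uncorrected Morlet wavelet is only approximately zero-mean, but for $\omega_0 = 6$ the residual is negligible.

```python
import numpy as np

def mexican_hat(t):
    """Unit-energy Mexican hat wavelet."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def morlet(t, w0=6.0):
    """Standard (uncorrected) Morlet wavelet: complex sinusoid times Gaussian."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-t ** 2 / 2.0)

t = np.linspace(-20.0, 20.0, 40001)
dt = t[1] - t[0]

stats = {}
for name, psi in [("mexican_hat", mexican_hat(t)), ("morlet", morlet(t))]:
    mean = np.sum(psi) * dt                  # approximates the integral of psi
    energy = np.sum(np.abs(psi) ** 2) * dt   # approximates the integral of |psi|^2
    stats[name] = (abs(mean), energy)
    print(f"{name}: |mean| = {abs(mean):.2e}, energy = {energy:.4f}")
```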

Properties of CWT

Linearity

CWT is a linear transform. If $x(t) = \alpha_1 x_1(t) + \alpha_2 x_2(t)$ , then:

$CWT_x(a,b) = \alpha_1 \, CWT_{x_1}(a,b) + \alpha_2 \, CWT_{x_2}(a,b)$

This means you can analyze complex signals by decomposing them into simpler components, transforming each one separately, and summing the results. Linearity also simplifies theoretical analysis of system responses.

Time-Frequency Resolution

CWT provides simultaneous time and frequency information, but the Heisenberg uncertainty principle constrains the resolution you can achieve in both dimensions at once. The product of time resolution $\Delta t$ and frequency resolution $\Delta f$ has a lower bound:

$\Delta t \cdot \Delta f \geq \frac{1}{4\pi}$

What makes CWT distinctive is how it distributes this uncertainty across scales:

Small scales (high frequencies): the wavelet is narrow in time, giving good time resolution but coarser frequency resolution.
Large scales (low frequencies): the wavelet is wide in time, giving good frequency resolution but coarser time resolution.

This adaptive tiling of the time-frequency plane is the core advantage of wavelet analysis over the fixed-resolution windowed Fourier transform (STFT).
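
To make the trade-off concrete, suppose (as an illustrative assumption) that the mother wavelet has time spread $\Delta t_\psi$ and frequency spread $\Delta f_\psi$ around its center frequency $f_c$. Dilating by a scale $a > 0$ stretches the time spread and compresses the frequency spread by the same factor:

$\Delta t(a) = a \, \Delta t_\psi, \qquad \Delta f(a) = \frac{\Delta f_\psi}{a}, \qquad \Delta t(a) \cdot \Delta f(a) = \Delta t_\psi \cdot \Delta f_\psi$

The area of each tile is therefore the same at every scale; only its aspect ratio changes. Because the analyzed frequency also moves as $f_c / a$, the relative bandwidth $\Delta f(a) / (f_c / a) = \Delta f_\psi / f_c$ stays constant, which is why wavelet analysis is often described as constant-Q.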

Redundancy

CWT is an overcomplete representation. The number of coefficients it produces far exceeds the number of samples in the original signal, because you're computing inner products at a continuum of scales and translations.

This redundancy is not purely a drawback. It makes the representation more robust for tasks like denoising (small perturbations in the signal produce small, spread-out changes in the coefficients) and feature extraction (features show up consistently across neighboring coefficients). The cost is increased storage and computation.

Inverse CWT

You can reconstruct the original signal from its CWT coefficients using the inverse transform:

$x(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} CWT_x(a,b) \, \frac{1}{\sqrt{|a|}} \, \psi\left(\frac{t-b}{a}\right) \frac{da \, db}{a^2}$

Here $C_\psi$ is the admissibility constant defined earlier. The double integral runs over all scales and translations, effectively summing up contributions from every wavelet atom.

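Because CWT is overcomplete, the reconstruction formula is not unique: other synthesis wavelets can replace $\psi$ in the integral, and not every function of $(a,b)$ is the CWT of some signal. When the formula above is applied to modified coefficients, it returns the signal whose CWT is closest to them in the least-squares sense. In practice, this freedom can be exploited for tasks like modified signal resynthesis.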

Computation of CWT

Convolution

At each scale $a$ , the CWT reduces to a convolution between the signal and the time-reversed, conjugated wavelet at that scale:

$CWT_x(a,b) = \int_{-\infty}^{\infty} x(t) \, \psi_{a,b}^*(t) \, dt$

where $\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}} \psi\left(\frac{t-b}{a}\right)$ .

Since this is a convolution in $b$ for fixed $a$ , you can compute it efficiently in the Fourier domain using the convolution theorem:

Compute the FFT of the signal $x(t)$ .
Compute the FFT of the scaled wavelet $\psi_a(t)$ (or obtain it analytically if the wavelet has a closed-form spectrum).
Multiply the two spectra (with appropriate conjugation).
Take the inverse FFT to get the CWT coefficients at scale $a$ .

This approach reduces the cost per scale from $O(N^2)$ to $O(N \log N)$ .
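
The four steps above can be sketched as follows. This is a minimal illustration, assuming a Morlet wavelet with $\omega_0 = 6$ whose spectrum is known in closed form; `morlet_hat` and `cwt_fft` are hypothetical helper names, and the FFT implies circular (periodic) boundary handling.

```python
import numpy as np

def morlet_hat(omega, w0=6.0):
    """Fourier transform of the Morlet wavelet pi^(-1/4) exp(i w0 t) exp(-t^2/2)."""
    return np.pi ** -0.25 * np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (omega - w0) ** 2)

def cwt_fft(x, dt, scales, w0=6.0):
    """CWT via one forward FFT of x and one inverse FFT per scale."""
    n = len(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)    # angular frequency grid
    x_hat = np.fft.fft(x)                            # step 1: FFT of the signal
    coeffs = np.empty((len(scales), n), dtype=complex)
    for i, a in enumerate(scales):
        # step 2: spectrum of the scaled wavelet, taken from the closed form
        psi_hat = np.sqrt(a) * morlet_hat(a * omega, w0)
        # steps 3 and 4: multiply with conjugation, then inverse FFT
        coeffs[i] = np.fft.ifft(x_hat * np.conj(psi_hat))
    return coeffs

fs = 200.0                                           # sampling rate in Hz
t = np.arange(800) / fs                              # 4 seconds of signal
x = np.cos(2.0 * np.pi * 10.0 * t)                   # pure 10 Hz tone

target_freqs = np.array([5.0, 10.0, 20.0])
scales = 6.0 / (2.0 * np.pi * target_freqs)          # Morlet peak: a = w0 / (2 pi f)
W = cwt_fft(x, 1.0 / fs, scales)
power = np.mean(np.abs(W) ** 2, axis=1)              # mean power per scale
print(power)
```

For the 10 Hz tone, the mean power is concentrated in the row whose scale corresponds to 10 Hz. Computing `x_hat` once and reusing it for every scale is what keeps the per-scale cost at one inverse FFT.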

Scaling

The scale parameter $a$ controls the wavelet's time-frequency localization:

Large $a$ : the wavelet stretches out, becoming sensitive to low-frequency, long-duration features.
Small $a$ : the wavelet compresses, becoming sensitive to high-frequency, short-duration features.

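The relationship between scale and pseudo-frequency is $f_a = f_c / a$, where $f_c$ is the center frequency of the mother wavelet (if $a$ is expressed in samples rather than time units, divide by the sampling interval as well: $f_a = f_c / (a \, \Delta t)$). Scales are often sampled logarithmically, with a fixed number of voices per octave, which spaces the analysis frequencies uniformly on a log-frequency axis.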

Shifting

The translation parameter $b$ slides the wavelet along the time axis. At each position, the inner product between the signal and the wavelet measures how well the local signal content matches the wavelet's shape and frequency at that scale. You step through all values of $b$ to build up the full time-scale map.

Discretization for Practical Implementation

In any real implementation, both $a$ and $b$ must be sampled on a discrete grid. Common strategies:

Logarithmic scale sampling: choose $a_j = a_0 \cdot 2^{j/V}$ for $j = 0, 1, \ldots$ , where $V$ is the number of voices per octave. Higher $V$ gives finer scale resolution.
Uniform time sampling: sample $b$ at the signal's sampling interval $\Delta t$ .
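
A scale grid of this kind takes only a few lines to build. The sketch below assumes scales expressed in seconds, so the pseudo-frequency relation $f_a = f_c / a$ applies directly, and a Morlet-like center frequency $f_c = 6 / (2\pi)$; the function names and the specific values are illustrative.

```python
import numpy as np

def log_scales(a0, n_octaves, voices_per_octave):
    """Scales a_j = a0 * 2**(j / V), covering n_octaves octaves upward from a0."""
    j = np.arange(n_octaves * voices_per_octave + 1)
    return a0 * 2.0 ** (j / voices_per_octave)

def pseudo_frequency(scales, center_freq):
    """Pseudo-frequency f_a = f_c / a for scales given in time units."""
    return center_freq / np.asarray(scales)

scales = log_scales(a0=0.01, n_octaves=3, voices_per_octave=4)
freqs = pseudo_frequency(scales, center_freq=6.0 / (2.0 * np.pi))

ratios = scales[1:] / scales[:-1]    # constant ratio between neighboring scales
print(len(scales), ratios[0], freqs[0], freqs[-1])
```

Every step multiplies the scale by $2^{1/V}$, so each group of $V$ consecutive scales spans exactly one octave.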

Efficient algorithms for discretized CWT include:

FFT-based method (described above): compute one FFT of the signal, then multiply by each scaled wavelet spectrum and invert.
À trous algorithm: an undecimated filter bank approach that dilates the filters by inserting zeros ("holes") between their taps at each level instead of downsampling the signal.

The choice of discretization grid affects both computational cost and how faithfully the discrete coefficients approximate the true continuous transform.

Interpretation of CWT

Time-Frequency Representation

Each CWT coefficient $CWT_x(a,b)$ quantifies the correlation between the signal and the wavelet centered at time $b$ and scale $a$ . A large magnitude means the signal has strong content matching the wavelet's frequency and shape at that time location.

The time-frequency plane is tiled with cells whose aspect ratio varies with scale: narrow in time and wide in frequency at small scales, wide in time and narrow in frequency at large scales. This is fundamentally different from the STFT, which uses fixed-size cells everywhere.

Scalogram

The scalogram is the standard visualization of CWT results. It plots $|CWT_x(a,b)|^2$ (or sometimes $|CWT_x(a,b)|$ ) as a 2D image with time on the horizontal axis and scale (or equivalent frequency) on the vertical axis. Color intensity represents coefficient magnitude.

Reading a scalogram:

Bright horizontal bands indicate sustained oscillatory components at a particular frequency.
Bright vertical structures indicate transient events (impulses, edges) that excite many scales simultaneously.
Localized bright spots indicate short-lived oscillatory bursts.

The vertical axis is often displayed with low frequencies at the top (large scales) and high frequencies at the bottom (small scales), though conventions vary.

Energy Density

The squared modulus of the CWT coefficients gives the wavelet energy density:

$E(a,b) = |CWT_x(a,b)|^2$

This tells you how the signal's energy is distributed across time and scale. Integrating over both dimensions (with the appropriate $1/a^2$ weighting from Parseval's relation for wavelets) recovers the total signal energy, up to a constant depending on $C_\psi$ .
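
Written out in the same convention as the inverse transform, the energy relation reads as follows. Treat the exact constant as convention-dependent, since it changes with the Fourier-transform normalization and with whether negative scales are included.

$\int_{-\infty}^{\infty} |x(t)|^2 \, dt = \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} E(a,b) \, \frac{da \, db}{a^2}$

Integrating over translation only gives a per-scale energy profile, $E(a) = \int_{-\infty}^{\infty} E(a,b) \, db$, which shows how much of the signal's energy sits at each scale.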

Identifying Signal Features

CWT is particularly effective for detecting and characterizing:

Transients and singularities — these produce large coefficients across many scales at a localized time, appearing as vertical ridges in the scalogram.
Frequency modulations — a chirp signal, for instance, traces a curved ridge in the time-scale plane.
Discontinuities — edges or jumps in the signal produce wavelet coefficient maxima whose decay rate across scales reveals the regularity (Lipschitz exponent) of the singularity.

Ridge extraction is a key technique: you trace the paths of local maxima across scales to isolate individual signal components. The wavelet skeleton (the set of all modulus maxima lines across scales) provides a sparse but highly informative representation of the signal's structure.
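
The simplest form of ridge extraction picks the strongest coefficient at each time instant. The sketch below assumes a single dominant component and uses a synthetic stand-in for a chirp scalogram rather than a real transform; `extract_ridge` is a hypothetical helper, and practical ridge trackers usually add continuity constraints between neighboring time steps.

```python
import numpy as np

def extract_ridge(coeffs, freqs):
    """At each time index, pick the frequency whose |coefficient| is largest."""
    magnitude = np.abs(coeffs)                    # shape (n_freqs, n_times)
    idx = np.argmax(magnitude, axis=0)            # strongest row per column
    amplitude = magnitude[idx, np.arange(coeffs.shape[1])]
    return freqs[idx], amplitude

# Synthetic stand-in for a chirp scalogram: a bump drifting upward in frequency
freqs = np.linspace(5.0, 50.0, 46)                # 1 Hz spacing
times = np.linspace(0.0, 1.0, 100)
true_freq = 10.0 + 30.0 * times                   # linear chirp, 10 to 40 Hz
coeffs = np.exp(-0.5 * ((freqs[:, None] - true_freq[None, :]) / 2.0) ** 2)

ridge_freq, ridge_amp = extract_ridge(coeffs, freqs)
max_error = np.max(np.abs(ridge_freq - true_freq))
print(max_error)
```

The recovered ridge follows the chirp to within half the spacing of the frequency grid, which is the best a pure argmax can do without interpolation.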

Applications of CWT

Time-Frequency Analysis

CWT is widely used for analyzing signals whose spectral content evolves over time. Examples include speech signals (where formant frequencies shift during phoneme transitions), EEG/ECG recordings (where rhythmic activity appears in bursts), and seismic traces (where different wave arrivals have distinct frequency signatures). The adaptive resolution of CWT makes it preferable to fixed-window STFT for these signals.

Signal Denoising

CWT-based denoising follows a straightforward pipeline:

Compute the CWT of the noisy signal.
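Apply a thresholding rule to the wavelet coefficients. Hard thresholding sets coefficients whose magnitude is below a threshold to zero and leaves the rest unchanged. Soft thresholding also zeroes those coefficients and additionally shrinks the magnitude of the remaining ones by the threshold amount.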
Reconstruct the signal using the inverse CWT.

The underlying assumption is that signal energy concentrates in a few large coefficients, while noise spreads across many small ones. The redundancy of CWT actually helps here, since it provides a more stable estimate of which coefficients carry signal versus noise.
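
The two thresholding rules can be sketched as below. The function names are illustrative, and the soft rule is written so that it also works for complex coefficients by shrinking the magnitude while keeping the phase. Choosing the threshold itself is a separate problem that is not covered here.

```python
import numpy as np

def hard_threshold(coeffs, thr):
    """Zero coefficients with magnitude below thr; keep the rest unchanged."""
    return np.where(np.abs(coeffs) >= thr, coeffs, 0.0)

def soft_threshold(coeffs, thr):
    """Zero small coefficients and shrink the magnitude of the rest by thr."""
    mag = np.abs(coeffs)
    # Guard against division by zero for coefficients that are exactly 0
    shrink = np.maximum(mag - thr, 0.0) / np.maximum(mag, np.finfo(float).tiny)
    return coeffs * shrink

c = np.array([0.2, -1.5, 3.0 + 4.0j, 0.0])
hard = hard_threshold(c, 1.0)
soft = soft_threshold(c, 1.0)
print(hard)
print(soft)
```

With a threshold of 1, the coefficient $3 + 4i$ (magnitude 5) survives hard thresholding unchanged, while soft thresholding scales it to magnitude 4 with the same phase.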

Feature Extraction

CWT coefficients serve as a rich feature set for downstream analysis. Useful features include:

Scale-wise energy: total energy at each scale, capturing the signal's spectral profile.
Statistical moments: mean, variance, skewness, and kurtosis of coefficients at each scale.
Ridge-based features: instantaneous frequency and amplitude extracted from scalogram ridges.

These features have proven effective in fault diagnosis (vibration analysis of rotating machinery), speech and audio classification, and biomedical signal characterization.
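
As a rough sketch, the scale-wise energy and statistical moments listed above can be computed from a coefficient array of shape (scales, time). The function name `scale_features`, the choice to take moments of the coefficient magnitudes, and the random placeholder coefficients are all assumptions for illustration.

```python
import numpy as np

def scale_features(coeffs):
    """Per-scale energy, mean, variance, skewness and kurtosis of |coefficients|."""
    mag = np.abs(coeffs)                               # shape (n_scales, n_times)
    energy = np.sum(mag ** 2, axis=1)
    mean = mag.mean(axis=1)
    var = mag.var(axis=1)
    std = np.sqrt(np.maximum(var, np.finfo(float).tiny))   # avoid dividing by 0
    z = (mag - mean[:, None]) / std[:, None]           # standardized magnitudes
    skew = np.mean(z ** 3, axis=1)
    kurt = np.mean(z ** 4, axis=1)
    return np.column_stack([energy, mean, var, skew, kurt])

# Placeholder coefficients standing in for a real CWT output
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((4, 256)) + 1j * rng.standard_normal((4, 256))

features = scale_features(coeffs)                      # one row per scale
print(features.shape)
```

Flattening `features` row by row yields a fixed-length vector that can be passed to a conventional classifier.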

Pattern Recognition

For classification tasks, CWT acts as a powerful front-end that converts raw 1D signals into 2D time-scale images. These scalogram images can then be fed into standard classifiers:

Traditional classifiers like SVMs or random forests operating on extracted wavelet features.
Convolutional neural networks (CNNs) operating directly on scalogram images, treating them as a form of spectrogram.

The multi-resolution nature of CWT produces features that are naturally robust to variations in signal scale and timing, which is valuable when the same pattern can appear at different speeds or time offsets.

Advantages of CWT

Flexibility in Choosing Mother Wavelet

You can select a mother wavelet that matches the morphology of the features you're looking for. Analyzing oscillatory signals? The complex Morlet wavelet gives you amplitude and phase. Looking for sharp transients? The Mexican hat wavelet's narrow, symmetric shape makes it a better fit (its support is not strictly compact, but it decays as fast as a Gaussian). This adaptability is a significant advantage over the STFT, whose analysis atoms are always windowed sinusoids.

Ability to Analyze Non-Stationary Signals

Fourier-based methods assume (at least locally) that the signal is stationary. CWT makes no such assumption. Because it localizes analysis in both time and frequency simultaneously, it can track frequency components as they appear, disappear, or drift. This is why CWT is widely used in fields like seismology, neuroscience, and mechanical vibration analysis, where stationarity is the exception rather than the rule.

Multi-Resolution Analysis

CWT automatically provides fine frequency resolution (and coarse time resolution) at low frequencies, and fine time resolution (and coarse frequency resolution) at high frequencies. You don't need to choose a single analysis window size as you do with the STFT. This means a single CWT computation can reveal both slow trends and fast transients in the same signal, without any parameter tuning beyond the choice of mother wavelet and scale range.

Robustness to Noise

Because noise energy typically spreads across all scales while signal energy concentrates at specific scales, CWT provides a natural basis for separating the two. Thresholding in the wavelet domain is more effective than in the Fourier domain for signals with localized features, since the wavelet coefficients of those features remain large and concentrated even in the presence of noise.

Limitations of CWT

Computational Complexity

CWT is substantially more expensive than DWT or FFT. Even with the FFT-based convolution approach, you still need $O(N_s \cdot N \log N)$ operations, where $N_s$ is the number of scales and $N$ is the signal length. For long signals analyzed over many scales, this can become prohibitive, especially in real-time applications. DWT, by contrast, achieves $O(N)$ complexity through its decimated filter bank structure.

Redundancy of Coefficients

The same redundancy that helps with denoising also means CWT produces far more data than the original signal. For a signal of $N$ samples analyzed at $N_s$ scales, you get $N_s \times N$ coefficients. This increases memory requirements and can complicate subsequent processing steps that need to operate on the full coefficient set.
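
A quick back-of-the-envelope estimate shows how fast this grows. The sketch assumes complex double-precision coefficients (16 bytes each) and a signal stored as 8-byte floats; the signal length and scale count are arbitrary example values.

```python
def cwt_storage_bytes(n_samples, n_scales, bytes_per_coeff=16):
    """Bytes needed to hold an n_scales x n_samples coefficient array.

    The default of 16 bytes corresponds to one double-precision complex number.
    """
    return n_samples * n_scales * bytes_per_coeff

n_samples = 1_000_000      # example: one million samples
n_scales = 100             # example: 100 analysis scales

total = cwt_storage_bytes(n_samples, n_scales)
ratio = total / (n_samples * 8)    # relative to the signal stored as float64
print(total / 1e9, "GB,", ratio, "times the signal size")
```

Under these assumptions the coefficients take 1.6 GB, 200 times the size of the original signal, which is why long recordings are often processed in segments or with a reduced set of scales.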

Boundary Effects

When the wavelet extends beyond the edges of the signal, the CWT coefficients near the boundaries become unreliable. This is especially problematic at large scales, where the wavelet is wide and a significant portion of it may hang off the signal's edge. Common mitigation strategies include:

Zero-padding: simple but introduces artificial discontinuities.
Symmetric extension: reflects the signal at the boundaries, reducing edge artifacts.
Boundary wavelets: specially constructed wavelets that adapt to the signal's finite support.

None of these solutions is perfect, so you should always treat CWT coefficients near the signal boundaries with caution.
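
Zero-padding and symmetric extension can both be sketched with `numpy.pad`. The helper names `pad_signal` and `crop` are illustrative: the idea is to extend the signal before the transform and then discard the coefficient columns that belong to the padding.

```python
import numpy as np

def pad_signal(x, pad, mode="symmetric"):
    """Extend x by `pad` samples on each side before transforming."""
    if mode == "zero":
        return np.pad(x, pad, mode="constant", constant_values=0.0)
    return np.pad(x, pad, mode="symmetric")    # mirror the signal at both edges

def crop(coeffs, pad):
    """Drop padded columns from a (n_scales, n_padded) array; requires pad > 0."""
    return coeffs[:, pad:-pad]

x = np.array([1.0, 2.0, 3.0, 4.0])
x_sym = pad_signal(x, 2)                  # mirrored copies of the edge samples
x_zero = pad_signal(x, 2, mode="zero")    # zeros on both sides
print(x_sym)
print(x_zero)
```

The symmetric extension keeps the signal continuous at the edges, whereas zero-padding creates a jump wherever the signal does not already end near zero. The pad length is typically chosen in proportion to the width of the largest-scale wavelet.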

Lack of Orthogonality

CWT wavelets at different scales and translations are not orthogonal to each other. This means the coefficients are correlated, and the representation contains redundant information. Consequences include:

Signal reconstruction is not as straightforward as with orthogonal DWT bases. You need to work within frame theory, where reconstruction relies on the frame bounds rather than simple coefficient inversion.
Energy partitioning across scales is not clean: the energy in one coefficient partially overlaps with energy in neighboring coefficients.
Iterative or frame-based reconstruction algorithms may be needed for accurate inversion, adding computational overhead.