Scalograms
Scalograms give you a visual map of how a signal's frequency content changes over time, using wavelet transforms instead of the fixed-window approach of spectrograms. They're the primary tool for time-scale analysis of non-stationary signals.
The core idea: a scalogram is a two-dimensional plot of wavelet coefficient magnitudes, with time on the x-axis and scale on the y-axis. Color or intensity at each point encodes how strongly the signal matches the wavelet at that particular time and scale. This gives you a localized view of the signal's energy distribution, which is exactly what you need for detecting transient events and tracking frequency shifts.
Scalogram vs Spectrogram
Both scalograms and spectrograms visualize time-varying frequency content, but they differ in a fundamental way:
- Spectrograms use the Short-Time Fourier Transform (STFT) with a fixed window size. Every frequency gets the same time-frequency resolution, which means you're always stuck with a single trade-off.
- Scalograms use wavelet transforms with variable window sizes. Short, compressed wavelets capture high frequencies with fine time resolution, while long, dilated wavelets capture low frequencies with fine frequency resolution.
This variable resolution is why scalograms outperform spectrograms for signals that contain both fast transients and slow-varying components.
Scalogram Interpretation
Reading a scalogram takes some practice because the axes aren't as intuitive as a spectrogram's:
- The horizontal axis is time, just like a spectrogram.
- The vertical axis is scale, which is inversely related to frequency. Low scale values sit at the top and correspond to high frequencies; high scale values sit at the bottom and correspond to low frequencies. (Some implementations flip this, so always check the axis labels.)
- Bright regions indicate high wavelet coefficient magnitudes, meaning the signal has strong energy at that time-scale combination.
- Vertical streaks suggest a transient event (energy across many scales at one time instant), while horizontal bands suggest a sustained oscillation at a particular scale.
The exact frequency associated with a given scale depends on the wavelet's center frequency, so converting the scale axis to a pseudo-frequency axis requires knowing which wavelet was used.
Wavelet Transforms for Scalograms
Wavelet transforms decompose a signal into scaled and translated copies of a mother wavelet, producing the coefficients that populate a scalogram. Unlike the Fourier transform's global sinusoidal basis, wavelets are localized in both time and frequency.
Continuous Wavelet Transform (CWT)
The CWT computes the inner product between the signal and scaled/translated versions of a wavelet function:
$$W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt$$
where:
- $x(t)$ is the signal being analyzed
- $a$ is the scale parameter (controls dilation/compression)
- $b$ is the translation parameter (controls position in time)
- $\psi(t)$ is the mother wavelet
- $^{*}$ denotes the complex conjugate
The factor $1/\sqrt{|a|}$ normalizes the wavelet's energy across scales so that coefficients at different scales are directly comparable. Each coefficient $W(a, b)$ quantifies how well the signal around time $b$ matches the wavelet at scale $a$.
CWT vs Fourier Transform
| Property | Fourier Transform | CWT |
|---|---|---|
| Basis functions | Sinusoids (infinite extent) | Wavelets (localized) |
| Time information | None (global decomposition) | Preserved via translation |
| Frequency resolution | Uniform across all frequencies | Varies with scale |
| Window size | N/A (no windowing) | Adapts with scale |
The CWT's adaptive window is the key advantage: at small scales (high frequencies), the wavelet is narrow, giving precise time localization. At large scales (low frequencies), the wavelet is wide, giving precise frequency localization. This is a direct consequence of the Heisenberg uncertainty principle applied to time-frequency analysis.
Wavelet Functions for CWT
Your choice of mother wavelet shapes the scalogram's characteristics:
- Morlet wavelet: A complex sinusoid modulated by a Gaussian envelope. Excellent frequency resolution and the most common choice for time-frequency analysis. Produces smooth, interpretable scalograms.
- Mexican Hat wavelet (negative normalized second derivative of a Gaussian): Real-valued, good for detecting sharp transitions and singularities. Provides better time localization than the Morlet but coarser frequency resolution.
- Daubechies wavelets: Compact support, commonly used in the DWT context but can also be applied in CWT. Good for signals where you need exact reconstruction.
The trade-off is always between time localization and frequency localization. A wavelet with narrow time support resolves events well in time but smears frequency, and vice versa.
CWT Scalogram Generation
Generating a CWT scalogram follows these steps:
- Choose a mother wavelet appropriate for your signal and analysis goals.
- Define the range of scales you want to analyze (typically logarithmically spaced to give uniform resolution on a log-frequency axis).
- For each scale $a$ and each time position $b$, compute $W(a, b)$.
- Take the squared magnitude $|W(a, b)|^2$ (this gives the scalogram, sometimes called the wavelet power spectrum).
- Plot the result as a 2D image: time on the x-axis, scale on the y-axis, magnitude as color intensity.
Note that some references define the scalogram as $|W(a, b)|$ rather than $|W(a, b)|^2$. The squared version emphasizes energy distribution and is more common in practice.
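The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes a complex Morlet wavelet with `w0 = 6`, evaluates the CWT by direct convolution, and uses hypothetical helper names (`morlet`, `cwt_scalogram`). Scales are expressed in the same units as time, and the mapping `scale = w0 / (2π f)` is the usual Morlet approximation.

```python
import numpy as np

def morlet(t, w0=6.0):
    """Complex Morlet wavelet: a complex sinusoid under a Gaussian envelope.
    The small admissibility correction term is omitted (an assumption that is
    reasonable for w0 of about 5 or more)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)

def cwt_scalogram(x, scales, dt=1.0, w0=6.0):
    """Return |W(a, b)|^2 as an array of shape (len(scales), len(x))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Wavelet sample times centred on zero so output columns align with input samples.
    t = (np.arange(n) - n // 2) * dt
    power = np.empty((len(scales), n))
    for i, a in enumerate(scales):
        # Dilated wavelet with the 1/sqrt(a) energy normalisation.
        psi = morlet(t / a, w0) / np.sqrt(a)
        # Correlation with psi equals convolution with the time-reversed conjugate.
        kernel = np.conj(psi[::-1])
        coeffs = np.convolve(x, kernel, mode="same") * dt
        power[i] = np.abs(coeffs) ** 2
    return power

# Illustrative test signal: a 20 Hz sinusoid sampled at 200 Hz for 2 seconds.
fs = 200.0
time = np.arange(400) / fs
signal = np.sin(2 * np.pi * 20.0 * time)

freqs = np.linspace(5.0, 50.0, 46)      # analysis frequencies in Hz
scales = 6.0 / (2 * np.pi * freqs)      # approximate Morlet scale per frequency
power = cwt_scalogram(signal, scales, dt=1 / fs)

# The strongest row in the middle of the record should land near 20 Hz.
peak_freq = freqs[np.argmax(power[:, 200])]
```

The `power` array can be passed to any image-plotting routine. Direct convolution costs quadratic time per scale, so longer signals would normally use an FFT-based approach instead.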
Time-Scale Representations
Time-scale representations analyze signals by varying the analysis window with scale rather than using a fixed frequency grid. This gives them fundamentally different resolution properties compared to time-frequency representations like the spectrogram.
Time-Scale vs Time-Frequency
A spectrogram's fixed window creates a uniform tiling of the time-frequency plane: every frequency bin has the same time resolution and the same frequency resolution. You pick one window size and live with the consequences everywhere.
A scalogram's variable window creates a non-uniform tiling. The time-frequency plane is divided into cells that are:
- Wide in frequency, narrow in time at high frequencies (small scales)
- Narrow in frequency, wide in time at low frequencies (large scales)
This matches how many real-world signals behave. High-frequency events (clicks, transients) tend to be short-lived and need good time resolution. Low-frequency events (drifts, oscillations) tend to persist and need good frequency resolution.
Scale Concept in Wavelets
Scale in wavelet analysis is analogous to zooming in and out:
- Small scale → compressed wavelet → captures high-frequency, short-duration features → fine time resolution, coarse frequency resolution
- Large scale → dilated wavelet → captures low-frequency, long-duration features → coarse time resolution, fine frequency resolution
Mathematically, scaling the wavelet by $a$ stretches it in time by a factor of $a$ and compresses its frequency content by a factor of $1/a$.

Scale-to-Frequency Relationship
The pseudo-frequency corresponding to scale $a$ is:
$$f_a = \frac{f_c}{a \, T_s}$$
where $f_c$ is the center frequency of the mother wavelet (in cycles per sample) and $T_s$ is the sampling period. This inverse relationship means doubling the scale halves the corresponding frequency.
This conversion is essential for interpreting scalograms in physical units. Without it, the scale axis is abstract and hard to relate to the signal's actual spectral content.
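As a small worked example of the conversion (pseudo-frequency equals the wavelet's center frequency divided by the product of scale and sampling period), the sketch below maps a few scales to hertz. The function name `scale_to_frequency` and the center-frequency value of 0.8125 cycles per sample are illustrative assumptions. The right value depends on the wavelet you actually use.

```python
import numpy as np

def scale_to_frequency(scales, center_freq, sampling_period):
    """Pseudo-frequency for each scale: center_freq / (scale * sampling_period)."""
    scales = np.asarray(scales, dtype=float)
    return center_freq / (scales * sampling_period)

# Assumed values: center frequency 0.8125 cycles/sample, signal sampled at 100 Hz.
freqs = scale_to_frequency([1, 2, 4, 8], center_freq=0.8125, sampling_period=0.01)
# freqs -> [81.25, 40.625, 20.3125, 10.15625] Hz: each doubling of scale halves the frequency
```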
Time-Scale Resolution
The resolution trade-off in time-scale analysis follows directly from the uncertainty principle:
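- At low scales (high frequencies): the time spread $\Delta t$ is small (fine time resolution) and the frequency spread $\Delta f$ is large (coarse frequency resolution).
- At high scales (low frequencies): the time spread $\Delta t$ is large (coarse time resolution) and the frequency spread $\Delta f$ is small (fine frequency resolution).
- The product $\Delta t \cdot \Delta f$ remains bounded from below at all scales.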
This multi-resolution property is what makes scalograms so effective for signals containing features at multiple time scales simultaneously.
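One way to make the trade-off concrete, assuming the mother wavelet has time spread $\sigma_t$ and frequency spread $\sigma_f$ (standard deviations, with frequency in hertz; these symbols are introduced here only for illustration), is to track how each spread changes at scale $a$:

$$\Delta t(a) = a\,\sigma_t, \qquad \Delta f(a) = \frac{\sigma_f}{a}, \qquad \Delta t(a)\,\Delta f(a) = \sigma_t\,\sigma_f \ge \frac{1}{4\pi}$$

Changing the scale reshapes each tile of the time-frequency plane (taller and narrower, or shorter and wider) but leaves its area unchanged.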
Multiresolution Analysis
Multiresolution analysis (MRA) provides the mathematical framework for decomposing a signal into approximations and details at progressively coarser scales. It's the theoretical backbone of the discrete wavelet transform.
Multiresolution Concept
The idea is to represent a signal as a hierarchy of increasingly coarse approximations, plus the detail lost at each step:
- Start with the original signal at the finest resolution.
- Split it into a smooth approximation (low-frequency content) and a detail component (high-frequency content).
- Take the approximation from the previous split and repeat: split it again into a coarser approximation and another detail component.
- Continue until you reach the desired depth.
The original signal equals the coarsest approximation plus all the detail components summed together. Nothing is lost in this decomposition.
Scaling Function
The scaling function $\phi(t)$ acts as a low-pass filter that generates approximation coefficients. It satisfies the two-scale relation (also called the refinement equation):
$$\phi(t) = \sqrt{2} \sum_{k} h[k]\, \phi(2t - k)$$
where $h[k]$ are the low-pass filter coefficients. This self-referential equation means $\phi(t)$ can be built from scaled and shifted copies of itself, which is what makes the iterative decomposition possible.
Wavelet Function
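The wavelet function $\psi(t)$ acts as a high-pass filter that captures detail coefficients. It's derived from the scaling function via:
$$\psi(t) = \sqrt{2} \sum_{k} g[k]\, \phi(2t - k)$$
where $g[k]$ are the high-pass filter coefficients. These are related to the low-pass coefficients by $g[k] = (-1)^k h[1 - k]$, forming a quadrature mirror filter pair. Together, $h[k]$ and $g[k]$ cover the entire frequency axis: the low-pass filter keeps the lower half of the band and the high-pass filter keeps the upper half.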
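The relationship between the two filters can be checked numerically. The sketch below uses the 4-tap Daubechies low-pass filter, whose closed form is well known, and builds the high-pass filter with the alternating-flip rule. For a finite filter of even length $L$, the code uses $g[k] = (-1)^k h[L-1-k]$, which is the relation above shifted by an even number of samples so that all indices stay non-negative. The function name `qmf_highpass` is hypothetical.

```python
import numpy as np

# Daubechies 4-tap low-pass filter, normalised so that sum(h) = sqrt(2).
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

def qmf_highpass(h):
    """High-pass filter from a low-pass filter: g[k] = (-1)^k * h[L-1-k]."""
    k = np.arange(len(h))
    return (-1.0) ** k * h[::-1]

g = qmf_highpass(h)

# Properties of the pair:
lowpass_dc_gain = h.sum()          # sqrt(2): passes the constant (DC) component
highpass_dc_gain = g.sum()         # 0: blocks the constant component
cross_correlation = np.dot(h, g)   # 0: the two filters are orthogonal
```

Both filters also have unit energy, which is what keeps the decomposition energy-preserving.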
Multiresolution Decomposition
The decomposition proceeds as a filter bank:
- Pass the signal through the low-pass filter $h$ and downsample by 2 → approximation coefficients at level 1.
- Pass the signal through the high-pass filter $g$ and downsample by 2 → detail coefficients at level 1.
- Take the approximation coefficients from level 1, pass them through $h$ and $g$ again, downsample by 2 → approximation and detail coefficients at level 2.
- Repeat to the desired depth $J$.
At the end, you have one set of approximation coefficients at level $J$ and $J$ sets of detail coefficients (one per level). The downsampling by 2 at each stage is what makes this a dyadic decomposition and keeps the total number of coefficients equal to the original signal length.
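A literal filter-then-downsample version of this procedure is sketched below for the Haar filters, which are short enough that no boundary extension is needed. The function names are hypothetical and the sign convention of the Haar high-pass filter varies between references. Longer filters would need periodic or symmetric padding, which this sketch does not handle.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)    # Haar low-pass filter
g = np.array([1.0, -1.0]) / np.sqrt(2.0)   # Haar high-pass filter (one sign convention)

def analysis_step(x):
    """One filter-bank level: filter, then keep every second output sample."""
    approx = np.convolve(x, h)[1::2]
    detail = np.convolve(x, g)[1::2]
    return approx, detail

def decompose(x, depth):
    """Return [a_J, d_J, ..., d_1]; len(x) must be divisible by 2**depth."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(depth):
        approx, detail = analysis_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coeffs = decompose(x, depth=3)

# 1 + 1 + 2 + 4 coefficients: the same count as the 8 input samples.
total = sum(len(c) for c in coeffs)
```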
Discrete Wavelet Transform (DWT)
The DWT implements multiresolution analysis computationally. It's the practical algorithm you actually run on discrete signals, and it's far more efficient than computing the CWT at every possible scale and translation.
DWT Definition
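The DWT coefficients are defined as:
$$d_{j,k} = \int_{-\infty}^{\infty} x(t)\, \psi_{j,k}^{*}(t)\, dt, \qquad \psi_{j,k}(t) = 2^{-j/2}\, \psi\!\left(2^{-j} t - k\right)$$
where $j$ is the scale index, $k$ is the translation index, and $\psi_{j,k}$ are discretized wavelets at dyadic scales $a = 2^j$ and translations $b = k\, 2^j$. The restriction to dyadic scales (powers of 2) is what distinguishes the DWT from the CWT and enables the efficient filter bank implementation.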
DWT Implementation
The standard implementation uses Mallat's algorithm:
- Apply the low-pass filter to the input, then downsample by 2 → approximation coefficients.
- Apply the high-pass filter to the input, then downsample by 2 → detail coefficients.
- Feed the approximation coefficients back into the first step for the next level.
Reconstruction reverses the process: upsample by 2, apply the synthesis filters, and sum. The computational cost is $O(N)$ for an $N$-point signal, compared to $O(N \log N)$ for the FFT, making the DWT very efficient.
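To illustrate that reconstruction really does undo the analysis, here is a round trip through a multi-level Haar transform. It is a sketch under simplifying assumptions: the Haar filters are written out directly rather than as generic convolutions, the signal length must be divisible by $2^{\text{depth}}$, and all function names are hypothetical.

```python
import numpy as np

def haar_analysis(x):
    """Split into approximation and detail, each half the input length."""
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_synthesis(approx, detail):
    """Upsample by 2, apply the Haar synthesis filters, and sum (written out)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def haar_dwt(x, depth):
    """Forward transform: returns (a_J, [d_1, ..., d_J]) with d_1 the finest level."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(depth):
        approx, detail = haar_analysis(approx)
        details.append(detail)
    return approx, details

def haar_idwt(approx, details):
    """Inverse transform: rebuild from the coarsest level back to the finest."""
    for detail in reversed(details):
        approx = haar_synthesis(approx, detail)
    return approx

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
approx, details = haar_dwt(x, depth=4)
x_rec = haar_idwt(approx, details)
max_error = np.max(np.abs(x - x_rec))   # expected to be at floating-point round-off level
```

Each level touches half as many samples as the one before, so the total work is proportional to $N + N/2 + N/4 + \dots < 2N$, which is where the linear cost comes from.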

Subband Coding with DWT
Subband coding uses the DWT's filter bank to split a signal into frequency subbands:
- Each level of decomposition produces one detail subband (a specific frequency range) and passes the remaining low-frequency content to the next level.
- Wavelets concentrate most of the signal's energy into a few large coefficients (energy compaction), which is why wavelet-based compression works well.
- Reconstruction involves upsampling each subband, applying synthesis filters, and summing all subbands together. Perfect reconstruction is guaranteed when the analysis and synthesis filters satisfy the appropriate conditions.
DWT Scalogram Interpretation
A DWT scalogram looks different from a CWT scalogram because of the dyadic sampling:
- Each decomposition level occupies a horizontal band in the scalogram, with coarser scales (lower frequencies) at the top and finer scales (higher frequencies) at the bottom (convention varies).
- Within each band, the time resolution differs: finer scales have more coefficients per unit time, giving better time localization.
- The scalogram appears as a set of rectangular blocks rather than the smooth surface of a CWT scalogram, reflecting the discrete nature of the transform.
Despite the coarser appearance, DWT scalograms are computationally efficient and sufficient for many practical applications like denoising and compression where exact time-frequency localization is less critical than fast processing.
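The blocky layout can be reproduced by stretching each level's coefficients to the full signal length. The sketch below assumes the detail coefficients are already available, ordered coarsest level first to match the "coarser scales at the top" convention, and that each level's length divides the signal length. The function name and the coefficient values are made up for illustration.

```python
import numpy as np

def dwt_scalogram_image(details, n):
    """Stack squared detail coefficients into a (levels x n) image.
    Each coefficient is repeated so that every level spans all n time samples."""
    rows = []
    for d in details:
        d = np.asarray(d, dtype=float)
        rows.append(np.repeat(d ** 2, n // len(d)))
    return np.vstack(rows)

# Hypothetical detail coefficients for an 8-sample signal: levels 3, 2, 1.
details = [np.array([-2.0]), np.array([6.0, -2.0]), np.array([1.0, 1.0, -1.0, 0.0])]
image = dwt_scalogram_image(details, n=8)
# image[1] -> [36, 36, 36, 36, 4, 4, 4, 4]: two wide blocks at level 2
```

The resulting array can be shown with an image-plotting routine such as matplotlib's `imshow`. Coarse levels produce a few wide blocks and fine levels produce many narrow ones.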
Wavelet Packet Transform (WPT)
The WPT generalizes the DWT by decomposing both approximation and detail coefficients at each level, not just the approximation. This produces a richer, more flexible decomposition.
WPT vs DWT
In the DWT, only the low-frequency (approximation) branch gets further decomposed at each level. The high-frequency (detail) branches are left as-is. This creates an asymmetric tree that gives progressively finer frequency resolution only in the low-frequency region.
The WPT decomposes every branch, producing a full binary tree. This means:
- High-frequency regions get the same level of frequency subdivision as low-frequency regions.
- You can select any subset of nodes from the tree (a "best basis") to match the signal's actual frequency structure.
- The frequency resolution is uniform across the entire spectrum at any given tree depth.
WPT Decomposition Tree
The WPT produces a complete binary tree where:
- The root node contains the original signal.
- At each level, every node splits into two children via low-pass and high-pass filtering followed by downsampling.
- A tree of depth $J$ produces $2^J$ leaf nodes, each representing a frequency subband of equal bandwidth.
- The total number of coefficients across all leaf nodes equals the original signal length, preserving the information content.
You don't have to use all leaf nodes. Best-basis algorithms (like the one proposed by Coifman and Wickerhauser) select the combination of nodes that minimizes a cost function such as entropy, giving you an adaptive, signal-dependent decomposition.
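A full (non-adaptive) packet tree is easy to sketch with the Haar filters. The code below splits every node at every level and returns the leaves. It does not implement best-basis selection, and the function names are hypothetical. Note that the leaves come out in filter-bank (natural) order, which is not the same as increasing-frequency order.

```python
import numpy as np

def haar_split(x):
    """Low-pass and high-pass Haar filtering followed by downsampling by 2."""
    low = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    high = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return low, high

def wavelet_packet_leaves(x, depth):
    """Split every node at every level; return the 2**depth leaf nodes.
    len(x) must be divisible by 2**depth."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes.extend([low, high])   # both children are kept and split again
        nodes = next_nodes
    return nodes

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
leaves = wavelet_packet_leaves(x, depth=3)
n_leaves = len(leaves)                           # 2**3 = 8
n_coeffs = sum(len(leaf) for leaf in leaves)     # 16, same as the input length
```

Stacking the squared leaves row by row (after reordering them by frequency, if needed) gives the uniform-resolution picture that distinguishes a WPT scalogram from a DWT one.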
WPT Scalogram Generation
To generate a WPT scalogram:
- Compute the full WPT decomposition tree to the desired depth.
- Select the set of subbands you want to display (all leaf nodes for a uniform decomposition, or a best-basis selection for an adaptive one).
- Plot the coefficient magnitudes with time on the x-axis and subband index (or corresponding frequency range) on the y-axis.
- Use color or intensity to represent magnitude.
The resulting scalogram has uniform frequency resolution across all subbands (if using all leaf nodes), unlike the DWT scalogram which has finer frequency resolution only at low frequencies.
WPT for Signal Analysis
The WPT's flexibility makes it particularly valuable when:
- The signal has important high-frequency structure that the DWT's asymmetric decomposition would miss.
- You need uniform frequency resolution across the entire spectrum.
- You want to adaptively choose the best time-frequency tiling for a specific signal, rather than being locked into the DWT's dyadic structure.
Applications include speech and audio analysis (where both low and high formant frequencies matter), vibration analysis for mechanical fault detection, and EEG signal classification where discriminative features may appear in narrow high-frequency bands.
Applications of Scalograms
Signal Denoising
Wavelet-based denoising exploits the fact that signal energy tends to concentrate in a few large wavelet coefficients, while noise spreads evenly across all coefficients. The procedure:
- Compute the wavelet transform (CWT, DWT, or WPT) of the noisy signal.
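- Apply a thresholding rule to the coefficients. Hard thresholding sets coefficients whose magnitude is below the threshold to zero and leaves the rest unchanged. Soft thresholding also zeroes those coefficients and additionally shrinks the surviving ones toward zero by the threshold amount.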
- Reconstruct the signal from the modified coefficients.
The threshold is typically set based on the noise level, often using Donoho's universal threshold $\lambda = \sigma \sqrt{2 \ln N}$, where $\sigma$ is the noise standard deviation and $N$ is the signal length. The scalogram helps you visualize which time-scale regions contain signal versus noise before choosing your thresholding strategy.
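The two thresholding rules and the universal threshold (noise standard deviation times the square root of twice the natural log of the signal length) are short enough to write out directly. The function names and the example coefficients below are illustrative. The noise level is assumed to be known here, whereas in practice it has to be estimated from the data.

```python
import numpy as np

def hard_threshold(c, lam):
    """Zero coefficients with |c| <= lam; keep the others unchanged."""
    c = np.asarray(c, dtype=float)
    return np.where(np.abs(c) > lam, c, 0.0)

def soft_threshold(c, lam):
    """Zero coefficients with |c| <= lam; shrink the others toward zero by lam."""
    c = np.asarray(c, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def universal_threshold(sigma, n):
    """Universal threshold: sigma * sqrt(2 * ln(n))."""
    return sigma * np.sqrt(2.0 * np.log(n))

coeffs = np.array([5.0, -0.3, 0.8, -4.0, 0.1])
hard = hard_threshold(coeffs, 1.0)   # -> [5, 0, 0, -4, 0]
soft = soft_threshold(coeffs, 1.0)   # -> [4, 0, 0, -3, 0]
lam = universal_threshold(sigma=1.0, n=1024)   # about 3.72
```

Hard thresholding preserves the amplitude of the surviving coefficients but can leave isolated spikes. Soft thresholding gives smoother results at the cost of a small amplitude bias.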
Feature Extraction
Scalograms encode a signal's time-scale structure in a 2D image, which makes them natural inputs for feature extraction:
- Statistical features: Compute energy, entropy, variance, or higher-order moments within specific time-scale regions of the scalogram.
- Texture features: Treat the scalogram as an image and extract texture descriptors.
- Subband energy ratios: Compare energy across different scale bands to characterize the signal's spectral shape over time.
These features can then feed into classifiers for tasks like heart arrhythmia detection from ECG signals, speaker identification from speech, or bearing fault diagnosis from vibration data.
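As one possible realization of the statistical and subband-energy features, the sketch below groups the rows of a scalogram into a few scale bands, computes each band's share of the total energy, and summarizes the distribution with its Shannon entropy. The function name, the band count, and the toy scalogram are assumptions made for illustration.

```python
import numpy as np

def band_energy_features(power, n_bands):
    """Relative energy per scale band, plus the entropy (in bits) of that distribution.
    power: 2D array (scales x time) of non-negative scalogram values."""
    power = np.asarray(power, dtype=float)
    bands = np.array_split(power, n_bands, axis=0)   # contiguous groups of scale rows
    energy = np.array([band.sum() for band in bands])
    ratios = energy / energy.sum()
    nonzero = ratios[ratios > 0]                     # avoid log(0) for empty bands
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return ratios, entropy

# Toy scalogram: 4 scales x 4 time steps, with more energy at the larger scales.
power = np.array([[1.0, 1.0, 1.0, 1.0],
                  [1.0, 1.0, 1.0, 1.0],
                  [3.0, 3.0, 3.0, 3.0],
                  [3.0, 3.0, 3.0, 3.0]])
ratios, entropy = band_energy_features(power, n_bands=2)
# ratios -> [0.25, 0.75]; entropy is about 0.811 bits
```

Low entropy indicates energy concentrated in a few bands, while high entropy indicates a spread-out, noise-like distribution.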
Pattern Recognition
Scalograms have become a standard front-end for pattern recognition systems, especially with the rise of deep learning:
- Convert the signal to a scalogram image, then feed it into a convolutional neural network (CNN) for classification.
- The multi-resolution property means the CNN sees both fine temporal detail and broad spectral structure simultaneously.
- This approach has shown strong results in audio event detection, seismic signal classification, and biomedical signal analysis.
Time-Varying Signal Analysis
For signals whose frequency content evolves over time, scalograms reveal structure that neither the time domain nor the frequency domain can show alone:
- Transient detection: Short-lived events like clicks, spikes, or fault impulses appear as vertical features spanning multiple scales at a specific time instant.
- Frequency tracking: A component whose frequency drifts over time traces a curved path through the scalogram, making the drift visible and measurable.
- Regime changes: Abrupt shifts in the signal's spectral character show up as sudden changes in the scalogram's energy distribution, useful for segmenting non-stationary signals into quasi-stationary segments.
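Frequency tracking can be approximated very simply by picking, at each time step, the scalogram row with the most energy. The sketch below does exactly that on a small synthetic array. It is a naive per-column maximum with a hypothetical function name, and practical ridge extraction usually adds smoothness constraints so the track does not jump between components.

```python
import numpy as np

def ridge_frequencies(power, freqs):
    """Frequency of the strongest scalogram row at each time step."""
    power = np.asarray(power, dtype=float)
    return np.asarray(freqs)[np.argmax(power, axis=0)]

# Synthetic scalogram: rows correspond to 40, 20, and 10 Hz (smallest scale first).
freqs = np.array([40.0, 20.0, 10.0])
power = np.array([[0.1, 0.2, 0.9, 0.8],
                  [0.2, 0.9, 0.3, 0.1],
                  [0.9, 0.1, 0.1, 0.1]])
ridge = ridge_frequencies(power, freqs)
# ridge -> [10, 20, 40, 40]: the dominant frequency rises over time
```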