The Gabor transform is a time-frequency analysis technique that represents a signal in both the time and frequency domains simultaneously. It's designed for non-stationary signals, where the frequency content changes over time. Dennis Gabor introduced the concept in 1946 specifically to address the limitation of the Fourier transform, which tells you what frequencies are present but not when they occur.

Relationship to STFT

The Gabor transform is a specific instance of the Short-Time Fourier Transform (STFT). Both work by sliding a window along the signal and computing the Fourier transform of each windowed segment. The distinction is that the Gabor transform specifically uses a Gaussian window function, which achieves the theoretical minimum of the Heisenberg uncertainty bound for joint time-frequency localization. Any STFT with a Gaussian window is a Gabor transform.

Differences from wavelet transform

The Gabor transform uses a fixed window size for all frequencies, producing uniform time-frequency resolution across the entire plane. The wavelet transform, by contrast, uses variable-sized windows: narrow windows for high frequencies (good time resolution) and wide windows for low frequencies (good frequency resolution).

This makes wavelets better suited for signals with sharp transients at high frequencies and slowly varying low-frequency components. The Gabor transform is preferable when you want consistent resolution at all frequencies, or when the signal's characteristics don't vary dramatically across frequency bands.

Mathematical formulation

The Gabor transform maps a one-dimensional time-domain signal into a two-dimensional time-frequency representation by projecting the signal onto a set of time-frequency shifted Gaussian functions, called Gabor atoms.

Continuous Gabor transform

The continuous Gabor transform of a signal $x(t)$ is defined as:

$G_x(t,f) = \int_{-\infty}^{\infty} x(\tau) \, g^*(\tau - t) \, e^{-j2\pi f \tau} \, d\tau$

where:

$g(t)$ is the Gaussian window function
$t$ is the time shift parameter (centers the window)
$f$ is the frequency parameter
$^*$ denotes complex conjugation

You can read this as: slide the Gaussian window to time $t$, multiply the signal by the conjugated window, then extract the frequency content at $f$ via the complex exponential.

Discrete Gabor transform

In practice, you compute the Gabor transform on discrete signals with discrete time-frequency shifts:

$G_x[m,n] = \sum_{k=0}^{N-1} x[k] \, g^*[k - mM] \, e^{-j2\pi nk/N}$

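$x[k]$: discrete input signal
$g[k]$: discrete Gaussian window
$m$, $n$: discrete time and frequency shift indices
$M$: time step between successive windows (hop size)
$N$: total number of frequency bins (DFT length); in this formula it is also the number of samples summed over, with the window index $k - mM$ typically taken modulo $N$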

The ratio $N/M$ (frequency bins per hop) sets the redundancy of the transform: $N = M$ corresponds to critical sampling and $N > M$ to oversampling (more on this below).
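As a minimal sketch, the discrete sum can be evaluated literally in NumPy. It assumes the signal has length $N$ and is treated as periodic, so the window index $k - mM$ wraps around; the helper names `gaussian_window` and `gabor_direct` and the unnormalized window are illustrative choices, not part of the definition.

```python
import numpy as np

def gaussian_window(N, sigma):
    """Unnormalized Gaussian centered at index 0, wrapped circularly over N samples."""
    k = np.arange(N)
    d = np.minimum(k, N - k)                 # circular distance from index 0
    return np.exp(-d**2 / (2.0 * sigma**2))

def gabor_direct(x, g, M):
    """Evaluate G[m, n] = sum_k x[k] conj(g[k - mM]) exp(-2j*pi*n*k/N) term by term."""
    N = len(x)
    k = np.arange(N)
    dft = np.exp(-2j * np.pi * np.outer(k, k) / N)   # dft[n, k]
    num_frames = N // M
    G = np.empty((num_frames, N), dtype=complex)
    for m in range(num_frames):
        shifted = g[(k - m * M) % N]                 # g[k - mM], periodic indexing
        G[m] = dft @ (x * np.conj(shifted))
    return G

N, M = 64, 8
x = np.cos(2 * np.pi * 5 * np.arange(N) / N)         # tone at frequency bin 5
G = gabor_direct(x, gaussian_window(N, sigma=6.0), M)

# Strongest bin in the positive-frequency half of each frame.
peak_bins = np.argmax(np.abs(G[:, : N // 2]), axis=1)
print(G.shape, peak_bins)
```

Each row of `G` is one time frame and each column one frequency bin, so a stationary tone should produce the same peak bin in every row.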

Gabor coefficients

The Gabor coefficients $G_x[m,n]$ encode two things at each time-frequency location:

Magnitude $|G_x[m,n]|$: the signal energy concentrated at that point in the time-frequency plane
Phase $\angle G_x[m,n]$: local phase structure, which carries information about the fine temporal alignment of frequency components

Both are important. Magnitude is used most often for visualization and feature extraction, but phase is critical for reconstruction and for applications like phase vocoding in audio.
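To see what each part carries, the sketch below compares a complex tone with a phase-shifted copy of itself. It assumes a periodic signal and an unnormalized Gaussian window; the helper `gabor` and all parameter values are illustrative.

```python
import numpy as np

N, M, sigma = 64, 8, 6.0
k = np.arange(N)
g = np.exp(-np.minimum(k, N - k) ** 2 / (2 * sigma**2))   # circular Gaussian window

def gabor(x):
    # Row m holds the DFT of x[k] * conj(g[k - mM]) with periodic indexing.
    frames = np.stack([x * np.conj(np.roll(g, m * M)) for m in range(N // M)])
    return np.fft.fft(frames, axis=1)

phi = 0.7                                   # phase offset in radians
x1 = np.exp(2j * np.pi * 5 * k / N)         # complex tone at bin 5
x2 = x1 * np.exp(-1j * phi)                 # same tone, phase-shifted

G1, G2 = gabor(x1), gabor(x2)
mag_equal = np.allclose(np.abs(G1), np.abs(G2))
phase_diff = np.angle(G2[:, 5] / G1[:, 5])  # phase difference at the tone's bin
print(mag_equal, phase_diff)
```

The magnitudes agree, so a magnitude-only display cannot tell the two signals apart. The offset survives only in the phase, which is why reconstruction needs both.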

Gabor functions

Gabor functions (atoms) are the elementary building blocks of the transform. Each atom is a Gaussian envelope modulated by a complex sinusoid at a specific frequency and centered at a specific time.

Gaussian window function

The Gaussian window is defined as:

$g(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-t^2/(2\sigma^2)}$

The parameter $\sigma$ (standard deviation) controls the window width. A larger $\sigma$ spreads the window over more time, while a smaller $\sigma$ concentrates it. The Gaussian is chosen because it is the only window shape (up to shifts in time, shifts in frequency, and rescaling) that achieves equality in the Heisenberg uncertainty bound.

Time-frequency resolution

The time-frequency resolution is governed entirely by $\sigma$:

Narrow window (small $\sigma$): good time resolution, poor frequency resolution
Wide window (large $\sigma$): good frequency resolution, poor time resolution

Once you fix $\sigma$, the resolution is the same everywhere in the time-frequency plane. This is a fundamental constraint you cannot escape, only manage through your choice of $\sigma$ for a given application.

Uncertainty principle

The Heisenberg-Gabor uncertainty principle states:

$\Delta t \cdot \Delta f \geq \frac{1}{4\pi}$

where $\Delta t$ and $\Delta f$ are the effective time and frequency spreads of the window. No window function can beat this bound. The Gaussian window is special because it achieves equality, meaning it provides the tightest possible joint localization. This is precisely why Gabor chose it.
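The bound can be checked numerically. The sketch below measures $\Delta t$ and $\Delta f$ as RMS widths of $|g(t)|^2$ and $|\hat{g}(f)|^2$ on a sampled grid. The grid size, the Hann comparison window, and the helper name `spreads` are assumptions made for illustration.

```python
import numpy as np

def spreads(g, dt):
    """Return (delta_t, delta_f): RMS widths of |g|^2 in time and |FFT(g)|^2 in frequency."""
    n = len(g)
    t = (np.arange(n) - n // 2) * dt
    f = np.fft.fftfreq(n, d=dt)
    p_t = np.abs(g) ** 2
    p_f = np.abs(np.fft.fft(g)) ** 2
    p_t = p_t / p_t.sum()                    # normalize to unit total weight
    p_f = p_f / p_f.sum()
    t_mean = np.sum(t * p_t)
    f_mean = np.sum(f * p_f)
    delta_t = np.sqrt(np.sum((t - t_mean) ** 2 * p_t))
    delta_f = np.sqrt(np.sum((f - f_mean) ** 2 * p_f))
    return delta_t, delta_f

n, dt = 4096, 0.01
t = (np.arange(n) - n // 2) * dt

gauss = np.exp(-t**2 / (2 * 1.0**2))                                  # sigma = 1
hann = np.where(np.abs(t) < 5, 0.5 * (1 + np.cos(2 * np.pi * t / 10)), 0.0)

bound = 1 / (4 * np.pi)
prod_gauss = np.prod(spreads(gauss, dt))
prod_hann = np.prod(spreads(hann, dt))
print(bound, prod_gauss, prod_hann)
```

The Gaussian product should sit on the bound to within numerical precision. The Hann window, a common alternative, lands slightly above it.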

Properties of Gabor transform

Linearity

The Gabor transform is linear. For signals $x_1(t)$ and $x_2(t)$ with constants $a$ and $b$:

$G_{ax_1 + bx_2}(t,f) = a\,G_{x_1}(t,f) + b\,G_{x_2}(t,f)$

This means you can analyze composite signals component by component and superpose the results. The superposition holds for the complex coefficients, not for the spectrogram $|G_x|^2$, which is quadratic and picks up cross-terms wherever components overlap in the time-frequency plane.

Invertibility

The Gabor transform is invertible under appropriate conditions. The inverse continuous Gabor transform is:

$x(t) = \frac{1}{C_g} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} G_x(\tau,f) \, g(t - \tau) \, e^{j2\pi f t} \, d\tau \, df$

The normalization constant is the window energy, $C_g = \int_{-\infty}^{\infty} |g(t)|^2 \, dt$, which must be finite and nonzero; the Gaussian satisfies this for any $\sigma > 0$. In the discrete case, invertibility depends on whether the Gabor atoms form a frame for the signal space. Oversampled systems are generally easier to invert stably than critically sampled ones.
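A minimal sketch of discrete reconstruction, assuming a periodic signal whose length equals the DFT length and a hop size that divides it. It uses weighted overlap-add with the analysis window itself, which is one simple inversion strategy among several; the helper names are illustrative.

```python
import numpy as np

def gabor(x, g, M):
    """Circular discrete Gabor transform; row m uses the window shifted by m*M."""
    shifts = range(0, len(x), M)
    frames = np.stack([x * np.conj(np.roll(g, s)) for s in shifts])
    return np.fft.fft(frames, axis=1)

def inverse_gabor(G, g, M):
    """Weighted overlap-add inverse for the transform above."""
    N = G.shape[1]
    frames = np.fft.ifft(G, axis=1)          # frames[m, k] = x[k] * conj(g[k - mM])
    num = np.zeros(N, dtype=complex)
    den = np.zeros(N)
    for m in range(G.shape[0]):
        w = np.roll(g, m * M)
        num += frames[m] * w                 # accumulates x[k] * |g[k - mM]|^2
        den += np.abs(w) ** 2                # discrete counterpart of C_g
    return num / den

rng = np.random.default_rng(0)
N, M, sigma = 128, 8, 10.0
k = np.arange(N)
g = np.exp(-np.minimum(k, N - k) ** 2 / (2 * sigma**2))

x = rng.standard_normal(N)
x_rec = inverse_gabor(gabor(x, g, M), g, M)
max_error = np.max(np.abs(x_rec - x))
print(max_error)
```

The division by `den` plays the role of $1/C_g$. It is only safe when the shifted windows overlap enough that `den` stays well away from zero, which is one practical reason to prefer small hop sizes.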

Parseval's theorem

Energy is preserved under the Gabor transform:

$\int_{-\infty}^{\infty} |x(t)|^2 \, dt = \frac{1}{C_g} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |G_x(t,f)|^2 \, dt \, df$

This guarantees that the total energy you measure in the time-frequency plane equals the energy of the original signal. It's essential for any energy-based analysis or when you modify coefficients and want to predict the effect on signal power.
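The discrete analogue can be verified directly. The sketch assumes a periodic signal, hop size 1, and a DFT length equal to the signal length. In that setting the normalization becomes the DFT length times the window energy.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 64, 5.0
k = np.arange(N)
g = np.exp(-np.minimum(k, N - k) ** 2 / (2 * sigma**2))

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Hop size 1: one frame per sample, window shifted circularly.
frames = np.stack([x * np.conj(np.roll(g, m)) for m in range(N)])
G = np.fft.fft(frames, axis=1)

C_g = N * np.sum(np.abs(g) ** 2)      # discrete normalization for this setup
energy_signal = np.sum(np.abs(x) ** 2)
energy_plane = np.sum(np.abs(G) ** 2) / C_g
print(energy_signal, energy_plane)
```

With larger hop sizes the sum over frames of $|g[k - mM]|^2$ is no longer exactly constant in $k$, so the identity holds only approximately unless the window and hop are chosen to make the system tight.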

Computation of Gabor transform


Efficient algorithms

Direct computation of the Gabor transform requires an inner product for every point on the time-frequency grid, which gets expensive fast. The standard efficient approach exploits the structure of the transform:

Segment the signal using the shifted Gaussian window at each time step $mM$
Apply the FFT to each windowed segment to obtain the frequency coefficients
Collect the results into the time-frequency matrix $G_x[m,n]$

This reduces the per-frame cost from $O(N^2)$ to $O(N \log N)$. For the inverse transform, efficient dual window computation and overlap-add methods are used.
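The three steps above can be sketched as follows, next to a direct DFT-matrix version for comparison. Both assume a periodic signal of length $N$ and an unnormalized Gaussian window; the function names are illustrative.

```python
import numpy as np

def gabor_direct(x, g, M):
    """O(N^2) per frame: explicit DFT matrix applied to each windowed segment."""
    N = len(x)
    k = np.arange(N)
    dft = np.exp(-2j * np.pi * np.outer(k, k) / N)
    return np.stack([dft @ (x * np.conj(np.roll(g, m * M))) for m in range(N // M)])

def gabor_fft(x, g, M):
    """O(N log N) per frame: the same windowed segments, transformed with the FFT."""
    N = len(x)
    # Step 1: segment the signal with the window shifted to each time step mM.
    frames = np.stack([x * np.conj(np.roll(g, m * M)) for m in range(N // M)])
    # Steps 2 and 3: FFT every segment; the rows of the result form G[m, n].
    return np.fft.fft(frames, axis=1)

rng = np.random.default_rng(2)
N, M, sigma = 256, 16, 12.0
k = np.arange(N)
g = np.exp(-np.minimum(k, N - k) ** 2 / (2 * sigma**2))
x = rng.standard_normal(N)

G_fast = gabor_fft(x, g, M)
G_slow = gabor_direct(x, g, M)
print(G_fast.shape, np.allclose(G_fast, G_slow))
```

Both functions compute the same coefficients. The FFT version only changes how each frame's DFT is evaluated.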

Oversampling vs. critical sampling

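The sampling density in the time-frequency plane is controlled by the product $a \cdot b$, where $a$ is the time step and $b$ is the frequency step. In the discrete notation used earlier, $a = M$ samples and $b = 1/N$ cycles per sample, so $a \cdot b = M/N$.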

Critical sampling ($a \cdot b = 1$): the minimum number of coefficients that can still represent an arbitrary signal. Maximally efficient in storage, but the dual window is poorly localized and the system is sensitive to perturbations. For a Gaussian window in particular, the Gabor system at critical density is not a frame (a consequence of the Balian-Low theorem), so stable reconstruction is not available there.
Oversampling ($a \cdot b < 1$): more coefficients than strictly necessary, introducing redundancy. The redundancy makes the representation more robust, the dual window better localized, and coefficient modification (e.g., for denoising) more stable.

Most practical implementations use moderate oversampling (redundancy factors of 2 to 4).
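As a small worked example, the density and redundancy can be computed from the hop size and the number of frequency bins. It assumes the discrete convention $a = M$ samples and $b = 1/N$ cycles per sample; the function name `sampling_density` is hypothetical.

```python
def sampling_density(M, N):
    """Return (a*b, redundancy) for hop size M and N frequency bins."""
    ab = M / N                 # a = M samples, b = 1/N cycles per sample
    return ab, 1.0 / ab

for M, N in [(256, 256), (128, 256), (64, 256)]:
    ab, redundancy = sampling_density(M, N)
    if ab == 1:
        regime = "critical"
    elif ab < 1:
        regime = "oversampled"
    else:
        regime = "undersampled"
    print(f"M={M:3d} N={N}: a*b={ab:.2f}, redundancy={redundancy:.0f} ({regime})")
```

Under this convention, a 256-bin transform with hop sizes of 128 or 64 samples gives the redundancy factors of 2 and 4 mentioned above.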

Numerical stability

Stability depends on the frame bounds $A$ and $B$ of the Gabor system. The condition number $B/A$ controls how noise and rounding errors propagate through reconstruction. Poorly chosen combinations of window width and sampling parameters can make $B/A$ very large, leading to unstable inversion.

To maintain stability:

Use oversampling to improve the frame bound ratio
Normalize the window appropriately
Avoid sampling parameters near the critical density boundary where frame bounds degrade
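For small problems the frame bounds can be computed by brute force, which makes the effect of sampling density visible. The sketch assumes a periodic signal space of length 64 and a Gaussian with balanced time and frequency spread. It builds every atom explicitly, so it is a diagnostic tool rather than a practical algorithm, and the name `frame_bounds` is illustrative.

```python
import numpy as np

def frame_bounds(g, M, N_bins):
    """Frame bounds (A, B) of the circular Gabor system generated by window g.

    Atoms are g[k - mM] * exp(2j*pi*n*k/N_bins) for all frames m and bins n.
    A and B are the extreme eigenvalues of the frame operator, obtained here
    as squared singular values of the synthesis matrix.
    """
    L = len(g)
    k = np.arange(L)
    atoms = [np.roll(g, m * M) * np.exp(2j * np.pi * n * k / N_bins)
             for m in range(L // M) for n in range(N_bins)]
    Phi = np.array(atoms).T                    # columns are atoms
    s = np.linalg.svd(Phi, compute_uv=False)   # singular values, descending
    return s[-1] ** 2, s[0] ** 2

L = 64
k = np.arange(L)
g = np.exp(-np.pi * np.minimum(k, L - k) ** 2 / L)   # balanced Gaussian
g /= np.linalg.norm(g)

A_crit, B_crit = frame_bounds(g, M=8, N_bins=8)      # a*b = 1 (critical)
A_over, B_over = frame_bounds(g, M=4, N_bins=16)     # a*b = 1/4 (redundancy 4)
print("critical   :", A_crit, B_crit)
print("oversampled:", A_over, B_over)
```

In the critical case the lower bound is expected to come out at or near zero, so $B/A$ is huge or undefined. The oversampled system should give bounds of comparable size.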

Applications of Gabor transform

Time-frequency analysis

The primary use of the Gabor transform is visualizing and analyzing how frequency content evolves over time. The squared magnitude $|G_x(t,f)|^2$ is called the spectrogram (when derived from a Gabor/STFT). Speech signals, musical audio, EEG, and ECG all exhibit time-varying spectral content that the Gabor transform reveals clearly.
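A short sketch of this use: the spectrogram of a linear chirp, with the strongest frequency in each frame compared against the known instantaneous frequency. The sampling rate, chirp parameters, and window settings are arbitrary illustrative values.

```python
import numpy as np

fs = 1000.0                                      # sampling rate in Hz (assumed)
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * (50 * t + 100 * t**2))    # instantaneous frequency 50 + 200 t Hz

win_len, hop, sigma = 128, 32, 16.0
n = np.arange(win_len) - (win_len - 1) / 2
g = np.exp(-n**2 / (2 * sigma**2))               # Gaussian window, sigma in samples

starts = range(0, len(x) - win_len + 1, hop)
frames = np.stack([x[s:s + win_len] * g for s in starts])
spectrogram = np.abs(np.fft.rfft(frames, axis=1)) ** 2

freqs = np.fft.rfftfreq(win_len, d=1 / fs)
frame_times = (np.array(starts) + win_len / 2) / fs
ridge = freqs[np.argmax(spectrogram, axis=1)]    # strongest frequency per frame

for tc, fr in zip(frame_times[::6], ridge[::6]):
    print(f"t = {tc:.3f} s, peak = {fr:.1f} Hz, expected about {50 + 200 * tc:.1f} Hz")
```

The ridge follows the chirp to within the bin spacing of `fs / win_len`. That spacing is the frequency-resolution side of the trade-off set by the window length and $\sigma$.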

Feature extraction

Gabor coefficients serve as localized time-frequency features for classification and pattern recognition. Because each coefficient captures energy at a specific time and frequency, they naturally encode the kind of discriminative structure needed for tasks like:

Speech and speaker recognition
Image texture classification (using 2D Gabor filters)
Mechanical fault diagnosis from vibration signals

The localization properties of the Gaussian window make these features more robust than global Fourier features for non-stationary data.

Denoising and compression

Both tasks operate by modifying Gabor coefficients before inverting back to the time domain.

Denoising: apply thresholding (hard or soft) to suppress coefficients dominated by noise while preserving signal components. Oversampled representations work better here because the redundancy allows more aggressive thresholding without introducing artifacts.
Compression: discard or quantize small coefficients to reduce storage. The trade-off is between reconstruction fidelity and compression ratio.
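A minimal denoising sketch with hard thresholding follows. It assumes a periodic signal, hop size 1 (maximal oversampling), and a threshold set as a fraction of the largest coefficient. That threshold rule is purely illustrative, not a recommended estimator.

```python
import numpy as np

def gabor(x, g):
    """Circular Gabor transform with hop size 1 (maximally oversampled)."""
    frames = np.stack([x * np.conj(np.roll(g, m)) for m in range(len(x))])
    return np.fft.fft(frames, axis=1)

def inverse_gabor(G, g):
    """Inverse for hop size 1: overlap-add, normalized by the window energy."""
    frames = np.fft.ifft(G, axis=1)
    acc = sum(frames[m] * np.roll(g, m) for m in range(G.shape[0]))
    return acc / np.sum(np.abs(g) ** 2)

rng = np.random.default_rng(3)
N, sigma = 256, 16.0
k = np.arange(N)
g = np.exp(-np.minimum(k, N - k) ** 2 / (2 * sigma**2))

clean = np.sin(2 * np.pi * 20 * k / N)
noisy = clean + 0.2 * rng.standard_normal(N)

G = gabor(noisy, g)
threshold = 0.3 * np.abs(G).max()                 # illustrative choice, not a tuned rule
G_hard = np.where(np.abs(G) >= threshold, G, 0)   # hard thresholding
denoised = inverse_gabor(G_hard, g).real

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(mse_noisy, mse_denoised)
```

Compression follows the same pattern. Instead of inverting right away, you would store only the surviving coefficients and their positions.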

Variants and extensions

Gabor frames

Gabor frame theory provides the mathematical foundation for the discrete Gabor transform. A collection of Gabor atoms $\{g_{m,n}\}$ forms a frame if there exist constants $0 < A \leq B < \infty$ such that:

$A \|x\|^2 \leq \sum_{m,n} |G_x[m,n]|^2 \leq B \|x\|^2$

for all signals $x$. When this holds, stable reconstruction is guaranteed via the dual frame. Frame theory also extends beyond Gaussian windows, allowing other window shapes as long as the frame condition is satisfied.
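The frame inequality and dual-frame reconstruction can be illustrated on a small periodic example. The sketch builds the frame operator as an explicit matrix. That is an assumption made for clarity, since practical implementations exploit the Gabor structure instead, and all sizes are illustrative.

```python
import numpy as np

L, M, N_bins = 48, 4, 12                       # redundancy N_bins / M = 3
k = np.arange(L)
g = np.exp(-np.pi * np.minimum(k, L - k) ** 2 / L)

# Columns of Phi are the atoms g[k - mM] * exp(2j*pi*n*k/N_bins).
Phi = np.array([np.roll(g, m * M) * np.exp(2j * np.pi * n * k / N_bins)
                for m in range(L // M) for n in range(N_bins)]).T

S = Phi @ Phi.conj().T                         # frame operator
eigvals = np.linalg.eigvalsh(S)                # ascending eigenvalues
A, B = eigvals[0], eigvals[-1]                 # frame bounds

rng = np.random.default_rng(4)
x = rng.standard_normal(L)
coeffs = Phi.conj().T @ x                      # analysis: inner products <x, g_{m,n}>
energy_ratio = np.sum(np.abs(coeffs) ** 2) / np.sum(x**2)

dual_atoms = np.linalg.solve(S, Phi)           # canonical dual frame S^{-1} g_{m,n}
x_rec = dual_atoms @ coeffs                    # synthesis with the dual frame
print(A, B, energy_ratio, np.max(np.abs(x_rec - x)))
```

The energy ratio always falls between `A` and `B`. Synthesis with the canonical dual atoms recovers `x` as long as `A` is strictly positive.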

Multiwindow Gabor transform

Instead of a single Gaussian, the multiwindow variant uses several window functions simultaneously. Different windows can target different signal characteristics: a narrow window captures transients, while a wider window resolves closely spaced frequency components. The combined representation is richer, though at the cost of increased redundancy and computation.

Adaptive Gabor transform

The adaptive Gabor transform adjusts the window parameters (width, shape) locally based on the signal's time-frequency structure. The goal is to overcome the fixed-resolution limitation of the standard Gabor transform. For example, regions with rapid transients get a narrower window, while tonal regions get a wider one. This is conceptually appealing but computationally more demanding and requires a reliable criterion for local adaptation.

Relationship to other transforms

Fourier transform

The Fourier transform can be viewed as a limiting case of the Gabor transform in which the window becomes constant over the entire signal (for an unnormalized Gaussian, $\sigma \to \infty$). It provides perfect frequency resolution but no time localization. The Gabor transform trades some frequency precision for the ability to track spectral changes over time.
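This limit is easy to observe numerically. The sketch assumes a periodic signal and an unnormalized Gaussian, so that the window tends to 1 rather than 0 as $\sigma$ grows; the helper `gabor_frame` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 64
k = np.arange(N)
d = np.minimum(k, N - k)                 # circular distance from index 0
x = rng.standard_normal(N)

def gabor_frame(x, sigma, shift):
    """One frame of the circular Gabor transform: DFT of x times the shifted Gaussian."""
    g = np.exp(-d**2 / (2.0 * sigma**2))
    return np.fft.fft(x * np.roll(g, shift))

X = np.fft.fft(x)                        # ordinary DFT of the whole signal
for sigma in (4.0, 64.0, 1e6):
    err = max(np.max(np.abs(gabor_frame(x, sigma, s) - X)) for s in (0, 16, 32))
    print(f"sigma = {sigma:g}: max deviation from the DFT = {err:.2e}")
```

As $\sigma$ grows, every frame converges to the same spectrum regardless of the shift, which is the loss of time localization described above.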

Wavelet transform

The wavelet transform uses a multi-resolution approach: the analysis window scales with frequency. At high frequencies, the wavelet has short duration (good time resolution); at low frequencies, it has long duration (good frequency resolution). The Gabor transform, by contrast, uses the same window at every frequency, giving uniform resolution.

Neither is universally better. Wavelets suit signals with broadband transients and slowly varying baselines. The Gabor transform suits signals where uniform resolution is acceptable or preferred, and where phase coherence across frequency is important.

Wigner-Ville distribution

The Wigner-Ville distribution (WVD) is a quadratic (bilinear) time-frequency representation defined as:

$W_x(t,f) = \int_{-\infty}^{\infty} x(t + \tau/2) \, x^*(t - \tau/2) \, e^{-j2\pi f\tau} \, d\tau$

It achieves excellent time-frequency concentration for single-component signals but produces cross-terms (interference artifacts) for multi-component signals. Smoothing the WVD with a Gaussian kernel in both time and frequency yields the Gabor spectrogram, which suppresses cross-terms at the expense of some resolution. This connection highlights the Gabor transform as a practical compromise between resolution and interference suppression.