📡Advanced Signal Processing Unit 1 Review

1.4 Fast Fourier transform (FFT)

Written by the Fiveable Content Team • Last updated August 2025

Overview of FFT Algorithm

The Fast Fourier Transform (FFT) is an efficient algorithm for computing the Discrete Fourier Transform (DFT) of a sequence. Rather than evaluating the DFT sum directly, the FFT exploits symmetry and periodicity in the DFT's complex exponentials to reduce computational complexity from O(N^2) to O(N \log N). That reduction is what makes real-time frequency analysis practical for large signal lengths in applications like spectrum analysis, filtering, and audio processing.

Comparison of DFT vs FFT

The DFT computes the frequency-domain representation of a discrete-time signal by directly evaluating:

X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j2\pi kn/N}, \quad k = 0, 1, \dots, N-1

Each of the N output bins requires a sum over N terms, giving O(N^2) complex multiplications total. The FFT produces the exact same result but restructures the computation. Algorithms such as Cooley-Tukey recursively divide the DFT into smaller sub-problems and combine their results with far fewer operations. The output is mathematically identical; only the operation count changes.
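As a sketch of the direct evaluation above, here's a minimal O(N^2) DFT in pure Python (the function name `dft` is ours, not a library routine):

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: evaluate the full sum for each output bin k."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]
```

An FFT of the same input must return exactly these values; only the operation count differs.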

Computational Complexity of FFT

Big-O Notation for FFT

The FFT's complexity is O(N \log N), where N is the length of the input sequence. The \log N factor comes from the recursive halving: each level of recursion cuts the problem size in half, and there are \log_2 N levels. At each level, the total work across all sub-problems is proportional to N.

Comparison to DFT Complexity

To put the speedup in concrete terms: for N = 1024, the DFT requires on the order of 1024^2 \approx 10^6 operations, while the FFT requires roughly 1024 \times 10 \approx 10^4 operations. That's about a 100× speedup. At N = 10^6, the ratio grows to roughly 50,000×. This is why the direct DFT is impractical for large signal lengths and why the FFT was such a pivotal development.
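The operation-count ratio is easy to check directly (`speedup` below is an illustrative helper, not a standard function):

```python
import math

def speedup(N):
    """Ratio of direct-DFT operations (N^2) to FFT operations (N * log2(N))."""
    return N**2 / (N * math.log2(N))
```

For N = 1024 this gives 1024 / 10 = 102.4, matching the roughly 100× figure quoted above.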

Radix-2 FFT Algorithm

Decimation in Time (DIT) Approach

The radix-2 DIT FFT works by splitting the N-point DFT into two N/2-point DFTs, one over the even-indexed samples and one over the odd-indexed samples. The key identity is:

X[k] = \underbrace{\sum_{m=0}^{N/2-1} x[2m]\, e^{-j2\pi km/(N/2)}}_{G[k]} \;+\; W_N^k \underbrace{\sum_{m=0}^{N/2-1} x[2m+1]\, e^{-j2\pi km/(N/2)}}_{H[k]}

where W_N^k = e^{-j2\pi k/N} is the twiddle factor. G[k] is the DFT of the even-indexed samples and H[k] is the DFT of the odd-indexed samples. This splitting is applied recursively until you reach 2-point DFTs, which are trivial additions and subtractions.

Butterfly Diagram for DIT FFT

Each stage of the radix-2 DIT FFT is built from butterfly operations. A single butterfly takes two inputs a and b, multiplies b by a twiddle factor W_N^k, and produces two outputs:

  • a + W_N^k \cdot b
  • a - W_N^k \cdot b

The butterfly diagram is a flow graph that shows how all N data points are combined across \log_2 N stages. Reading the diagram left to right, you can trace every multiply and add/subtract in the entire FFT. Being comfortable reading butterfly diagrams is essential for understanding FFT hardware implementations and debugging signal flow.
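A single butterfly can be written in a few lines of Python (the `butterfly` helper is illustrative):

```python
import cmath

def butterfly(a, b, k, N):
    """One radix-2 DIT butterfly: combine a with the twiddle-scaled b."""
    w = cmath.exp(-2j * cmath.pi * k / N)  # twiddle factor W_N^k
    return a + w * b, a - w * b
```

With k = 0 the twiddle factor is 1, so the butterfly reduces to a plain sum and difference, which is exactly the trivial 2-point DFT at the bottom of the recursion.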

Bit Reversal in DIT FFT

Before the butterfly stages begin, the DIT algorithm requires the input to be reordered in bit-reversed order. To find the bit-reversed index, write the original index in binary (using \log_2 N bits) and reverse the bit string. For example, with N = 8:

  • Index 1 (binary 001) maps to index 4 (binary 100)
  • Index 3 (binary 011) maps to index 6 (binary 110)

This reordering ensures that when the butterfly stages combine sub-DFT results, the correct even/odd pairs line up at each level. Without it, the outputs would be scrambled.
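The bit-reversal mapping can be sketched in Python; `bit_reverse_indices` is an illustrative helper that reproduces the N = 8 examples above:

```python
def bit_reverse_indices(N):
    """Permutation mapping each index to its bit-reversed counterpart.

    N must be a power of 2; each index is written with log2(N) bits,
    and the bit string is reversed.
    """
    bits = N.bit_length() - 1                      # log2(N) bits per index
    return [int(format(i, f'0{bits}b')[::-1], 2)   # reverse the bit string
            for i in range(N)]
```

Note that the mapping is its own inverse (reversing bits twice is the identity), which is why the in-place reordering can be done with simple swaps.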

Radix-2 FFT Implementation

Recursive FFT Implementation

The most direct implementation follows the divide-and-conquer structure:

  1. If N = 1, return the single input sample (base case).
  2. Split the input into even-indexed and odd-indexed subsequences.
  3. Recursively compute the FFT of each half.
  4. Combine the two half-length DFTs using butterfly operations with the appropriate twiddle factors.

This recursive approach maps cleanly onto the math but carries overhead from function calls and temporary array allocations, which matters for performance-critical or embedded applications.
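The four steps above map directly onto a short recursive Python sketch (the length is assumed to be a power of 2, and `fft_recursive` is our own name, not a library routine):

```python
import cmath

def fft_recursive(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)              # base case: 1-point DFT is the sample itself
    G = fft_recursive(x[0::2])      # DFT of even-indexed samples
    H = fft_recursive(x[1::2])      # DFT of odd-indexed samples
    out = [0] * N
    for k in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * k / N)   # twiddle factor W_N^k
        out[k] = G[k] + w * H[k]                # butterfly: top output
        out[k + N // 2] = G[k] - w * H[k]       # butterfly: bottom output
    return out
```

The slicing (`x[0::2]`, `x[1::2]`) allocates temporary arrays at every level, which is precisely the overhead the in-place formulation avoids.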

In-Place Computation of FFT

An in-place FFT overwrites the input array with intermediate and final results, requiring no additional memory beyond the original buffer. The iterative (non-recursive) form of the Cooley-Tukey algorithm naturally supports this:

  1. Apply bit-reversal permutation to the input array.
  2. Loop through \log_2 N stages, performing butterfly operations that read from and write back to the same array locations.

In-place computation is the standard approach in practice, especially on memory-constrained systems or when processing large transforms.
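A minimal in-place sketch in Python, combining the bit-reversal pass with the iterative butterfly stages (illustrative, not production code; the input length must be a power of 2):

```python
import cmath

def fft_inplace(x):
    """Iterative in-place radix-2 DIT FFT over a list of numbers."""
    N = len(x)
    # Step 1: bit-reversal permutation, done with swaps in the same buffer.
    j = 0
    for i in range(1, N):
        bit = N >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    # Step 2: log2(N) butterfly stages, reading and writing the same array.
    size = 2
    while size <= N:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)  # twiddle increment per k
        for start in range(0, N, size):
            w = 1.0
            for k in range(half):
                a = x[start + k]
                b = w * x[start + k + half]
                x[start + k] = a + b
                x[start + k + half] = a - b
                w *= step
        size *= 2
    return x
```

No auxiliary array is ever allocated: the input buffer holds the bit-reversed data, every intermediate stage, and the final spectrum.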

Overflow and Scaling in FFT

In fixed-point implementations, intermediate values can grow in magnitude at each butterfly stage, risking overflow. Common strategies to handle this:

  • Block floating point: track the maximum magnitude and shift the entire array when needed.
  • Stage scaling: divide by 2 (right-shift by 1) after each of the \log_2 N stages, which distributes the 1/N normalization across the transform.
  • Input scaling: normalize the input before the FFT so that intermediate values stay within range.

In floating-point implementations overflow is rarely an issue, but dynamic range and precision still matter for signals with widely varying component amplitudes.

FFT for Real-Valued Signals

Real FFT Algorithm

Most signals you encounter in practice (audio, sensor data, etc.) are real-valued. The DFT of a real sequence exhibits conjugate symmetry: X[k] = X^*[N-k]. A real FFT exploits this by packing two real N/2-point sequences into one complex N/2-point FFT, then unscrambling the result. This roughly halves both computation and memory compared to treating the input as complex.

Redundancy in Real FFT Output

Because of conjugate symmetry, only N/2 + 1 of the N complex output bins are unique. Specifically, X[0] and X[N/2] are purely real, and for 1 \le k \le N/2 - 1, you have X[N-k] = X^*[k]. Most FFT libraries return only these N/2 + 1 coefficients for real inputs, saving both storage and downstream computation.
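The redundancy is easy to verify numerically; the snippet below uses a direct DFT (an illustrative helper) to check the symmetry for an arbitrary real 8-point input:

```python
import cmath

def dft(x):
    """Direct DFT, used here only to illustrate the symmetry."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# For a real-valued input, X[N-k] == conj(X[k]): only N/2 + 1 bins are unique.
x = [0.5, 1.0, -0.25, 2.0, 0.0, -1.5, 0.75, 1.25]
X = dft(x)
N = len(x)
unique_bins = X[: N // 2 + 1]   # what a real-FFT routine would typically return
```

The remaining N/2 - 1 bins can always be reconstructed as conjugates of the unique ones, so nothing is lost by discarding them.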


Inverse FFT (IFFT)

IFFT Definition and Properties

The Inverse FFT computes the inverse DFT efficiently, converting a frequency-domain representation back to the time domain:

x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \, e^{j2\pi kn/N}

The structure is nearly identical to the forward FFT. The only differences are the sign of the exponent (positive instead of negative) and the 1/N scaling factor.

Relationship Between FFT and IFFT

The FFT and IFFT are inverses: applying one after the other recovers the original data (within floating-point precision). A useful implementation trick is that you can compute the IFFT using a forward FFT routine:

  1. Conjugate the input spectrum X[k].
  2. Compute the forward FFT of the conjugated sequence.
  3. Conjugate the result.
  4. Divide by N.

This means a single FFT routine can serve both directions, which simplifies library design.
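The four-step trick can be sketched in Python on top of a recursive forward FFT (both helper names are ours):

```python
import cmath

def fft(x):
    """Recursive radix-2 forward FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    G, H = fft(x[0::2]), fft(x[1::2])
    out = [0] * N
    for k in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * k / N)
        out[k], out[k + N // 2] = G[k] + w * H[k], G[k] - w * H[k]
    return out

def ifft(X):
    """IFFT via the forward FFT: conjugate, transform, conjugate, scale by 1/N."""
    N = len(X)
    y = fft([v.conjugate() for v in X])      # steps 1-2
    return [v.conjugate() / N for v in y]    # steps 3-4
```

Conjugating flips the sign of the exponent, so the forward routine effectively computes the inverse sum; the final 1/N supplies the missing normalization.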

FFT Variants and Generalizations

Radix-4 and Split-Radix FFT

The radix-4 FFT splits the input into four subsequences instead of two, performing 4-point butterflies at each stage. When N is a power of 4, radix-4 requires about 25% fewer multiplications than radix-2 because it can combine twiddle factor multiplications more efficiently.

The split-radix FFT mixes radix-2 and radix-4 decompositions to achieve the lowest known operation count for power-of-2 lengths. It uses radix-2 for odd-indexed frequencies and radix-4 for even-indexed frequencies, giving it an edge over pure radix-2 or radix-4 for general power-of-2 sizes.

FFT for 2D Signals

The 2D DFT of an image or other 2D data can be computed by applying 1D FFTs separably:

  1. Compute the 1D FFT of every row.
  2. Compute the 1D FFT of every column of the row-transformed result.

The order (rows-first or columns-first) doesn't matter mathematically. This separability keeps the complexity at O(N^2 \log N) for an N \times N array, rather than the O(N^4) cost of a direct 2D DFT. Applications include image compression (JPEG uses a related transform), 2D filtering, and feature extraction.
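Separability can be sketched by transposing between the row pass and the column pass (illustrative helpers; power-of-2 dimensions assumed):

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT helper; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    G, H = fft(x[0::2]), fft(x[1::2])
    out = [0] * N
    for k in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * k / N)
        out[k], out[k + N // 2] = G[k] + w * H[k], G[k] - w * H[k]
    return out

def fft2(a):
    """2D FFT by separability: 1D FFT of every row, then of every column."""
    rows = [fft(list(r)) for r in a]                 # step 1: row transforms
    cols = [fft(list(c)) for c in zip(*rows)]        # step 2: column transforms
    return [list(r) for r in zip(*cols)]             # transpose back to row-major
```

Swapping the two passes (columns first, then rows) produces the same result, as the text notes.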

Goertzel Algorithm for Single Frequency

The Goertzel algorithm computes a single DFT bin X[k] without computing the entire spectrum. It's implemented as a second-order IIR filter that processes the input sample-by-sample and produces the desired DFT coefficient at the end. Its complexity is O(N) per bin, so it's more efficient than a full FFT when you need only a small number of frequency bins (roughly fewer than \log_2 N bins). A classic use case is DTMF tone detection, where you only need to check a handful of specific frequencies.
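A minimal Goertzel sketch in Python, returning the power |X[k]|^2 of a single bin, which is all that tone detectors like DTMF need (a common formulation; the helper name is ours):

```python
import math

def goertzel(x, k):
    """Goertzel algorithm: power |X[k]|^2 of one DFT bin via a 2nd-order recursion."""
    N = len(x)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / N)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:                         # one multiply-add per sample: O(N)
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of bin k from the final two filter states
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2
```

Only one real coefficient per bin is needed, and the input is consumed sample-by-sample, which is why Goertzel suits streaming tone detection.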

Applications of FFT in DSP

Spectrum Analysis Using FFT

Spectrum analysis is the most straightforward FFT application: take a block of N time-domain samples, apply a window function, compute the FFT, and examine the magnitude and phase spectra. The magnitude spectrum |X[k]| reveals which frequencies are present and their relative strengths. The frequency resolution is \Delta f = f_s / N, where f_s is the sampling rate, so longer transforms give finer resolution.
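The recipe — window, transform, magnitude — can be sketched as follows (a direct DFT stands in for the FFT to keep the example short; all names are illustrative):

```python
import cmath
import math

def dft(x):
    """Direct DFT stand-in for an FFT routine."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def magnitude_spectrum(x, fs):
    """Hann-windowed magnitude spectrum with per-bin frequencies."""
    N = len(x)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    X = dft([xi * wi for xi, wi in zip(x, hann)])
    freqs = [k * fs / N for k in range(N)]   # bin spacing: delta_f = fs / N
    return freqs, [abs(v) for v in X]
```

For a 1 kHz tone sampled at 8 kHz with N = 64, the bin spacing is 125 Hz, so the peak lands in bin 8.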

Efficient FIR Filtering with FFT

Convolution in the time domain corresponds to multiplication in the frequency domain. For long FIR filters, direct convolution costs O(NM) where M is the filter length, but FFT-based filtering (often called fast convolution) can be much cheaper:

  1. Zero-pad both the input block and the filter to length N + M - 1 (or the next power of 2).
  2. Compute the FFT of both.
  3. Multiply the two spectra element-wise.
  4. Compute the IFFT to get the filtered output.

For long filters, the O(N \log N) cost of the FFTs beats the O(NM) cost of direct convolution. Overlap-add and overlap-save are two standard methods for applying this technique to continuous (streaming) data.
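Fast convolution per the steps above, sketched with a recursive FFT/IFFT pair (illustrative helpers; padding to the next power of 2 is assumed):

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    G, H = fft(x[0::2]), fft(x[1::2])
    out = [0] * N
    for k in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * k / N)
        out[k], out[k + N // 2] = G[k] + w * H[k], G[k] - w * H[k]
    return out

def ifft(X):
    """IFFT via conjugation of the forward FFT."""
    N = len(X)
    y = fft([v.conjugate() for v in X])
    return [v.conjugate() / N for v in y]

def fast_convolve(x, h):
    """Linear convolution: pad to a power of 2 >= len(x) + len(h) - 1."""
    L = len(x) + len(h) - 1
    N = 1
    while N < L:                    # next power of 2 that avoids circular wrap
        N *= 2
    X = fft(list(x) + [0.0] * (N - len(x)))
    H = fft(list(h) + [0.0] * (N - len(h)))
    y = ifft([a * b for a, b in zip(X, H)])   # spectral multiplication
    return [v.real for v in y[:L]]            # keep the L valid output samples
```

The padding to N + M - 1 (or beyond) is what turns the FFT's inherently circular convolution into the desired linear convolution.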

FFT in Audio and Speech Processing

The FFT is central to audio and speech processing pipelines. Frequency equalization adjusts the magnitude of specific frequency bands via spectral multiplication. Noise reduction algorithms estimate the noise spectrum and subtract or attenuate it in the frequency domain. Audio codecs like MP3 and AAC use transform-domain representations (closely related to the FFT) for perceptual compression. In speech processing, the Short-Time Fourier Transform (STFT), which applies the FFT to overlapping windowed frames, is the basis for spectrograms, speech recognition front-ends, and speech synthesis systems.

Limitations and Pitfalls of FFT

Spectral Leakage and Windowing

Spectral leakage occurs when the signal is not exactly periodic within the FFT window. The abrupt truncation at the window edges introduces artificial discontinuities, causing energy from a single frequency to spread across neighboring bins. Applying a window function (Hann, Hamming, Blackman, etc.) before the FFT tapers the signal smoothly to zero at the edges, reducing leakage. The tradeoff is that windowing widens the main lobe of each spectral peak, slightly reducing frequency resolution. Choosing the right window depends on whether you prioritize frequency resolution (narrow main lobe) or sidelobe suppression (less leakage).

Aliasing and Sampling Rate Considerations

Aliasing happens when the signal contains frequency components above f_s / 2 (the Nyquist frequency). These components fold back into the spectrum and appear as spurious lower-frequency content that cannot be distinguished from real signal components. The FFT itself doesn't cause aliasing; it's a consequence of insufficient sampling rate. To prevent it:

  • Apply an analog anti-aliasing filter before the ADC.
  • Ensure the sampling rate satisfies f_s \ge 2 f_{\max}.

Once aliasing has occurred in the sampled data, no amount of post-processing can undo it.
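A quick numerical illustration of the folding: sampled at 8 kHz, a 7 kHz tone produces exactly the same samples as a 1 kHz tone (the frequencies and rates here are just example values):

```python
import math

fs = 8000.0    # sampling rate (Hz)
n = range(16)
# 7000 Hz is above fs/2 = 4000 Hz, so it folds back to |7000 - 8000| = 1000 Hz.
tone_7k = [math.cos(2 * math.pi * 7000 * i / fs) for i in n]
tone_1k = [math.cos(2 * math.pi * 1000 * i / fs) for i in n]
```

Since the two sample sequences are identical, no post-processing step could ever tell them apart, which is exactly why aliasing must be prevented before the ADC.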

FFT Accuracy and Numerical Stability

Roundoff errors accumulate across the \log_2 N stages of the FFT. For an N-point FFT using double-precision floating point, the relative error grows only on the order of \log N in the worst case (and more slowly, as O(\sqrt{\log N}), in the RMS sense), which is quite manageable for most applications. Problems arise mainly with:

  • Fixed-point arithmetic, where limited bit-width causes quantization noise to accumulate.
  • Signals with large dynamic range, where small components can be lost in the noise floor of large components.
  • Very large FFT sizes, where cumulative roundoff becomes noticeable.

Using floating-point arithmetic, proper input scaling, and appropriate data types keeps these issues under control in the vast majority of practical systems.