📡Advanced Signal Processing Unit 3 Review


3.4 Spectral analysis of random signals


Written by the Fiveable Content Team • Last updated August 2025

Fundamentals of Random Signals

Random signals provide the framework for analyzing and modeling complex, unpredictable phenomena in signal processing. Understanding their properties is what makes it possible to build robust techniques for communications, radar, biomedical engineering, and many other domains.

Stochastic Processes

A stochastic process is a mathematical model describing how random variables evolve over time or space. Each process is characterized by its probability distributions, which specify the likelihood of observing particular values or sequences of values.

Common examples:

  • Gaussian processes (e.g., white noise) where the joint distribution at any set of time points is Gaussian
  • Poisson processes (e.g., radioactive decay) that model the occurrence of discrete events in continuous time
  • Markov processes (e.g., weather state transitions) where the future state depends only on the present state, not the full history

Stationary vs. Non-Stationary Processes

A stationary process has statistical properties that don't change over time. Its mean, variance, and autocorrelation function all remain constant regardless of when you observe the signal. This is a powerful property because it means a single stretch of data can tell you about the process in general.

A non-stationary process has time-varying statistics. The mean might drift, the variance might grow, or the frequency content might shift. Many real-world signals fall into this category: speech, EEG recordings, and financial time series all exhibit non-stationarity. Analyzing them requires specialized techniques (short-time methods, wavelets, etc.) that can track how the spectrum changes over time.

Ergodicity in Random Signals

Ergodicity bridges the gap between what you can observe and what you want to know. For an ergodic process, the time average computed from a single, sufficiently long realization equals the ensemble average you'd get by averaging across many independent realizations.

Why does this matter? In practice, you almost never have access to multiple independent realizations of a process. You typically have one recording. Ergodicity is the assumption that justifies estimating statistical properties (mean, autocorrelation, PSD) from that single observation. Most spectral estimation algorithms rely on this assumption, so it's worth checking whether it's reasonable for your data.

Power Spectral Density (PSD)

The power spectral density describes how the power of a random signal is distributed across frequency. It's the central quantity in spectral analysis of stationary processes and tells you which frequency bands carry the most energy.

Definition of PSD

The PSD of a random signal $x(t)$ is defined as the Fourier transform of its autocorrelation function $R_x(\tau)$. This relationship is known as the Wiener-Khinchin theorem:

$$S_x(f) = \int_{-\infty}^{\infty} R_x(\tau)\, e^{-j2\pi f\tau}\, d\tau$$

An equivalent definition uses the Fourier transform of the signal directly:

$$S_x(f) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\left[\left|\int_{-T/2}^{T/2} x(t)\, e^{-j2\pi f t}\, dt\right|^2\right]$$

The first form is more useful conceptually (it connects correlation in time to power in frequency), while the second form motivates practical estimators like the periodogram.
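The second definition can be checked numerically. A minimal sketch (assuming NumPy is available): average the normalized squared DFT magnitude over many realizations of discrete white noise, whose true PSD is flat at $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma2, K = 1024, 2.0, 200  # record length, noise variance, realizations

# Monte Carlo version of S_x(f) = lim E[|X_T(f)|^2]/T: average the
# normalized squared DFT magnitude over K independent noise records.
S_hat = np.zeros(N)
for _ in range(K):
    x = rng.normal(0.0, np.sqrt(sigma2), N)
    S_hat += np.abs(np.fft.fft(x)) ** 2 / N
S_hat /= K

print(S_hat.mean())  # flat spectrum: mean level close to sigma2 = 2.0
```

With a single realization (K = 1) the estimate is far noisier, which is exactly the periodogram variance problem discussed later in this guide.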

Properties of PSD

  • Non-negative and real-valued: $S_x(f) \geq 0$ for all $f$. You can't have negative power at any frequency.
  • Total power: The area under the PSD curve equals the total signal power, i.e., $\int_{-\infty}^{\infty} S_x(f)\, df = R_x(0) = \mathbb{E}[|x(t)|^2]$.
  • Even symmetry: For real-valued signals, $S_x(f) = S_x(-f)$, which follows from the autocorrelation being real and even.
  • White vs. colored noise: A white noise process has a flat PSD (equal power at all frequencies), while colored noise has a frequency-dependent PSD (e.g., $1/f$ noise rolls off with frequency).

PSD Estimation Methods

Since you never have an infinitely long signal, the true PSD must be estimated from finite data. The two main families of estimators differ in their assumptions:

  • Non-parametric methods (periodogram, Welch's method, multitaper) work directly from the data's Fourier transform without assuming a signal model. They're general-purpose but face bias-variance tradeoffs.
  • Parametric methods (AR, MA, ARMA) assume the signal was generated by a specific model structure. They estimate model parameters first, then compute the PSD analytically. They can achieve better frequency resolution with short data records, but produce misleading results if the model assumption is wrong.

The right choice depends on your signal characteristics, data length, and what you need from the estimate.

Spectral Estimation Techniques

These non-parametric methods estimate the PSD directly from the observed data using the Fourier transform. Each method represents a different point on the bias-variance tradeoff.

Periodogram

The periodogram is the simplest PSD estimator. It computes the squared magnitude of the DFT of the observed signal, normalized by the number of samples:

$$\hat{S}_x(f) = \frac{1}{N} \left|\sum_{n=0}^{N-1} x[n]\, e^{-j2\pi f n}\right|^2$$

The periodogram is computationally cheap (just an FFT plus squaring), but it has two well-known problems:

  • High variance: The periodogram is not a consistent estimator. Its variance does not decrease as you collect more data; it stays roughly proportional to the true PSD squared. This makes individual periodogram estimates very noisy.
  • Spectral leakage: Finite observation length causes energy from one frequency to bleed into neighboring frequencies. Applying a window function before the FFT reduces leakage at the cost of slightly widening spectral peaks.
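The inconsistency is easy to demonstrate: for unit-variance white noise the per-bin scatter of the periodogram stays near the true PSD level (here 1) no matter how long the record is. A small sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)

def periodogram(x):
    """Raw periodogram: squared DFT magnitude normalized by N."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)

# More data does not reduce the per-bin standard deviation -- only
# averaging (as in Welch's or the multitaper method) does.
stds = {N: periodogram(rng.normal(size=N)).std() for N in (1024, 16384)}
print(stds)  # both standard deviations stay near 1
```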

Welch's Method

Welch's method reduces the variance problem by averaging multiple periodograms computed from overlapping segments of the signal.

Steps:

  1. Divide the signal of length $N$ into $K$ overlapping segments, each of length $L$, with successive segments offset by $D$ samples (so consecutive segments overlap by $L - D$ samples).
  2. Apply a window function $w[n]$ to each segment.
  3. Compute the modified periodogram for each windowed segment.
  4. Average the $K$ periodograms to get the final PSD estimate:

$$\hat{S}_x(f) = \frac{1}{KLU} \sum_{k=0}^{K-1} \left|\sum_{n=0}^{L-1} w[n]\, x[n+kD]\, e^{-j2\pi f n}\right|^2$$

where $U = \frac{1}{L}\sum_{n=0}^{L-1} |w[n]|^2$ is a normalization factor that accounts for the window's energy.

The tradeoff: averaging $K$ segments reduces variance by roughly a factor of $K$, but using shorter segments (to get more of them) reduces frequency resolution. Typical overlap is 50%, which gives a good balance when using Hann or Hamming windows.
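In practice you rarely code these steps by hand; `scipy.signal.welch` implements the segmentation, windowing, and averaging. A short sketch (the signal parameters are illustrative):

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(2)
fs = 1000.0                       # sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
# A 50 Hz tone buried in white noise of twice its amplitude
x = np.sin(2 * np.pi * 50 * t) + rng.normal(scale=2.0, size=t.size)

# 1024-sample Hann-windowed segments with 50% overlap (the typical choice)
f, Pxx = welch(x, fs=fs, window="hann", nperseg=1024, noverlap=512)
print(f[np.argmax(Pxx)])  # peak located near 50 Hz
```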

Multitaper Method

The multitaper method takes a different approach to variance reduction. Instead of segmenting the data, it applies $K$ different orthogonal window functions (tapers) to the entire signal and averages the resulting spectral estimates:

$$\hat{S}_x(f) = \frac{1}{K} \sum_{k=1}^{K} \left|\sum_{n=0}^{N-1} w_k[n]\, x[n]\, e^{-j2\pi f n}\right|^2$$

The tapers are typically the Slepian sequences (discrete prolate spheroidal sequences, or DPSS), which are optimal in the sense that they maximize energy concentration within a specified bandwidth.

The multitaper method offers several advantages over Welch's method: it uses the full data record (no segmentation), provides near-optimal bias-variance tradeoff, and the bandwidth parameter gives direct control over the resolution-variance tradeoff. It's particularly effective for short data records or signals with complex spectral structure.
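SciPy does not ship a complete multitaper estimator, but the method is straightforward to sketch with the DPSS tapers from `scipy.signal.windows`. The unweighted average below omits the eigenvalue weighting used in more careful implementations:

```python
import numpy as np
from scipy.signal import windows

rng = np.random.default_rng(3)
N = 2048
n = np.arange(N)
# A tone at normalized frequency 0.2 cycles/sample plus unit-variance noise
x = np.sin(2 * np.pi * 0.2 * n) + rng.normal(size=N)

NW, K = 4, 7                           # time-bandwidth product, taper count
tapers = windows.dpss(N, NW, Kmax=K)   # (K, N) Slepian sequences, unit energy

# Average the K single-taper periodograms computed over the full record
S = np.mean(np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2, axis=0)
f = np.fft.rfftfreq(N)                 # frequencies in cycles/sample
print(f[np.argmax(S)])  # peak near 0.2
```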

Parametric Spectral Estimation

Parametric methods assume the signal was generated by a linear system driven by white noise. You estimate the system's parameters from the data, then compute the PSD analytically from those parameters. This can yield much sharper spectral estimates than non-parametric methods, especially with limited data, but the results are only trustworthy if the model fits the actual signal.

Autoregressive (AR) Models

An AR model of order $p$ expresses each sample as a weighted sum of the $p$ previous samples plus white noise:

$$x[n] = \sum_{k=1}^{p} a_k x[n-k] + w[n]$$

where $a_k$ are the AR coefficients and $w[n]$ is white noise with variance $\sigma^2$.

The corresponding PSD is:

$$S_x(f) = \frac{\sigma^2}{\left|1 - \sum_{k=1}^{p} a_k e^{-j2\pi f k}\right|^2}$$

Notice the denominator: the PSD has peaks wherever the polynomial $1 - \sum_{k=1}^{p} a_k z^{-k}$ is small, meaning the model's poles are close to the unit circle. This makes AR models excellent at representing signals with sharp spectral peaks (narrowband components), such as speech formants or seismic resonances.

Common methods for estimating AR parameters include the Yule-Walker equations, Burg's method (which guarantees a stable model), and covariance/modified covariance methods. Model order $p$ is typically selected using criteria like AIC or BIC.
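As a sketch of the Yule-Walker approach (assuming NumPy and SciPy), you can simulate an AR(2) process and recover its coefficients by solving the Toeplitz system built from the sample autocorrelation:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

rng = np.random.default_rng(4)

# Simulate x[n] = 1.5 x[n-1] - 0.9 x[n-2] + w[n]; poles at radius ~0.95
a_true = np.array([1.5, -0.9])
N = 20000
w = rng.normal(size=N)
x = np.zeros(N)
for n in range(2, N):
    x[n] = a_true[0] * x[n - 1] + a_true[1] * x[n - 2] + w[n]

# Yule-Walker: the autocorrelation Toeplitz matrix times a equals r[1..p]
p = 2
r = np.array([x[: N - k] @ x[k:] / N for k in range(p + 1)])
a_hat = solve_toeplitz(r[:p], r[1 : p + 1])
sigma2_hat = r[0] - a_hat @ r[1 : p + 1]   # driving-noise variance estimate
print(a_hat)  # close to [1.5, -0.9]
```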


Moving Average (MA) Models

An MA model of order $q$ expresses each sample as a weighted sum of current and past white noise values:

$$x[n] = \sum_{k=0}^{q} b_k w[n-k]$$

The PSD of an MA process is:

$$S_x(f) = \sigma^2 \left|\sum_{k=0}^{q} b_k e^{-j2\pi f k}\right|^2$$

The PSD here is determined by the numerator polynomial, which means MA models are good at representing spectral nulls (deep valleys in the spectrum) and broad spectral shapes. They're the natural dual of AR models: an MA model can approximate the inverse spectrum of an AR process.

MA parameter estimation is more involved than AR estimation because the relationship between the autocorrelation and the MA coefficients is nonlinear. Iterative methods are typically required.

Autoregressive Moving Average (ARMA) Models

ARMA models combine both AR and MA components:

$$x[n] = \sum_{k=1}^{p} a_k x[n-k] + \sum_{k=0}^{q} b_k w[n-k]$$

The PSD is:

$$S_x(f) = \sigma^2\, \frac{\left|\sum_{k=0}^{q} b_k e^{-j2\pi f k}\right|^2}{\left|1 - \sum_{k=1}^{p} a_k e^{-j2\pi f k}\right|^2}$$

ARMA models can capture both spectral peaks (via the AR poles) and spectral nulls (via the MA zeros) simultaneously. This makes them more parsimonious than pure AR or MA models: you can often represent a complex spectrum with lower total order $p + q$ than you'd need with either model alone.

The tradeoff is that ARMA parameter estimation is significantly harder. Joint estimation of AR and MA parameters is a nonlinear problem, and algorithms like the modified Yule-Walker method or iterative maximum likelihood are needed.
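Once coefficients are in hand, the rational PSD above is just the squared magnitude of a digital filter's frequency response, so it can be evaluated with `scipy.signal.freqz`. The coefficients below are illustrative, not estimated from data:

```python
import numpy as np
from scipy.signal import freqz

# S_x(f) = sigma^2 |B(f)|^2 / |A(f)|^2 for an ARMA(2,1) example.
# Note freqz's denominator convention: A(z) = 1 - 1.5 z^-1 + 0.9 z^-2
# corresponds to AR coefficients a_1 = 1.5, a_2 = -0.9.
b = [1.0, 0.5]
a = [1.0, -1.5, 0.9]
sigma2 = 1.0

w, H = freqz(b, a, worN=4096)     # response on [0, pi) rad/sample
S = sigma2 * np.abs(H) ** 2
f = w / (2 * np.pi)               # normalized frequency, cycles/sample

print(f[np.argmax(S)])  # sharp peak near 0.1, set by the pole angle
```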

Spectral Analysis Applications

The PSD and its estimates are not just theoretical constructs. They're the basis for solving real engineering problems across many domains.

Signal Detection in Noise

Spectral analysis enables detection of signals buried in noise by exploiting differences in their spectral signatures. A sinusoidal signal produces a sharp peak in the PSD, while broadband noise has a relatively flat spectrum. This spectral contrast is what makes detection possible.

Key detection approaches that rely on spectral analysis:

  • Matched filter: Maximizes output SNR by weighting each frequency according to the signal-to-noise ratio at that frequency. Requires knowledge of the signal's spectrum.
  • Energy detector: Measures total energy in a frequency band and compares it to a threshold. Useful when the signal's exact shape is unknown.

Applications span radar target detection, primary user detection in cognitive radio, and seismic event identification.
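The energy detector above can be sketched in a few lines (assuming NumPy; the band edges and signal parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
fs, N = 1000.0, 4096
t = np.arange(N) / fs

def band_energy(x, fs, f_lo, f_hi):
    """Energy detector statistic: periodogram power summed over a band."""
    f = np.fft.rfftfreq(len(x), d=1 / fs)
    P = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    band = (f >= f_lo) & (f <= f_hi)
    return P[band].sum()

noise_only = rng.normal(size=N)
with_tone = 0.5 * np.sin(2 * np.pi * 120 * t) + rng.normal(size=N)

# The 110-130 Hz band energy jumps when the tone is present; a detector
# would compare this statistic against a threshold set by the noise level.
print(band_energy(noise_only, fs, 110, 130), band_energy(with_tone, fs, 110, 130))
```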

System Identification

You can characterize an unknown linear system by analyzing the spectral relationship between its input and output. The system's frequency response $H(f)$ relates the input and output PSDs:

$$S_y(f) = |H(f)|^2 S_x(f)$$

By estimating $S_x(f)$ and $S_y(f)$ (or the cross-spectral density $S_{xy}(f)$), you can recover $H(f)$ and infer system properties like stability, resonant frequencies, and bandwidth. Parametric methods (AR, ARMA) are especially useful here because they directly yield a rational transfer function model.

Applications include control system design, acoustic room characterization, and structural health monitoring.
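Note that $S_y = |H|^2 S_x$ recovers only the magnitude of $H(f)$; the cross-spectral density $S_{xy}(f) = H(f) S_x(f)$ recovers phase as well. A sketch using SciPy's Welch-based estimators, where a small FIR filter plays the role of the "unknown" system:

```python
import numpy as np
from scipy.signal import welch, csd, lfilter, freqz

rng = np.random.default_rng(5)
fs = 1.0
x = rng.normal(size=100_000)          # white-noise probe input

b_true = [1.0, 0.8, 0.3]              # the system we pretend not to know
y = lfilter(b_true, [1.0], x)

# H_hat(f) = S_xy(f) / S_x(f), both estimated with Welch averaging
f, Sxy = csd(x, y, fs=fs, nperseg=1024)
_, Sxx = welch(x, fs=fs, nperseg=1024)
H_hat = Sxy / Sxx

# Compare to the true response (skip the DC bin, which detrending removes)
_, H_true = freqz(b_true, [1.0], worN=f[1:], fs=fs)
err = np.mean(np.abs(H_hat[1:] - H_true))
print(err)  # small average error across frequency
```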

Time Series Forecasting

Spectral decomposition of time series data reveals periodic components, trends, and noise structure that can improve forecasts. By identifying dominant frequencies in the PSD, you can separate predictable oscillatory behavior from unpredictable noise.

Techniques used in this context include Fourier-based decomposition, wavelet analysis for non-stationary series, and singular spectrum analysis (SSA) which combines ideas from time-domain embedding and spectral decomposition. Applications range from financial market analysis to weather forecasting and demand planning.

Advanced Topics in Spectral Analysis

Standard PSD analysis assumes stationarity, Gaussianity, and linearity. When these assumptions break down, more sophisticated spectral tools are needed.

Higher-Order Spectra

Higher-order spectra (polyspectra) generalize the PSD by capturing statistical dependencies beyond second-order (beyond correlation). The PSD is the second-order spectrum. Going higher:

  • Bispectrum (third-order): Measures quadratic phase coupling between frequency components. Defined as the two-dimensional Fourier transform of the third-order cumulant sequence. It's complex-valued, so it retains phase information that the PSD discards.
  • Trispectrum (fourth-order): Captures cubic phase coupling and higher-order nonlinear interactions.

Why bother? For Gaussian processes, all higher-order spectra are identically zero. So a non-zero bispectrum is direct evidence of non-Gaussianity or nonlinear coupling in the signal. This makes higher-order spectra valuable for machine fault diagnosis (detecting nonlinear vibration modes), speech analysis, and characterizing nonlinear wave interactions.

Cyclostationary Signal Analysis

A cyclostationary signal has statistical properties that vary periodically with time rather than remaining constant (stationary) or varying arbitrarily (non-stationary). This periodic variation creates a hidden structure that standard PSD analysis misses.

Examples include digitally modulated communication signals (whose statistics repeat at the symbol rate), rotating machinery vibrations (repeating at the shaft frequency), and any signal with periodic sampling or multiplexing.

The key analysis tools are:

  • Spectral correlation density (also called the cyclic spectrum): A two-dimensional function of frequency and cycle frequency that reveals periodicities in the spectral structure.
  • Cyclic modulation spectrum: Characterizes how spectral components are modulated at specific cycle frequencies.

These tools are powerful for signal detection and classification in low-SNR environments, blind source separation, and modulation recognition, because the cyclic features of a signal of interest typically differ from those of noise and interference.

Wavelet-Based Spectral Analysis

Wavelet analysis provides a time-frequency representation that adapts its resolution to the signal content. Unlike the short-time Fourier transform (which uses a fixed window), wavelets use short windows at high frequencies and long windows at low frequencies. This multi-resolution property matches how many natural signals behave.

The wavelet power spectrum measures energy distribution across time-frequency scales and is computed from the continuous or discrete wavelet transform. Compared to Fourier-based methods:

  • Wavelets handle non-stationarity naturally, since they're localized in both time and frequency.
  • They're well-suited for detecting transients, edges, and singularities.
  • They sacrifice some frequency precision at high frequencies in exchange for better time localization.

Applications include EEG and ECG analysis, geophysical signal processing, and image compression (JPEG 2000 uses wavelets).

Practical Considerations

Real-world spectral analysis involves finite, sampled, noisy data. Several practical issues can significantly affect your results if you don't handle them properly.

Sampling and Aliasing Effects

Sampling converts a continuous-time signal to a discrete-time sequence by measuring values at uniform intervals $T_s$, giving a sampling rate $f_s = 1/T_s$.

The Nyquist-Shannon sampling theorem states that you must sample at a rate of at least $f_s \geq 2f_{\max}$, where $f_{\max}$ is the highest frequency present in the signal. If this condition is violated, frequency components above $f_s/2$ fold back (alias) into the range $[0, f_s/2]$, corrupting the spectrum irreversibly.

In practice, anti-aliasing filters (analog lowpass filters) are applied before the ADC to attenuate frequency content above $f_s/2$. These filters can't be perfectly sharp, so the sampling rate is typically set somewhat higher than the strict Nyquist rate to provide a transition band.
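Aliasing is easy to demonstrate numerically. A sketch assuming NumPy: a 700 Hz tone sampled at 1000 Hz violates the Nyquist condition (it needs at least 1400 Hz) and shows up at |700 - 1000| = 300 Hz:

```python
import numpy as np

fs = 1000.0                      # sampling rate: too low for a 700 Hz tone
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 700 * t)

X = np.abs(np.fft.rfft(x))
f = np.fft.rfftfreq(t.size, d=1 / fs)
print(f[np.argmax(X)])  # 300.0 -- the alias, not the true 700 Hz
```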

Windowing and Spectral Leakage

When you compute the DFT of a finite-length signal, you're implicitly multiplying the infinite signal by a rectangular window. If the signal isn't perfectly periodic within that window, the abrupt truncation causes spectral leakage: energy from a single frequency component spreads across neighboring frequency bins.

Applying a tapered window function before the DFT reduces leakage by smoothing the signal's edges toward zero. Common choices and their tradeoffs:

  • Rectangular (no taper): Best frequency resolution, worst leakage
  • Hann: Good general-purpose choice, moderate resolution and leakage suppression
  • Hamming: Similar to Hann but with slightly different sidelobe behavior
  • Blackman: Excellent leakage suppression, but wider main lobe (reduced resolution)

The fundamental tradeoff is between main lobe width (frequency resolution) and sidelobe level (leakage suppression). Narrower main lobes resolve closely spaced frequencies better, but higher sidelobes let strong components mask weak nearby ones.
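The sidelobe behavior described above is dramatic for tones that fall between DFT bins. A sketch assuming NumPy, comparing rectangular and Hann windows at a bin far from the tone:

```python
import numpy as np

N = 256
n = np.arange(N)
# 10.5 cycles in N samples: the tone sits between DFT bins, so it leaks
x = np.sin(2 * np.pi * 10.5 * n / N)

rect = np.abs(np.fft.rfft(x))                  # implicit rectangular window
hann = np.abs(np.fft.rfft(x * np.hanning(N)))  # Hann-tapered version

# At bin 60 (about 50 bins from the tone) the Hann window's low sidelobes
# suppress leakage by orders of magnitude relative to the rectangular window.
print(rect[60] / hann[60])
```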

Computational Complexity

For large datasets or real-time systems, computational cost matters.

  • The FFT computes the DFT in $O(N \log N)$ operations, making non-parametric methods (periodogram, Welch's) very efficient.
  • Parametric methods add the cost of parameter estimation. Solving the Yule-Walker equations for an AR($p$) model costs $O(p^2)$ using the Levinson-Durbin algorithm, but ARMA estimation involves iterative optimization with higher and less predictable cost.
  • Multitaper methods require $K$ FFTs (one per taper), so their cost scales as $O(KN \log N)$.

For demanding applications, strategies like parallel/GPU computing, pruned FFTs, and sliding-window approaches can keep computation tractable. The choice of method should always balance the accuracy and resolution you need against the computational budget you have.