3.4 Spectral density

Written by the Fiveable Content Team • Last updated August 2025

Spectral representation of stationary processes

Spectral density gives you a way to analyze stochastic processes in the frequency domain rather than the time domain. Instead of asking "how is the process correlated across time lags?" (which is what the autocorrelation function does), you ask "how is the process's power distributed across frequencies?"

The core idea: any wide-sense stationary process can be represented as a sum of sinusoidal components with random amplitudes and phases. This spectral decomposition lets you see which frequencies carry the most energy in the process, which turns out to be extremely useful for filtering, signal detection, and system identification.

Spectral density functions

Definition of spectral density

The spectral density function $S(f)$ describes how the variance (or power) of a stationary process is spread across frequencies. Its formal definition is the Fourier transform of the autocorrelation function $R(\tau)$:

$$S(f) = \int_{-\infty}^{\infty} R(\tau)\, e^{-j2\pi f\tau}\, d\tau$$

Think of it this way: $R(\tau)$ tells you how values at different time lags relate to each other, while $S(f)$ tells you the same information repackaged as frequency contributions. A sharp peak in $S(f)$ at some frequency $f_0$ means the process has a strong oscillatory component near that frequency.

Properties of spectral density functions

Three properties you need to know:

  • Non-negativity: $S(f) \geq 0$ for all $f$. Power can't be negative at any frequency.
  • Symmetry: $S(f) = S(-f)$. This follows from $R(\tau)$ being real-valued and even. In practice, you often only plot $S(f)$ for $f \geq 0$.
  • Total variance: The area under the spectral density curve equals the process variance:

$$\int_{-\infty}^{\infty} S(f)\, df = R(0)$$

Since $R(0) = \operatorname{Var}(X(t))$ for a zero-mean process, this means integrating the spectral density recovers the total power.
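The discrete analog of this identity is easy to check numerically. A minimal sketch, assuming an AR(1)-style autocorrelation $R(\tau) = \sigma^2 \rho^{|\tau|}$ with made-up parameters:

```python
import numpy as np

# Hypothetical AR(1)-style autocorrelation R(tau) = sigma^2 * rho^|tau|
# on a symmetric lag grid (parameters are illustrative).
sigma2, rho = 2.0, 0.6
lags = np.arange(-64, 64)
R = sigma2 * rho ** np.abs(lags)

# Discrete Wiener-Khinchin: the spectral density is the DFT of the
# autocorrelation; ifftshift puts lag 0 at index 0 so the phase is zero.
S = np.fft.fft(np.fft.ifftshift(R)).real

# Non-negativity holds (up to floating-point error) ...
min_S = S.min()

# ... and averaging S over frequency bins (the discrete analog of
# integrating over f) recovers R(0), the total power.
total_power = S.mean()
print(min_S, total_power)  # min_S >= ~0, total_power ≈ 2.0
```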

Relationship between spectral density and autocorrelation

$S(f)$ and $R(\tau)$ form a Fourier transform pair, often called the Wiener-Khinchin relation. You can go in either direction:

  • Forward: $S(f) = \int_{-\infty}^{\infty} R(\tau)\, e^{-j2\pi f\tau}\, d\tau$
  • Inverse: $R(\tau) = \int_{-\infty}^{\infty} S(f)\, e^{j2\pi f\tau}\, df$

This duality is what makes spectral analysis so powerful. Any property you can express through $R(\tau)$ has an equivalent frequency-domain statement through $S(f)$, and vice versa. You pick whichever domain makes your problem easier to solve.

Calculating spectral densities

Spectral densities of common processes

White noise has a constant spectral density: $S(f) = \sigma^2$ for all $f$. Every frequency contributes equally, which corresponds to the autocorrelation being a delta function at $\tau = 0$ (no correlation between any distinct time points).

Autoregressive (AR) processes have spectral densities with peaks at frequencies where the process resonates. For example, an AR(2) process with complex-conjugate poles will show a clear spectral peak near the frequency corresponding to those poles. The sharper the peak, the more narrowband the process.

Moving average (MA) processes have spectral densities with nulls (zeros) at specific frequencies. An MA process acts like a finite-impulse-response filter applied to white noise, so its spectral shape is determined by the zeros of its characteristic polynomial.
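These shapes can be sketched directly from the closed-form spectra of AR(1) and MA(1) models driven by unit-variance white noise (the coefficients $a$ and $b$ below are illustrative choices, not from the text):

```python
import numpy as np

# Illustrative closed-form spectra (unit-variance white noise input).
f = np.linspace(0, 0.5, 501)          # normalized frequency, cycles/sample
z = np.exp(-2j * np.pi * f)

# AR(1): x_t = a x_{t-1} + w_t  ->  S(f) = 1 / |1 - a z|^2
# With a > 0 the power concentrates at low frequencies.
a = 0.8
S_ar = 1.0 / np.abs(1 - a * z) ** 2

# MA(1): x_t = w_t + b w_{t-1}  ->  S(f) = |1 + b z|^2
# The zero of the MA polynomial produces a spectral null.
b = -1.0
S_ma = np.abs(1 + b * z) ** 2

print(S_ar.argmax(), S_ma[0])         # AR peak at f = 0; MA null at f = 0
```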

Spectral densities from linear filters

When a stationary process $X(t)$ with spectral density $S_x(f)$ passes through a linear time-invariant (LTI) system with frequency response $H(f)$, the output spectral density is:

$$S_y(f) = |H(f)|^2\, S_x(f)$$

This is one of the most useful results in spectral analysis. It says the system shapes the input spectrum by multiplying it with the squared magnitude of the transfer function. For instance, if you drive an LTI system with white noise (flat $S_x(f)$), the output spectrum directly reveals $|H(f)|^2$.
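A numerical sanity check of this relation, using a hypothetical FIR low-pass filter and Welch spectral estimates (filter length, cutoff, and segment size are arbitrary choices):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.normal(size=200_000)                 # white noise: S_x(f) is flat

h = signal.firwin(65, cutoff=0.2, fs=1.0)    # illustrative LTI system
y = signal.lfilter(h, 1.0, x)

# Welch estimates of input and output spectra, plus the exact response.
f, Sx = signal.welch(x, fs=1.0, nperseg=1024)
_, Sy = signal.welch(y, fs=1.0, nperseg=1024)
_, H = signal.freqz(h, worN=f, fs=1.0)

# In the passband, the estimated ratio S_y/S_x tracks |H(f)|^2 closely.
band = (f > 0.01) & (f < 0.15)
rel_err = np.abs(Sy[band] / Sx[band] / np.abs(H[band]) ** 2 - 1)
print(rel_err.max())                         # small relative error
```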

Spectral densities of modulated processes

Modulation shifts the spectral content of a process to a different frequency band. If you multiply a process $X(t)$ by a carrier $\cos(2\pi f_c t)$, the resulting spectrum is a shifted (and scaled) version of $S_x(f)$, centered around $\pm f_c$.

This principle underlies amplitude modulation (AM) and is relevant whenever you need to analyze or transmit signals in specific frequency bands. Frequency modulation (FM) produces a more complex spectral transformation, but the basic idea of spectral shifting still applies.
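A quick simulation of this spectral shift (carrier frequency, message bandwidth, and sample rate below are all illustrative):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
fs, fc = 1000.0, 200.0                       # sample rate and carrier (made up)
t = np.arange(100_000) / fs

# Baseband "message": white noise low-passed to roughly 25 Hz.
b = signal.firwin(129, cutoff=25, fs=fs)
x = signal.lfilter(b, 1.0, rng.normal(size=t.size))

# Multiplying by the carrier shifts the spectrum to +-fc.
y = x * np.cos(2 * np.pi * fc * t)

f, Sx = signal.welch(x, fs=fs, nperseg=2048)
_, Sy = signal.welch(y, fs=fs, nperseg=2048)
print(f[Sx.argmax()], f[Sy.argmax()])        # baseband peak vs peak near fc
```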


Periodogram analysis

Definition of periodogram

In practice, you don't have access to the true $S(f)$. Instead, you estimate it from a finite data record. The periodogram is the simplest such estimate:

$$I(f) = \frac{1}{N} |X(f)|^2$$

where $X(f)$ is the discrete Fourier transform (DFT) of $N$ observed samples. The periodogram tells you how much power appears at each frequency in your particular data set.
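The formula can be computed directly from a DFT; the sketch below also checks it against `scipy.signal.periodogram` under matching conventions (boxcar window, no detrending, two-sided output):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
N = 4096
x = rng.normal(size=N)                       # unit-variance white noise

# Periodogram "by hand": squared DFT magnitude divided by N
# (two-sided, fs = 1 convention).
X = np.fft.fft(x)
I_hand = np.abs(X) ** 2 / N

# scipy's version with the same conventions matches the formula exactly.
f, I_scipy = signal.periodogram(x, fs=1.0, window='boxcar',
                                detrend=False, return_onesided=False)
print(np.allclose(I_hand, I_scipy))
```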

Periodogram vs spectral density

The periodogram is a natural estimator of $S(f)$, but it has a serious limitation: its variance does not decrease as you collect more data. Even with a very long record, the periodogram at each frequency remains noisy. Formally, the periodogram is an asymptotically unbiased but inconsistent estimator of $S(f)$.

For short data records, the fluctuations are especially severe, and individual periodogram values can be far from the true spectral density.

Smoothing periodograms for spectral estimation

To get a reliable spectral estimate, you need to smooth the periodogram. The two most common approaches:

  • Bartlett's method: Divide the data into $K$ non-overlapping segments, compute the periodogram of each segment, and average them. Variance drops by a factor of roughly $K$, but frequency resolution decreases because each segment is shorter.
  • Welch's method: Same idea, but segments are allowed to overlap (typically 50%), and each segment is multiplied by a window function (e.g., Hanning) before computing the DFT. The windowing reduces spectral leakage, and the overlap partially recovers the data efficiency lost to windowing.

Both methods illustrate a fundamental tradeoff: smoothing reduces variance at the cost of frequency resolution. You can't have both with finite data.
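The variance contrast is easy to demonstrate on simulated white noise, whose true spectral density is flat (the segment length below is an arbitrary choice):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(3)
x = rng.normal(size=65536)                   # white noise, true one-sided PSD = 2

# Raw periodogram: roughly unbiased on average but very noisy per bin.
f_p, P = signal.periodogram(x, fs=1.0)

# Welch (overlapping Hann-windowed segments): far lower variance,
# at the cost of coarser frequency resolution.
f_w, W = signal.welch(x, fs=1.0, nperseg=1024)

print(P[1:].std(), W[1:].std())              # Welch fluctuates much less
```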

Spectral analysis applications

Signal detection in noise

If a deterministic signal is buried in noise, its energy will be concentrated at specific frequencies, while the noise spreads across the entire spectrum. By examining $S(f)$, you can often spot the signal as a peak rising above the noise floor.

Practical techniques include matched filtering (which maximizes the signal-to-noise ratio for a known signal shape) and energy detection (which looks for excess power in a target frequency band).

System identification using spectral methods

You can estimate an unknown system's transfer function by comparing the input and output spectral densities. If $S_x(f)$ and $S_y(f)$ are known:

$$|H(f)|^2 = \frac{S_y(f)}{S_x(f)}$$

Coherence analysis goes further by measuring how well the input-output relationship fits a linear model at each frequency. A coherence value near 1 at frequency $f$ means the system behaves linearly at that frequency; a value near 0 suggests nonlinearity or noise dominance.
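A sketch of both ideas on simulated data, with a hypothetical FIR filter playing the role of the unknown system and a small amount of additive output noise (all parameters illustrative):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(4)
x = rng.normal(size=200_000)                  # white probe input
h = signal.firwin(65, cutoff=0.15, fs=1.0)    # stand-in "unknown" system
y = signal.lfilter(h, 1.0, x) + 0.01 * rng.normal(size=x.size)

# |H(f)|^2 estimated as the ratio of output to input spectral densities.
f, Sx = signal.welch(x, fs=1.0, nperseg=1024)
_, Sy = signal.welch(y, fs=1.0, nperseg=1024)
H2_est = Sy / Sx

# Coherence: near 1 where a linear model explains the output.
_, Cxy = signal.coherence(x, y, fs=1.0, nperseg=1024)

band = (f > 0.01) & (f < 0.1)                 # inside the filter's passband
print(H2_est[band].mean(), Cxy[band].mean())  # both close to 1
```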

Optimal linear filtering in frequency domain

The Wiener filter is the optimal linear filter that minimizes mean squared error between the filter output and a desired signal. Its design is naturally expressed in the frequency domain:

$$H_{\text{opt}}(f) = \frac{S_{xd}(f)}{S_x(f)}$$

where $S_{xd}(f)$ is the cross-spectral density between the observed signal and the desired signal, and $S_x(f)$ is the spectral density of the observed signal. Working in the frequency domain makes the optimization tractable and gives direct insight into which frequencies the filter emphasizes or suppresses.
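A rough frequency-domain sketch of this estimator on simulated data, with the desired signal taken as low-pass-filtered noise and the observation corrupted by independent white noise (all parameters made up):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(5)
n = 100_000
# Desired signal: low-pass noise; observation: signal plus independent noise.
d = signal.lfilter(signal.firwin(65, cutoff=0.1, fs=1.0), 1.0,
                   rng.normal(size=n))
x = d + rng.normal(scale=0.5, size=n)

# Estimate S_x and the cross-spectral density S_xd from data, then form
# H_opt(f) = S_xd(f) / S_x(f).
f, Sx = signal.welch(x, fs=1.0, nperseg=256)
_, Sxd = signal.csd(x, d, fs=1.0, nperseg=256)
H_opt = (Sxd / Sx).real                       # real here: d and x are time-aligned

# Where the signal dominates, the filter passes; where noise dominates,
# it suppresses.
print(H_opt[f < 0.05].mean(), H_opt[f > 0.3].mean())
```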


Sampling and aliasing considerations

Nyquist frequency and aliasing

The Nyquist frequency $f_N = f_s/2$ (where $f_s$ is the sampling rate) is the highest frequency you can faithfully represent in a sampled signal. Any frequency component above $f_N$ in the original continuous-time signal gets "folded back" into the range $[0, f_N]$ and becomes indistinguishable from a lower-frequency component. This is aliasing.

For example, if you sample at 100 Hz ($f_N = 50$ Hz), a 70 Hz component in the original signal will appear as a 30 Hz component in the sampled data.
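The folding arithmetic can be verified directly at the sample instants:

```python
import numpy as np

fs = 100.0                                   # sampling at 100 Hz -> Nyquist 50 Hz
t = np.arange(0, 1, 1 / fs)                  # one second of sample instants

# cos(2*pi*70*n/100) equals cos(2*pi*30*n/100) at every sample instant,
# because 70 = 100 - 30: the 70 Hz tone folds back to 30 Hz.
x70 = np.cos(2 * np.pi * 70 * t)
x30 = np.cos(2 * np.pi * 30 * t)
print(np.allclose(x70, x30))                 # the two are indistinguishable
```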

Effects of sampling on spectral density

Sampling a continuous-time signal causes its spectral density to replicate periodically, with copies centered at every multiple of $f_s$. If the original signal is bandlimited to $f_N$, these copies don't overlap and you can perfectly reconstruct the spectrum. If the signal has energy above $f_N$, the copies overlap and the spectral density in the sampled signal is a distorted version of the original.

Anti-aliasing filters and downsampling

To prevent aliasing, you apply a low-pass anti-aliasing filter before sampling. This filter removes (or at least attenuates) frequency components above $f_N$, ensuring the sampled signal's spectrum is clean.

Downsampling (decimation) reduces the sampling rate of an already-discrete signal. Before downsampling by a factor of $M$, you must bandlimit the signal to the new Nyquist frequency $f_N/M$. Otherwise, the same aliasing problem occurs during rate reduction.
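A small comparison of naive downsampling versus `scipy.signal.decimate` (which applies an anti-aliasing filter internally), using an illustrative tone above the new Nyquist frequency:

```python
import numpy as np
from scipy import signal

fs, M = 1000.0, 4                            # 1000 Hz -> 250 Hz (new Nyquist 125 Hz)
t = np.arange(0, 10, 1 / fs)
x = np.cos(2 * np.pi * 200 * t)              # 200 Hz tone, above the new Nyquist

naive = x[::M]                               # no filtering: 200 Hz folds to 50 Hz
safe = signal.decimate(x, M)                 # anti-alias filters, then downsamples

f_n, Pn = signal.periodogram(naive, fs=fs / M)
f_s, Ps = signal.periodogram(safe, fs=fs / M)
alias_freq = f_n[Pn.argmax()]
print(alias_freq, Ps.max() / Pn.max())       # alias at 50 Hz; tiny residual power
```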

Multivariate spectral analysis

Cross-spectral density functions

When you have two (or more) jointly stationary processes, the cross-spectral density $S_{xy}(f)$ captures their frequency-domain relationship. It's defined as the Fourier transform of the cross-correlation function:

$$S_{xy}(f) = \int_{-\infty}^{\infty} R_{xy}(\tau)\, e^{-j2\pi f\tau}\, d\tau$$

Unlike the auto-spectral density, $S_{xy}(f)$ is generally complex-valued. Its magnitude tells you how strongly the two processes share power at frequency $f$, and its phase tells you the time/phase lead-lag relationship at that frequency.
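A sketch of reading a lag off the cross-spectral phase, using a pure delay between two simulated series (the sign of the recovered slope depends on the transform convention, so the check uses its magnitude):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(7)
lag = 5                                       # y trails x by 5 samples (illustrative)
x = rng.normal(size=50_000)
y = np.roll(x, lag)

# S_xy(f) is complex; for a pure delay its phase is linear in f with
# slope proportional to the lag.
f, Sxy = signal.csd(x, y, fs=1.0, nperseg=256)
phase = np.unwrap(np.angle(Sxy))

# Recover the delay from the slope of the phase versus frequency.
slope = np.polyfit(f[1:-1], phase[1:-1], 1)[0]
lag_est = abs(slope) / (2 * np.pi)
print(lag_est)                                # ≈ 5 samples
```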

Coherence and partial coherence

Coherence quantifies the strength of the linear relationship between two processes at each frequency:

$$C_{xy}(f) = \frac{|S_{xy}(f)|^2}{S_x(f)\, S_y(f)}$$

Coherence ranges from 0 (no linear relationship at frequency $f$) to 1 (perfectly linear relationship). It's the frequency-domain analog of the squared correlation coefficient.

Partial coherence extends this to multivariate settings. It measures the linear dependence between two processes at a given frequency after removing the influence of other processes, similar to how partial correlation works in statistics.

Principal component analysis in frequency domain

PCA can be applied to the cross-spectral density matrix at each frequency to identify dominant modes of variability in multivariate processes.

At each frequency $f$, you eigendecompose the cross-spectral density matrix. The eigenvectors identify the spatial/channel patterns that contribute most to the total power at that frequency, and the eigenvalues quantify how much variance each pattern explains.

This frequency-domain PCA is useful for extracting common oscillatory patterns from multivariate time series, such as identifying shared rhythmic activity across multiple sensor channels.
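A minimal sketch of this procedure for three simulated channels sharing one band-limited source (mixing weights and noise level are made up):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(8)
n = 50_000
# Three channels = one shared band-limited source times fixed loadings,
# plus independent sensor noise (all values illustrative).
src = signal.lfilter(signal.firwin(65, cutoff=0.1, fs=1.0), 1.0,
                     rng.normal(size=n))
mix = np.array([1.0, 0.8, -0.5])
X = np.outer(mix, src) + 0.1 * rng.normal(size=(3, n))

# Pairwise Welch/CSD estimates assemble the cross-spectral density matrix
# S(f) at every frequency; it is Hermitian at each f.
f = signal.csd(X[0], X[0], fs=1.0, nperseg=256)[0]
S = np.zeros((f.size, 3, 3), dtype=complex)
for i in range(3):
    for j in range(3):
        S[:, i, j] = signal.csd(X[i], X[j], fs=1.0, nperseg=256)[1]

# Eigendecompose at one in-band frequency: the top eigenvector is the
# dominant spatial pattern, its eigenvalue the power it explains.
k = np.argmin(np.abs(f - 0.05))
evals, evecs = np.linalg.eigh(S[k])           # ascending real eigenvalues
share = evals[-1] / evals.sum()
print(share)                                  # most in-band power in one mode
```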