Large-scale structure in the universe isn't random. Galaxies cluster together, forming patterns we can measure. These patterns encode information about the universe's composition, its initial conditions, and how structure has grown over cosmic time.
To quantify these patterns, cosmologists rely on statistical tools, primarily the two-point correlation function and the matter power spectrum. These are complementary descriptions of the same underlying clustering signal, one in real space and one in Fourier space.
Statistical Measures of Large-Scale Structure
Two-point correlation function for galaxies
The two-point correlation function $\xi(r)$ measures the excess probability of finding a pair of galaxies separated by distance $r$, compared to what you'd expect if galaxies were scattered randomly. It's the most direct way to quantify clustering.
A common estimator is:

$$\xi(r) = \frac{DD(r)}{RR(r)} - 1$$

- $DD(r)$ is the number of galaxy pairs with separation $r$ in the observed data.
- $RR(r)$ is the expected number of pairs at that separation in a random catalog with the same survey geometry.
When $\xi(r) > 0$, galaxies are more clustered than random at that scale. When $\xi(r) < 0$, galaxies are less likely than random to be found at that separation (as happens across void regions). At very large separations, $\xi(r) \to 0$ because the universe approaches homogeneity.
In practice, more sophisticated estimators (like the Landy-Szalay estimator, which also uses cross-counts between data and random catalogs) are preferred because they reduce bias from edge effects and survey geometry.
Estimating the correlation function involves:
- Constructing a random catalog that matches the survey's angular footprint and radial selection function.
- Counting galaxy-galaxy pairs $DD$, random-random pairs $RR$, and data-random pairs $DR$ at each separation bin.
- Normalizing the pair counts and applying the chosen estimator.
- Repeating across a large enough galaxy sample to beat down shot noise. Surveys like SDSS, DES, and the upcoming Euclid and LSST/Rubin Observatory provide the millions of galaxies needed for reliable statistics.
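The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a survey pipeline: the function names `pair_counts` and `landy_szalay` are hypothetical, the "data" is an unclustered mock drawn uniformly in a cubic box (an assumption made so the expected answer is known), and the brute-force distance matrix only scales to a few thousand points. It uses the Landy-Szalay form $(DD - 2DR + RR)/RR$ with each pair count normalized by the total number of pairs.

```python
import numpy as np

def pair_counts(a, b, edges, auto=False):
    """Histogram of pair separations between point sets a and b (N x 3 arrays)."""
    # Brute-force O(N*M) distance matrix; fine for a few thousand points only.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    if auto:
        # Keep each unordered pair once and drop the zero self-separations.
        d = d[np.triu_indices(len(a), k=1)]
    counts, _ = np.histogram(d.ravel(), bins=edges)
    return counts.astype(float)

def landy_szalay(data, rand, edges):
    """Landy-Szalay estimate of xi in each separation bin."""
    nd, nr = len(data), len(rand)
    # Normalize each count by the total number of pairs of that type.
    dd = pair_counts(data, data, edges, auto=True) / (nd * (nd - 1) / 2)
    rr = pair_counts(rand, rand, edges, auto=True) / (nr * (nr - 1) / 2)
    dr = pair_counts(data, rand, edges) / (nd * nr)
    return (dd - 2 * dr + rr) / rr

# Mock catalogs: both uniform in a 100-unit box, so xi should scatter around zero.
rng = np.random.default_rng(0)
data = rng.uniform(0, 100, size=(500, 3))
rand = rng.uniform(0, 100, size=(1000, 3))
edges = np.linspace(5, 50, 10)   # 9 separation bins
xi = landy_szalay(data, rand, edges)
```

For a real survey the random catalog would follow the angular mask and radial selection function rather than a simple box, and the pair counting would typically use a tree or grid algorithm instead of a full distance matrix.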

Power spectrum and correlation function
The power spectrum $P(k)$ is the Fourier-space counterpart of the correlation function. While $\xi(r)$ tells you about clustering at a given physical separation, $P(k)$ tells you the amplitude of density fluctuations at a given spatial frequency (wavenumber $k$, with units of inverse length).
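The two are related by a Fourier transform, which for a statistically isotropic field reduces to a one-dimensional integral:

$$\xi(r) = \frac{1}{2\pi^2} \int_0^\infty P(k)\, \frac{\sin(kr)}{kr}\, k^2\, dk$$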
They contain the same information, but each has practical advantages. The power spectrum is often preferred for theoretical work because different Fourier modes evolve independently in the linear regime, making predictions cleaner. The correlation function is sometimes easier to estimate directly from survey data and is more intuitive for identifying features at specific physical scales (like the BAO peak at $\sim 150$ Mpc).
The shape of $P(k)$ depends on the underlying cosmological model, the matter content of the universe, and the physics of structure formation. Its amplitude reflects the overall level of clustering.
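As a rough numerical illustration of how one description maps onto the other, the sketch below evaluates the one-dimensional integral that the Fourier transform reduces to for an isotropic field. The function name `xi_from_pk` is hypothetical, and the Gaussian spectrum is a toy choice (not a cosmological model) picked only because its transform is known in closed form, which makes the result easy to check.

```python
import numpy as np

def xi_from_pk(r, k, pk):
    """xi(r) = 1/(2 pi^2) * integral of k^2 P(k) sin(kr)/(kr) dk (trapezoid rule)."""
    r = np.atleast_1d(r)
    # np.sinc(x) is sin(pi x)/(pi x), so rescale the argument by pi.
    j0 = np.sinc(np.outer(r, k) / np.pi)
    integrand = k**2 * pk * j0                      # shape (len(r), len(k))
    dk = np.diff(k)
    integral = np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * dk, axis=1)
    return integral / (2 * np.pi**2)

# Toy Gaussian spectrum P(k) = exp(-(a k)^2) with a known analytic transform.
a = 5.0
k = np.linspace(1e-4, 3.0, 20000)
pk = np.exp(-(a * k) ** 2)
r = np.array([1.0, 5.0, 10.0, 20.0])

xi_numeric = xi_from_pk(r, k, pk)
xi_exact = (4 * np.pi * a**2) ** -1.5 * np.exp(-r**2 / (4 * a**2))
```

A realistic $P(k)$ falls off far more slowly than this Gaussian, so the oscillatory integrand usually needs damping at high $k$ or a dedicated fast Hankel transform method rather than a plain trapezoid sum.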

Shape and amplitude of the galaxy power spectrum
The power spectrum's shape carries distinct physical information at different scales:
- Large scales (small $k$): The spectrum turns over near the scale corresponding to the particle horizon at matter-radiation equality. On scales larger than this, the primordial spectrum is roughly preserved. The turnover scale depends on $\Omega_m h^2$, so measuring it constrains the total matter density.
- Intermediate scales: The slope encodes the relative amounts of baryonic and dark matter. Baryon acoustic oscillations (BAO) imprint a series of wiggles on $P(k)$, corresponding to the sound horizon at recombination ($\sim 150$ Mpc comoving). These wiggles act as a standard ruler for measuring cosmic distances.
- Small scales (large $k$): Non-linear gravitational collapse, galaxy mergers, and baryonic feedback processes (AGN heating, supernova-driven outflows) reshape the spectrum. Predictions here require N-body simulations or effective models rather than simple linear theory.
The amplitude of the power spectrum is typically parameterized by $\sigma_8$, the root-mean-square linear density fluctuation in spheres of radius $8\,h^{-1}$ Mpc. This depends on both the primordial fluctuation amplitude and the matter density $\Omega_m$.
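For reference, the amplitude parameter described above (conventionally written $\sigma_8$) is the $R = 8\,h^{-1}$ Mpc case of the variance of the density field smoothed with a spherical top-hat window. The notation $\sigma_R$ and $W$ below is standard but is introduced here for illustration, and $P(k)$ is taken to be the linear matter power spectrum:

$$\sigma_R^2 = \frac{1}{2\pi^2} \int_0^\infty k^2\, P(k)\, W^2(kR)\, dk, \qquad W(x) = \frac{3\,(\sin x - x\cos x)}{x^3}$$

Because $W(kR)$ falls off for $k \gtrsim 1/R$, the integral is weighted toward modes with wavelengths comparable to or larger than the sphere, which is why a single number can summarize the overall normalization of the spectrum.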
A critical complication is galaxy bias: galaxies don't perfectly trace the underlying matter distribution. More massive halos (and the luminous galaxies they host) are more strongly clustered than the dark matter itself. The bias factor $b$ relates the galaxy power spectrum to the matter power spectrum: $P_g(k) = b^2 P_m(k)$. This relationship is scale-independent only on large scales; on smaller scales, bias becomes more complex.
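A minimal sketch of how a linear bias could be read off from this relation, assuming both spectra are already measured on a common set of wavenumbers. The function name `estimate_linear_bias`, the cut `k_max`, and the synthetic spectra are all assumptions made for illustration; the extra small-scale term in the mock galaxy spectrum stands in for scale-dependent bias and is not a physical model.

```python
import numpy as np

def estimate_linear_bias(k, pk_gal, pk_matter, k_max=0.1):
    """Average sqrt(P_g / P_m) over the large-scale modes with k <= k_max."""
    large_scale = k <= k_max
    return float(np.mean(np.sqrt(pk_gal[large_scale] / pk_matter[large_scale])))

# Synthetic inputs for illustration only: a toy matter spectrum, and a galaxy
# spectrum built with b = 2 plus a term that grows toward small scales.
k = np.linspace(0.01, 0.5, 50)
pk_matter = 1.0e4 * k / (1.0 + (k / 0.02) ** 2) ** 1.2
b_true = 2.0
pk_gal = b_true**2 * pk_matter * (1.0 + 0.5 * (k / 0.5) ** 2)

b_large = estimate_linear_bias(k, pk_gal, pk_matter)             # close to b_true
b_all = estimate_linear_bias(k, pk_gal, pk_matter, k_max=0.5)    # pulled upward
```

Restricting the average to small $k$ recovers the input value closely, while including the smaller scales pulls the estimate upward, which is the practical reason analyses apply a scale cut or adopt a more flexible bias model.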
Statistical measures of galaxy clustering
Extracting cosmological information from clustering measurements requires large galaxy surveys covering significant cosmic volume. Current and upcoming surveys span a range of approaches:
- SDSS mapped over a million galaxy redshifts, providing the first high-precision measurements of $P(k)$ and the BAO feature.
- DES uses photometric redshifts over a wide area, combining clustering with weak lensing.
- Euclid and LSST/Rubin will map billions of galaxies out to higher redshifts, dramatically improving constraints.
These clustering measurements constrain cosmological parameters because the shape and amplitude of $P(k)$ depend sensitively on $\Omega_m$, $\Omega_b$, $H_0$, the spectral index $n_s$, $\sigma_8$, and the dark energy equation of state $w$. Observed clustering is compared to theoretical predictions using Bayesian inference, typically implemented with Markov Chain Monte Carlo (MCMC) sampling to explore the high-dimensional parameter space.
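To make the inference step concrete, here is a toy random-walk Metropolis sampler, one of the simplest MCMC algorithms. It is a sketch under strong assumptions: a single free parameter (a linear bias $b$ scaling a fixed template), independent Gaussian errors, a flat prior, and synthetic data generated inside the script. The names `log_posterior` and `metropolis` are hypothetical, and real analyses vary many parameters at once using established sampling packages.

```python
import numpy as np

def log_posterior(b, pk_obs, sigma, pk_template):
    """Gaussian likelihood for P_obs = b^2 * template, flat prior on 0 < b < 10."""
    if not 0.0 < b < 10.0:
        return -np.inf
    resid = pk_obs - b**2 * pk_template
    return -0.5 * np.sum((resid / sigma) ** 2)

def metropolis(log_post, start, step, n_steps, rng):
    """Random-walk Metropolis sampler for a single parameter."""
    chain = np.empty(n_steps)
    current, current_lp = start, log_post(start)
    for i in range(n_steps):
        proposal = current + step * rng.normal()
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

# Synthetic "measurement": toy template, true bias 1.8, 5% errors per bin.
rng = np.random.default_rng(1)
k = np.linspace(0.02, 0.2, 30)
pk_template = 1.0e4 * k / (1.0 + (k / 0.02) ** 2) ** 1.2
b_true = 1.8
sigma = 0.05 * b_true**2 * pk_template
pk_obs = b_true**2 * pk_template + sigma * rng.normal(size=k.size)

chain = metropolis(lambda b: log_posterior(b, pk_obs, sigma, pk_template),
                   start=1.0, step=0.02, n_steps=5000, rng=rng)
samples = chain[1000:]                    # discard burn-in
b_mean, b_std = samples.mean(), samples.std()
```

The posterior mean and scatter of `samples` play the role of the parameter constraint. With more parameters, the same accept/reject logic applies to a vector of values, and degeneracies between parameters show up as correlations in the chain.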
Measuring the correlation function and power spectrum at different redshifts reveals how clustering evolves over cosmic time. The growth rate of structure $f$ is particularly powerful as a cosmological probe: in general relativity, it's determined by $\Omega_m(z)$ (to good approximation, $f \approx \Omega_m(z)^{0.55}$), but modified gravity theories predict different growth rates. Comparing the observed evolution of clustering with these predictions provides one of the cleanest tests of gravity on cosmological scales and helps distinguish dark energy models from modifications to general relativity.
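The sketch below shows how such a comparison is commonly parameterized, using the approximation $f(z) \approx \Omega_m(z)^{\gamma}$ with $\gamma \approx 0.55$ for general relativity. The assumptions are a flat $\Lambda$CDM background with radiation neglected, a present-day matter density of 0.3, and an alternative index of 0.68 chosen purely as an illustrative stand-in for a modified-gravity prediction. The function names are hypothetical.

```python
import numpy as np

def omega_m_of_z(z, omega_m0=0.3):
    """Matter density parameter at redshift z in flat LambdaCDM (radiation neglected)."""
    z = np.asarray(z, dtype=float)
    matter = omega_m0 * (1.0 + z) ** 3
    return matter / (matter + (1.0 - omega_m0))

def growth_rate(z, omega_m0=0.3, gamma=0.55):
    """Approximate growth rate f(z) = Omega_m(z)**gamma."""
    return omega_m_of_z(z, omega_m0) ** gamma

z = np.array([0.0, 0.5, 1.0, 2.0])
f_gr = growth_rate(z)                  # gamma = 0.55: GR-like growth
f_alt = growth_rate(z, gamma=0.68)     # illustrative alternative growth index
```

Both curves approach 1 at high redshift, where matter dominates, so the difference between growth indices is largest at low redshift. This is one reason low-redshift clustering measurements are valuable for testing gravity.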