Large-Scale Structure Formation
Cosmic Web and Dark Matter Halos
The universe's matter isn't spread out evenly. Instead, it's organized into a vast cosmic web of interconnected filaments, sheets, and voids. Dark matter is the scaffolding for all of this.
- Dark matter halos are gravitationally bound clumps of dark matter that act as potential wells, pulling in baryonic (ordinary) matter. These halos range from the scale of dwarf galaxies up to massive galaxy clusters (roughly $10^{8}\,M_\odot$ to $10^{15}\,M_\odot$).
- Filaments are elongated bridges of dark matter and galaxies connecting halos to one another. They're the most visually striking feature of the cosmic web in simulations.
- Sheets (sometimes called walls) are flattened, two-dimensional structures that bound the voids; filaments form where sheets intersect. The CfA Great Wall was one of the first observed examples.
- Voids are the vast, underdense regions between filaments and sheets, typically tens of megaparsecs across. They're not completely empty, but their density is well below the cosmic mean.
Galaxy clusters sit at the densest nodes of this web, right where several filaments intersect.
Hierarchical Clustering Process
Structure in the universe forms bottom-up, from small scales to large. This is called hierarchical clustering, and it follows a clear sequence:
- In the very early universe, quantum fluctuations during inflation seed tiny density perturbations in the nearly uniform matter distribution.
- After recombination, regions that are slightly overdense begin to gravitationally attract surrounding matter.
- The smallest overdensities collapse first, forming low-mass dark matter halos.
- These small halos then merge with each other and accrete additional matter, building progressively larger structures over cosmic time.
- Galaxy groups merge into clusters. Clusters collect along filaments. The largest structures, superclusters, assemble last and may not even be fully virialized today.
This bottom-up timeline is a direct prediction of cold dark matter (CDM) models, where dark matter particles move slowly enough that small-scale perturbations survive and collapse before large-scale ones.
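To see the bottom-up ordering in numbers, here is a minimal toy sketch. It rests on two simplifying assumptions that are not part of the text above: the rms density fluctuation is a pure power law in mass, and linear fluctuations grow as 1/(1+z) (Einstein-de Sitter growth). The threshold `DELTA_C` is the standard spherical-collapse value; the names `m_nl0` and `alpha` and their default values are hypothetical, chosen only for illustration.

```python
DELTA_C = 1.686  # spherical-collapse threshold for the linear overdensity


def sigma(mass, z, m_nl0=1e13, alpha=0.3):
    """Toy rms fluctuation on mass scale `mass` (Msun) at redshift z.

    Assumption: power law in mass, amplitude growing as 1/(1+z).
    Normalised so that sigma(m_nl0, 0) equals DELTA_C.
    """
    return DELTA_C * (mass / m_nl0) ** (-alpha) / (1.0 + z)


def nonlinear_mass(z, m_nl0=1e13, alpha=0.3):
    """Mass scale whose typical fluctuation just reaches DELTA_C at redshift z."""
    return m_nl0 * (1.0 + z) ** (-1.0 / alpha)


for z in (10, 5, 2, 1, 0):
    print(f"z = {z:2d}: typical collapsing mass ~ {nonlinear_mass(z):.1e} Msun")
```

Because the toy `sigma` falls with mass, small scales reach the threshold first, and the typical collapsing mass climbs steadily toward the present. A realistic CDM calculation would replace the power law with a variance computed from the full power spectrum.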
Press-Schechter Formalism
The Press-Schechter formalism gives you a way to predict how many dark matter halos of a given mass exist at a given redshift. It connects the statistics of the initial density field to the population of collapsed objects.
The key assumptions and logic:
- The primordial density fluctuations are drawn from a Gaussian random field, fully characterized by their variance $\sigma^2(M, z)$, which depends on the mass scale $M$ and redshift $z$.
- A region collapses into a halo when its smoothed overdensity exceeds a critical threshold $\delta_c \approx 1.686$ (derived from the spherical collapse model).
- By computing the fraction of the density field above $\delta_c$ at each smoothing scale, you get the fraction of mass locked in halos of that mass.
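The resulting halo mass function is:

$$\frac{dn}{dM} = \sqrt{\frac{2}{\pi}}\,\frac{\bar{\rho}}{M^2}\,\frac{\delta_c}{\sigma}\,\left|\frac{d\ln\sigma}{d\ln M}\right|\,\exp\!\left(-\frac{\delta_c^2}{2\sigma^2}\right)$$

Here $dn/dM$ is the comoving number density of halos per unit mass, $\bar{\rho}$ is the mean matter density, $\sigma = \sigma(M, z)$ is the rms density fluctuation on mass scale $M$, $\delta_c$ is the critical collapse overdensity, and the exponential suppression at high mass reflects the rarity of extreme overdensities.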
A known issue: the original Press-Schechter derivation undercounts the collapsed mass by a factor of 2. It tests the overdensity at only one smoothing scale, so it misses matter in regions that fall below the threshold at that scale but sit inside a larger region that exceeds it (the cloud-in-cloud problem). The factor of 2 is inserted by hand as a correction, and the extended Press-Schechter (excursion set) approach by Bond et al. provides a more rigorous derivation. Modern refinements like the Sheth-Tormen mass function improve agreement with N-body simulations, especially at the high-mass end.
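The factor-of-2 bookkeeping is easy to check numerically. The sketch below writes the Press-Schechter result in terms of the peak height nu = delta_c / sigma, where the mass fraction per logarithmic interval of nu is f(nu) = sqrt(2/pi) nu exp(-nu^2/2). It also includes the Sheth-Tormen form with its commonly quoted parameters (A = 0.3222, a = 0.707, p = 0.3). The function names and the integration grid are illustrative choices, not taken from any particular library.

```python
import math

DELTA_C = 1.686  # spherical-collapse threshold


def collapsed_fraction(sigma):
    """Press-Schechter mass fraction in halos above the scale with rms `sigma`.

    Includes the factor of 2: without it the result would be erfc(...)/2
    and could never exceed one half.
    """
    return math.erfc(DELTA_C / (math.sqrt(2.0) * sigma))


def f_press_schechter(nu):
    """Mass fraction per unit ln(nu), with nu = delta_c / sigma."""
    return math.sqrt(2.0 / math.pi) * nu * math.exp(-0.5 * nu * nu)


def f_sheth_tormen(nu, A=0.3222, a=0.707, p=0.3):
    """Sheth-Tormen multiplicity function in the same convention."""
    anu2 = a * nu * nu
    prefactor = A * math.sqrt(2.0 * a / math.pi)
    return prefactor * (1.0 + anu2 ** (-p)) * nu * math.exp(-0.5 * anu2)


def total_mass_fraction(f, ln_min=-30.0, ln_max=3.0, n=20000):
    """Midpoint-rule integral of f(nu) over ln(nu); ~1 if all mass is in halos."""
    h = (ln_max - ln_min) / n
    return h * sum(f(math.exp(ln_min + (i + 0.5) * h)) for i in range(n))


print("PS total mass fraction:", total_mass_fraction(f_press_schechter))
print("ST total mass fraction:", total_mass_fraction(f_sheth_tormen))
print("ST / PS at nu = 3:", f_sheth_tormen(3.0) / f_press_schechter(3.0))
```

Both functions integrate to approximately 1, which is what the factor of 2 buys you. At high peak height (rare, massive halos) the Sheth-Tormen form predicts more objects than Press-Schechter.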

Statistical Measures of Structure
Correlation Function Analysis
The two-point correlation function $\xi(r)$ is the most fundamental clustering statistic. It quantifies the excess probability, relative to a random distribution, of finding a pair of objects separated by distance $r$.
- If $\xi(r) > 0$, objects are more clustered than random at that separation.
- If $\xi(r) = 0$, the distribution is indistinguishable from random.
- If $\xi(r) < 0$, objects are anti-correlated (more uniformly spaced than random).
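In practice the "relative to a random distribution" part is handled by comparing pair counts in the data with pair counts in a random catalog covering the same volume. A minimal sketch of the simplest ("natural") estimator, DD/RR - 1, follows. The brute-force pair counter and the unit-box mock are illustrative assumptions; real analyses use tree-based pair counting and usually the Landy-Szalay estimator instead.

```python
import math
import random


def pair_counts(points, r_lo, r_hi):
    """Count unordered pairs with separation in [r_lo, r_hi) by brute force."""
    count = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if r_lo <= math.dist(points[i], points[j]) < r_hi:
                count += 1
    return count


def uniform_points(n, rng):
    """n points drawn uniformly in the unit cube (an unclustered mock)."""
    return [(rng.random(), rng.random(), rng.random()) for _ in range(n)]


def xi_natural(data, randoms, r_lo, r_hi):
    """Natural estimator: normalised DD / RR - 1 in one separation bin."""
    dd = pair_counts(data, r_lo, r_hi)
    rr = pair_counts(randoms, r_lo, r_hi)
    nd, nr = len(data), len(randoms)
    # Rescale for the different numbers of pairs in the two catalogs.
    norm = (nr * (nr - 1)) / (nd * (nd - 1))
    return norm * dd / rr - 1.0


rng = random.Random(42)
data = uniform_points(300, rng)      # stand-in "galaxies" with no clustering
randoms = uniform_points(600, rng)   # random comparison catalog
print("xi in 0.1 <= r < 0.3:", xi_natural(data, randoms, 0.1, 0.3))
```

Since the mock "galaxies" here are themselves unclustered, the estimate should scatter around zero. Feeding in a clustered catalog would push it positive at small separations.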
For galaxies, the correlation function is well-described by a power law over a broad range of scales:

$$\xi(r) = \left(\frac{r}{r_0}\right)^{-\gamma}$$

where $r_0$ is the correlation length (the scale at which $\xi = 1$, roughly 5–8 Mpc for typical galaxy samples) and $\gamma \approx 1.8$. The power-law form breaks down at very small and very large separations, but it's a useful approximation across the range where most clustering measurements are made.
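One way to build intuition for the power law is to ask how many neighbors a typical galaxy has within some radius, compared with an unclustered distribution. Integrating n̄ [1 + ξ(r)] over a sphere gives a closed form for a power-law ξ (valid for γ < 3). In this sketch the defaults `r0 = 5.0` Mpc and `gamma = 1.8` sit inside the range quoted above, and the number density `nbar` is a made-up illustrative value.

```python
import math


def xi_power_law(r, r0=5.0, gamma=1.8):
    """Power-law correlation function; r and r0 in Mpc."""
    return (r / r0) ** (-gamma)


def mean_neighbors(radius, nbar, r0=5.0, gamma=1.8):
    """Expected number of neighbors within `radius` of a galaxy.

    Integral of nbar * (1 + xi(r)) * 4 pi r^2 dr from 0 to radius.
    """
    random_part = 4.0 * math.pi * radius ** 3 / 3.0
    excess_part = (4.0 * math.pi * r0 ** gamma
                   * radius ** (3.0 - gamma) / (3.0 - gamma))
    return nbar * (random_part + excess_part)


nbar = 0.01  # galaxies per cubic Mpc (hypothetical)
for radius in (1.0, 5.0, 20.0):
    clustered = mean_neighbors(radius, nbar)
    unclustered = nbar * 4.0 * math.pi * radius ** 3 / 3.0
    print(f"R = {radius:4.1f} Mpc: {clustered / unclustered:.2f}x the random expectation")
```

The enhancement is large on small scales and fades toward 1 on large scales, which is the power law doing its job.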
Power Spectrum and Fourier Analysis
The matter power spectrum $P(k)$ is the Fourier-space counterpart of the correlation function. It tells you the amplitude of density fluctuations as a function of wavenumber $k$ (where larger $k$ corresponds to smaller spatial scales).
Formally, $\xi(r)$ and $P(k)$ form a Fourier transform pair:

$$\xi(r) = \frac{1}{2\pi^2} \int_0^\infty P(k)\,\frac{\sin(kr)}{kr}\,k^2\,dk$$
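A quick numerical check of the transform can be sketched as follows, assuming the common convention ξ(r) = (1/2π²) ∫ P(k) [sin(kr)/(kr)] k² dk. The Gaussian P(k) used here is a toy chosen only because its transform has a closed form; it is not a realistic cosmological spectrum, and the function names are illustrative.

```python
import math


def xi_from_pk(r, pk, k_max=20.0, n=20000):
    """Midpoint-rule estimate of xi(r) from an isotropic P(k); requires r > 0."""
    h = k_max / n
    total = 0.0
    for i in range(n):
        k = (i + 0.5) * h
        total += k * k * pk(k) * math.sin(k * r) / (k * r)
    return total * h / (2.0 * math.pi ** 2)


def gaussian_pk(k, a=1.0):
    """Toy spectrum P(k) = exp(-(a k)^2)."""
    return math.exp(-(a * k) ** 2)


def gaussian_xi(r, a=1.0):
    """Closed-form transform of gaussian_pk under the convention above."""
    return math.exp(-r * r / (4.0 * a * a)) / (8.0 * math.pi ** 1.5 * a ** 3)


for r in (0.5, 1.0, 3.0):
    print(r, xi_from_pk(r, gaussian_pk), gaussian_xi(r))
```

For a realistic P(k), the integrand oscillates and decays slowly, so production codes use dedicated methods (for example FFTLog-style transforms) rather than a plain quadrature like this one.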
The shape of $P(k)$ encodes a wealth of physics:
- On large scales (small $k$), the power spectrum retains the nearly scale-invariant form ($P(k) \propto k^{n_s}$ with $n_s \approx 0.96$) set by inflation.
- The spectrum turns over at a scale corresponding to the horizon size at matter-radiation equality. Modes that entered the horizon during radiation domination had their growth suppressed, producing the characteristic bend.
- The transfer function $T(k)$ captures all of this scale-dependent processing, so $P(k) \propto k^{n_s}\,T^2(k)$.
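To see the turnover emerge, you can plug an approximate transfer function into P(k) ∝ k^{n_s} T²(k). The sketch below uses the BBKS (Bardeen et al. 1986) fitting formula, which is a swapped-in approximation and not something the text above specifies. It ignores baryon wiggles, and the shape parameter `gamma_shape` (roughly Ω_m h) and its default value are illustrative assumptions. The result is unnormalized.

```python
import math


def bbks_transfer(k, gamma_shape=0.21):
    """BBKS fitting formula for the CDM transfer function; k in h/Mpc."""
    q = k / gamma_shape
    if q < 1e-8:
        return 1.0  # T -> 1 on the largest scales
    poly = (1.0 + 3.89 * q + (16.1 * q) ** 2
            + (5.46 * q) ** 3 + (6.71 * q) ** 4)
    return math.log(1.0 + 2.34 * q) / (2.34 * q) * poly ** -0.25


def linear_power(k, n_s=0.96, gamma_shape=0.21):
    """Unnormalized linear power spectrum, P(k) proportional to k^n_s T(k)^2."""
    return k ** n_s * bbks_transfer(k, gamma_shape) ** 2


def turnover_wavenumber(n_grid=4000):
    """Grid search for the peak of P(k) between 1e-4 and 1 h/Mpc."""
    ks = [10.0 ** (-4.0 + 4.0 * i / n_grid) for i in range(n_grid + 1)]
    return max(ks, key=linear_power)


print("P(k) peaks near k =", turnover_wavenumber(), "h/Mpc")
```

With these illustrative parameters the peak lands at a wavenumber of order 0.01 to 0.03 h/Mpc, the scale set by matter-radiation equality. Lowering `gamma_shape` (less matter) moves the turnover to larger scales.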
Galaxy surveys like SDSS and DESI measure the galaxy power spectrum, which must then be related to the underlying matter power spectrum through the galaxy bias (discussed below).

Baryon Acoustic Oscillations
Before recombination ($z \approx 1100$), baryons and photons were tightly coupled into a single fluid. Overdensities in this fluid launched spherical sound waves that propagated outward. At recombination, photons decoupled and the sound waves froze in place. The distance each wave traveled, the sound horizon, is about 150 Mpc (comoving).
This imprints a preferred separation scale in the matter distribution:
- In the CMB, it appears as the acoustic peaks in the angular power spectrum.
- In the galaxy distribution, it shows up as a subtle bump in $\xi(r)$ near 150 Mpc, or equivalently as oscillatory wiggles in $P(k)$.
Because the sound horizon can be calculated precisely from well-understood early-universe physics, BAO serve as a standard ruler. By measuring the apparent angular and radial size of this feature at different redshifts, you can map out the expansion history $H(z)$ and the angular diameter distance $D_A(z)$. This makes BAO one of the cleanest probes for constraining dark energy parameters, including its equation of state $w$.
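As a rough illustration of the standard-ruler idea, the sketch below computes the two BAO observables in a flat ΛCDM model: the angle subtended by the sound horizon across the line of sight, and the redshift interval it spans along the line of sight. The parameter values (H0 = 70 km/s/Mpc, Ω_m = 0.3, r_s = 150 Mpc) are illustrative assumptions, and radiation is neglected. Because r_s is a comoving length, the angle uses the comoving distance D_M = (1 + z) D_A.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def hubble(z, h0=70.0, omega_m=0.3):
    """H(z) in km/s/Mpc for flat LCDM, radiation neglected."""
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)


def comoving_distance(z, h0=70.0, omega_m=0.3, n=2000):
    """D_M(z) = c * integral of dz'/H(z') from 0 to z, in Mpc (midpoint rule)."""
    if z <= 0.0:
        return 0.0
    h = z / n
    return C_KM_S * h * sum(1.0 / hubble((i + 0.5) * h, h0, omega_m)
                            for i in range(n))


def bao_observables(z, r_s=150.0, h0=70.0, omega_m=0.3):
    """Return (angle in degrees, redshift extent) of the BAO scale at z."""
    theta = r_s / comoving_distance(z, h0, omega_m)   # transverse, radians
    delta_z = r_s * hubble(z, h0, omega_m) / C_KM_S   # line of sight
    return math.degrees(theta), delta_z


for z in (0.3, 0.5, 1.0, 2.0):
    theta_deg, delta_z = bao_observables(z)
    print(f"z = {z}: theta = {theta_deg:.2f} deg, delta_z = {delta_z:.4f}")
```

Running the logic in reverse is the actual measurement: observe the angle and the redshift extent, divide into the known r_s, and you recover the distance and H(z) at that redshift.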
BAO measurements from surveys like BOSS and DESI are independent of (and complementary to) Type Ia supernovae and CMB constraints, which is why they carry so much weight in modern precision cosmology.
Observational Probes
Redshift Surveys and Galaxy Mapping
Redshift surveys are the primary tool for mapping the three-dimensional distribution of galaxies. The procedure is straightforward: measure each galaxy's angular position on the sky and its redshift, then convert the redshift to a distance (using an assumed cosmology) to build a 3D map.
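A minimal sketch of that conversion, assuming a flat ΛCDM cosmology with illustrative parameters (H0 = 70 km/s/Mpc, Ω_m = 0.3) and ignoring peculiar velocities. The three-entry catalog is made up for the example.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, h0=70.0, omega_m=0.3, steps=1000):
    """Line-of-sight comoving distance in Mpc for flat LCDM (midpoint rule)."""
    dz = z / steps
    total = 0.0
    for i in range(steps):
        zm = (i + 0.5) * dz
        total += dz / math.sqrt(omega_m * (1.0 + zm) ** 3 + 1.0 - omega_m)
    return C_KM_S / h0 * total


def to_cartesian(ra_deg, dec_deg, z):
    """Map (RA, Dec, redshift) to comoving Cartesian coordinates in Mpc."""
    d = comoving_distance(z)
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (d * math.cos(dec) * math.cos(ra),
            d * math.cos(dec) * math.sin(ra),
            d * math.sin(dec))


# Hypothetical catalog entries: (RA in degrees, Dec in degrees, redshift)
catalog = [(150.0, 2.2, 0.10), (150.5, 2.0, 0.11), (210.0, -1.0, 0.35)]
positions = [to_cartesian(*galaxy) for galaxy in catalog]
print("separation of first two galaxies (Mpc):",
      math.dist(positions[0], positions[1]))
```

The assumed cosmology enters only through `comoving_distance`, so a wrong choice stretches the map differently along and across the line of sight. The positions are also redshift-space positions, since peculiar velocities are folded into each measured redshift.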
Major surveys and their contributions:
- 2dF Galaxy Redshift Survey: ~220,000 galaxies; provided early precise measurements of the galaxy power spectrum and confirmed the low matter density ($\Omega_m \approx 0.3$).
- Sloan Digital Sky Survey (SDSS): over a million galaxy redshifts; first clear detection of the BAO feature in the galaxy correlation function (Eisenstein et al. 2005).
- DESI (ongoing): targeting ~40 million galaxies and quasars to map expansion history with percent-level BAO precision out to $z \sim 3.5$.
One subtlety: galaxy maps are constructed in redshift space, not real space. Peculiar velocities (bulk motions on top of the Hubble flow) distort the apparent positions of galaxies along the line of sight. These redshift-space distortions (RSDs) compress structures on large scales (Kaiser effect) and elongate them on small scales (Fingers of God). Rather than being a nuisance, RSDs are actually useful because they encode information about the growth rate of structure, $f(z)$, which tests general relativity on cosmological scales.
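On large scales the Kaiser effect has a simple linear-theory form, which the text above does not spell out: the redshift-space galaxy power spectrum is (b + f μ²)² times the matter power spectrum. Here μ is the cosine of the angle between the wavevector and the line of sight, and b is the linear galaxy bias (the ratio of galaxy to matter overdensity). The sketch below also uses the common approximation f ≈ Ω_m(z)^0.55, which holds for general relativity in flat ΛCDM. The parameter values are illustrative assumptions.

```python
def growth_rate(z, omega_m0=0.3, gamma_index=0.55):
    """Approximate growth rate f(z) = Omega_m(z)^gamma_index (flat LCDM)."""
    matter = omega_m0 * (1.0 + z) ** 3
    omega_m_z = matter / (matter + 1.0 - omega_m0)
    return omega_m_z ** gamma_index


def kaiser_boost(mu, bias, f):
    """Linear-theory ratio P_s(k, mu) / P_m(k) = (b + f mu^2)^2."""
    return (bias + f * mu * mu) ** 2


def kaiser_monopole_boost(bias, f):
    """Angle average of kaiser_boost over mu: b^2 + 2bf/3 + f^2/5."""
    return bias ** 2 + 2.0 * bias * f / 3.0 + f ** 2 / 5.0


f_now = growth_rate(0.5)
bias = 2.0  # hypothetical bias for a luminous red galaxy sample
print("f(z=0.5) =", f_now)
print("line-of-sight vs transverse power:",
      kaiser_boost(1.0, bias, f_now) / kaiser_boost(0.0, bias, f_now))
```

The anisotropy between line-of-sight and transverse modes is what lets a survey separate f from b. A measured growth index that differs from 0.55 would point to a departure from general relativity.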
Biased Galaxy Formation and Tracer Populations
Galaxies don't perfectly trace the underlying dark matter distribution. They form preferentially in the densest environments, a phenomenon called galaxy bias.
The simplest model is linear bias: $\delta_g = b\,\delta_m$, where $\delta_g$ is the galaxy overdensity, $\delta_m$ is the matter overdensity, and $b$ is the bias parameter. This means the galaxy power spectrum is related to the matter power spectrum by $P_g(k) = b^2 P_m(k)$.
Different galaxy populations have different bias values:
- Luminous red galaxies and massive ellipticals have $b > 1$, meaning they're found preferentially in overdense regions.
- Blue star-forming galaxies are less biased ($b$ closer to 1), tracing the matter field more faithfully.
- Quasars at high redshift can have $b \gtrsim 2$.
The halo occupation distribution (HOD) framework provides a more detailed picture. It specifies the probability $P(N|M)$ that a halo of mass $M$ hosts $N$ galaxies of a given type. HOD modeling connects galaxy clustering measurements to the underlying halo mass function and halo clustering, bridging the gap between what you observe (galaxies) and what the theory predicts (dark matter).
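A minimal sketch of what an HOD looks like in practice, using a widely used parametrization (a smoothed step for central galaxies plus a power law for satellites, in the style of Zheng et al.). This specific functional form is a swapped-in example rather than something the text above prescribes. All parameter values are illustrative and not fitted to any survey.

```python
import math


def mean_centrals(mass, log_m_min=12.0, sigma_log_m=0.3):
    """Mean number of central galaxies (between 0 and 1) in a halo of `mass` (Msun)."""
    x = (math.log10(mass) - log_m_min) / sigma_log_m
    return 0.5 * (1.0 + math.erf(x))


def mean_satellites(mass, m0=1e12, m1=2e13, alpha=1.0,
                    log_m_min=12.0, sigma_log_m=0.3):
    """Mean number of satellites: power law above the cutoff mass m0."""
    if mass <= m0:
        return 0.0
    centrals = mean_centrals(mass, log_m_min, sigma_log_m)
    return centrals * ((mass - m0) / m1) ** alpha


def mean_occupation(mass):
    """Total mean number of galaxies in a halo of `mass`."""
    return mean_centrals(mass) + mean_satellites(mass)


for log_m in (11.0, 12.0, 13.0, 14.0, 15.0):
    print(f"log10(M/Msun) = {log_m}: <N> = {mean_occupation(10.0 ** log_m):.2f}")
```

These functions give only the mean of P(N|M). A full model also needs the scatter, and a common assumption is one central drawn as a yes/no outcome plus a Poisson-distributed number of satellites.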
Weak gravitational lensing offers a way around the bias problem entirely. Because lensing responds to all matter (dark and baryonic) along the line of sight, it provides an unbiased measurement of the total matter distribution. Cross-correlating galaxy positions with weak lensing shear maps (galaxy-galaxy lensing) is now a standard technique for calibrating galaxy bias and constraining cosmological parameters simultaneously.