Fiveable

🔀Stochastic Processes Unit 2 Review


2.2 Continuous probability distributions


Written by the Fiveable Content Team • Last updated August 2025

Continuous probability distributions model random variables that can take any value within a range, rather than just isolated points. They form the backbone of stochastic process modeling, where quantities like waiting times, signal noise, and particle positions vary continuously. This guide covers the key properties, common distributions, transformations, joint distributions, order statistics, and the limit theorems that tie everything together.

Properties of continuous distributions

A continuous random variable can take on uncountably many values across an interval (or the entire real line). Because of this, you can't assign nonzero probability to individual points. Instead, probabilities come from integrating over intervals, and the machinery for doing that involves PDFs, CDFs, and their associated summary statistics.

Probability density functions

A probability density function (PDF) $f_X(x)$ describes the relative likelihood of a continuous random variable near a particular value. Two properties must hold:

  • $f_X(x) \geq 0$ for all $x$
  • $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$

The probability that $X$ falls in an interval $[a, b]$ is the area under the curve:

$$P(a \leq X \leq b) = \int_a^b f_X(x)\,dx$$

Note that $f_X(x)$ itself is not a probability and can exceed 1 at specific points. Only the integral over an interval gives a probability.

Cumulative distribution functions

The cumulative distribution function (CDF) gives the probability that $X$ is at most some value $x$:

$$F_X(x) = P(X \leq x) = \int_{-\infty}^{x} f_X(t)\,dt$$

Key properties:

  • $F_X$ is non-decreasing and right-continuous, with $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$
  • The PDF is recovered by differentiation: $f_X(x) = \frac{d}{dx}F_X(x)$ wherever the derivative exists
  • Interval probabilities follow directly: $P(a < X \leq b) = F_X(b) - F_X(a)$
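As a quick sanity check in Python (the Exp(0.5) example and helper names here are ours, not part of the guide), the CDF-difference rule and direct integration of the PDF give the same interval probability:

```python
import math

# Exponential(rate) CDF: F(x) = 1 - e^(-rate*x) for x >= 0
def exp_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

# Interval probability via the CDF: P(a < X <= b) = F(b) - F(a)
def interval_prob(a, b, rate):
    return exp_cdf(b, rate) - exp_cdf(a, rate)

# Cross-check by numerically integrating the PDF over [a, b]
def interval_prob_numeric(a, b, rate, n=100_000):
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h                  # midpoint rule
        total += rate * math.exp(-rate * x) * h
    return total

p_cdf = interval_prob(1.0, 2.0, rate=0.5)
p_num = interval_prob_numeric(1.0, 2.0, rate=0.5)
```

Both approaches land on the same number, which is the whole point: the CDF packages the integral once so you never have to redo it.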

Expected value and variance

The expected value (mean) of a continuous random variable weights each value by its density:

$$E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$$

The variance measures spread around the mean:

$$\text{Var}(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty}(x - \mu)^2 f_X(x)\,dx$$

A useful computational shortcut is the alternate form $\text{Var}(X) = E[X^2] - (E[X])^2$, which often simplifies integration.
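A small numeric sketch of the shortcut (the `moment` helper is ours): for Uniform(0, 1), integrating $x f(x)$ and $x^2 f(x)$ gives $E[X] = 1/2$ and $E[X^2] = 1/3$, so the variance comes out to $1/3 - 1/4 = 1/12$ without ever integrating $(x - \mu)^2$ directly.

```python
def moment(pdf, k, a, b, n=200_000):
    """Numerically integrate x^k * pdf(x) over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return sum(((a + (i + 0.5) * h) ** k) * pdf(a + (i + 0.5) * h) * h
               for i in range(n))

uniform_pdf = lambda x: 1.0              # Uniform(0, 1) density on [0, 1]

ex  = moment(uniform_pdf, 1, 0.0, 1.0)   # E[X]   -> 1/2
ex2 = moment(uniform_pdf, 2, 0.0, 1.0)   # E[X^2] -> 1/3
var = ex2 - ex ** 2                      # shortcut: E[X^2] - (E[X])^2 -> 1/12
```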

Moment generating functions

The moment generating function (MGF) of $X$ is defined as:

$$M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx$$

provided this expectation exists in a neighborhood of $t = 0$. MGFs are powerful for two reasons:

  • Extracting moments: The $n$th moment is $E[X^n] = M_X^{(n)}(0)$, i.e., the $n$th derivative evaluated at $t = 0$. So $E[X] = M_X'(0)$ and $E[X^2] = M_X''(0)$.
  • Sums of independent variables: If $X$ and $Y$ are independent, $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$. This makes MGFs a clean tool for finding the distribution of sums.

If two random variables share the same MGF (and it exists in a neighborhood of zero), they have the same distribution. This uniqueness property is what makes MGFs so useful for identification.
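You can see the moment-extraction property numerically (a sketch, using the known MGF of Exp($\lambda$), $M_X(t) = \lambda/(\lambda - t)$, and finite differences in place of symbolic derivatives):

```python
lam = 2.0
M = lambda t: lam / (lam - t)             # MGF of Exp(lam), valid for t < lam

h = 1e-4
# central finite differences approximate the derivatives at t = 0
m1 = (M(h) - M(-h)) / (2 * h)             # ~ M'(0)  = E[X]   = 1/lam   = 0.5
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2     # ~ M''(0) = E[X^2] = 2/lam^2 = 0.5
var = m2 - m1**2                          # ~ 1/lam^2 = 0.25
```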

Common continuous distributions

Several distributions appear repeatedly in stochastic processes. Each one models a different type of random phenomenon, and knowing their parameters and properties is essential.

Uniform distribution

The uniform distribution on $[a, b]$ assigns equal density to every point in the interval:

$$f_X(x) = \frac{1}{b - a}, \quad a \leq x \leq b$$

  • Mean: $E[X] = \frac{a + b}{2}$
  • Variance: $\text{Var}(X) = \frac{(b-a)^2}{12}$

This distribution is the natural model when you have no reason to favor any value over another within a range. A classic example: if a bus arrives every 20 minutes and you show up at a random time, your waiting time is $\text{Uniform}(0, 20)$.
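The bus example is easy to simulate (a quick seeded sketch; the sample mean and variance should sit near the theoretical $10$ and $400/12 \approx 33.3$):

```python
import random

random.seed(42)                                  # reproducible draws
waits = [random.uniform(0, 20) for _ in range(100_000)]

mean = sum(waits) / len(waits)                   # theory: (0 + 20)/2 = 10
var = sum((w - mean) ** 2 for w in waits) / len(waits)   # theory: 20^2/12
```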

Normal distribution

The normal (Gaussian) distribution with mean $\mu$ and variance $\sigma^2$ has PDF:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

The standard normal is the special case $\mu = 0$, $\sigma = 1$, often denoted $Z$. Any normal variable can be standardized: $Z = \frac{X - \mu}{\sigma}$.

The normal distribution dominates applied probability for a deep reason: the central limit theorem guarantees that sums of many independent random variables, once centered and scaled, converge to it regardless of the original distribution. This is why measurement errors, aggregate biological traits, and financial log-returns are often modeled as normal.
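Standardization is exactly how you compute normal probabilities in code: convert to $Z$, then evaluate the standard normal CDF $\Phi$, which can be written with the error function. A small sketch (the exam-score numbers are made up for illustration):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via standardization and erf."""
    z = (x - mu) / sigma                          # Z = (X - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# e.g. scores ~ N(70, 10^2): probability of scoring at most 85
p = normal_cdf(85, mu=70, sigma=10)               # = Phi(1.5) ≈ 0.9332
```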

Exponential distribution

The exponential distribution with rate $\lambda > 0$ has PDF:

$$f_X(x) = \lambda e^{-\lambda x}, \quad x \geq 0$$

  • Mean: $E[X] = \frac{1}{\lambda}$
  • Variance: $\text{Var}(X) = \frac{1}{\lambda^2}$

It models the waiting time between events in a Poisson process. Its defining property is memorylessness: $P(X > s + t \mid X > s) = P(X > t)$. The exponential distribution is the only continuous distribution with this property. This makes it the natural model for lifetimes of components that don't age or wear out.
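Memorylessness falls straight out of the survival function $P(X > x) = e^{-\lambda x}$, since $e^{-\lambda(s+t)}/e^{-\lambda s} = e^{-\lambda t}$. A two-line check (the parameter values are arbitrary):

```python
import math

def survival(x, lam):                 # P(X > x) for Exp(lam)
    return math.exp(-lam * x)

lam, s, t = 0.5, 3.0, 2.0
lhs = survival(s + t, lam) / survival(s, lam)   # P(X > s+t | X > s)
rhs = survival(t, lam)                          # P(X > t)
# lhs == rhs up to floating point: the s hours already survived are irrelevant
```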


Gamma distribution

The gamma distribution $\text{Gamma}(\alpha, \beta)$ generalizes the exponential by adding a shape parameter $\alpha > 0$ alongside the rate parameter $\beta > 0$:

$$f_X(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0$$

  • Mean: $E[X] = \frac{\alpha}{\beta}$
  • Variance: $\text{Var}(X) = \frac{\alpha}{\beta^2}$

When $\alpha$ is a positive integer $n$, the $\text{Gamma}(n, \beta)$ distribution is exactly the distribution of the sum of $n$ independent $\text{Exp}(\beta)$ random variables. This makes it a natural model for the total waiting time until the $n$th event in a Poisson process. The special case $\alpha = 1$ recovers the exponential distribution.
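The sum-of-exponentials characterization is easy to test by simulation (a seeded sketch with $n = 3$, $\beta = 2$, so the sum should have mean $n/\beta = 1.5$ and variance $n/\beta^2 = 0.75$):

```python
import random

random.seed(7)
n, beta = 3, 2.0          # Gamma(3, 2) built as a sum of three Exp(2) waits

samples = [sum(random.expovariate(beta) for _ in range(n))
           for _ in range(100_000)]

mean = sum(samples) / len(samples)                          # theory: 1.5
var = sum((s - mean) ** 2 for s in samples) / len(samples)  # theory: 0.75
```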

Beta distribution

The beta distribution $\text{Beta}(\alpha, \beta)$ is defined on $[0, 1]$ with PDF:

$$f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1}(1 - x)^{\beta - 1}, \quad 0 < x < 1$$

  • Mean: $E[X] = \frac{\alpha}{\alpha + \beta}$
  • Variance: $\text{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$

The beta distribution is extremely flexible. Depending on $\alpha$ and $\beta$, it can be uniform ($\alpha = \beta = 1$), U-shaped, J-shaped, or bell-shaped. It's the standard choice for modeling random proportions or probabilities, and it serves as the conjugate prior for the binomial likelihood in Bayesian inference.
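Python's standard library can draw beta variates directly, which makes the mean formula easy to verify (a seeded sketch with $\alpha = 2$, $\beta = 5$, so the mean should be $2/7$):

```python
import random

random.seed(0)
a, b = 2.0, 5.0
draws = [random.betavariate(a, b) for _ in range(100_000)]

mean = sum(draws) / len(draws)            # theory: a / (a + b) = 2/7 ≈ 0.286
in_unit = all(0.0 <= x <= 1.0 for x in draws)   # support is [0, 1]
```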

Transformations of random variables

Transformations let you derive the distribution of a new random variable defined as a function of an existing one. This is a core technique you'll use constantly in stochastic processes.

Distribution of functions of random variables

If $Y = g(X)$ where $g$ is a monotone, differentiable function with inverse $g^{-1}$, the PDF of $Y$ follows from the change-of-variables formula:

$$f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy}g^{-1}(y)\right|$$

The absolute value of the derivative of the inverse function acts as a "Jacobian" that accounts for how $g$ stretches or compresses the probability density.

Steps for applying the formula:

  1. Write $Y = g(X)$ and solve for $X = g^{-1}(Y)$
  2. Compute $\frac{d}{dy}g^{-1}(y)$
  3. Take the absolute value
  4. Substitute into the formula, and determine the new support (range of valid $y$ values)

If $g$ is not monotone, you need to split the domain into regions where it is monotone and sum the contributions from each branch.
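The steps above can be walked through on a concrete case (our choice, for illustration): $X \sim \text{Exp}(1)$ and $Y = e^X$, where $g$ is monotone on the support.

```python
import math, random

# Step 1: g(x) = e^x, so g^{-1}(y) = ln(y); the new support is y > 1 since x > 0.
# Steps 2-3: |d/dy ln(y)| = 1/y.
# Step 4: f_Y(y) = f_X(ln y) * (1/y) = e^{-ln y} / y = 1 / y^2 for y > 1.
f_Y = lambda y: 1.0 / y**2 if y > 1 else 0.0

# Check: P(Y <= 2) from the derived density vs direct simulation.
h = 1e-4
p_formula = sum(f_Y(1 + (i + 0.5) * h) * h for i in range(10_000))  # midpoint on (1, 2]

random.seed(1)
p_sim = sum(math.exp(random.expovariate(1.0)) <= 2
            for _ in range(100_000)) / 100_000
# both sit near P(X <= ln 2) = 1 - e^{-ln 2} = 1/2
```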

Convolutions and sums of random variables

When $X$ and $Y$ are independent continuous random variables, the PDF of $Z = X + Y$ is given by the convolution integral:

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx$$

This integral "slides" one density across the other and accumulates the overlap. A few important results:

  • The sum of two independent normals $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ is $N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$
  • The sum of independent $\text{Gamma}(\alpha_1, \beta)$ and $\text{Gamma}(\alpha_2, \beta)$ (same rate) is $\text{Gamma}(\alpha_1 + \alpha_2, \beta)$

In practice, MGFs often provide a faster route than direct convolution: multiply the MGFs, then identify the resulting distribution.
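A numeric convolution makes the formula tangible (a sketch with two Exp(1) variables, whose sum is Gamma(2, 1) with the closed-form PDF $\lambda^2 z\,e^{-\lambda z}$):

```python
import math

lam = 1.0
f = lambda x: lam * math.exp(-lam * x) if x >= 0 else 0.0   # Exp(1) PDF

def f_Z(z, n=20_000):
    """Convolution integral for Z = X + Y; the integrand is nonzero only on [0, z]."""
    if z <= 0:
        return 0.0
    h = z / n
    return sum(f((i + 0.5) * h) * f(z - (i + 0.5) * h) * h for i in range(n))

z = 1.5
exact = lam**2 * z * math.exp(-lam * z)   # Gamma(2, lam) density at z
approx = f_Z(z)
```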

Product distribution

For the product $Z = XY$ of two independent continuous random variables, the PDF is:

$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|x|}\,f_X(x)\,f_Y\!\left(\frac{z}{x}\right)dx$$

This formula comes from the same change-of-variables logic, with the $\frac{1}{|x|}$ factor acting as the Jacobian. Product distributions arise in contexts like modeling the area of a rectangle with random dimensions, or in signal processing where a signal is multiplied by a random gain.
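The rectangle-area example can be simulated directly (a seeded sketch; for two Uniform(0, 1) sides the product CDF works out by the integral above to $P(Z \leq z) = z - z\ln z$ on $0 < z < 1$, and independence gives $E[XY] = E[X]E[Y] = 1/4$):

```python
import math, random

random.seed(3)
# Area of a rectangle with independent Uniform(0, 1) sides
areas = [random.random() * random.random() for _ in range(200_000)]

mean = sum(areas) / len(areas)       # independence: E[XY] = E[X]E[Y] = 0.25

z = 0.25
p_exact = z - z * math.log(z)        # product-of-uniforms CDF at z ≈ 0.597
p_sim = sum(a <= z for a in areas) / len(areas)
```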

Joint continuous distributions

Joint distributions describe the simultaneous behavior of two or more continuous random variables. They capture not just individual behavior but also the dependence structure between variables.

Joint probability density functions

The joint PDF $f_{X,Y}(x,y)$ of two continuous random variables must satisfy:

  • $f_{X,Y}(x,y) \geq 0$ for all $(x,y)$
  • $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\,dy = 1$

The probability that $(X,Y)$ falls in a region $A$ is:

$$P((X,Y) \in A) = \iint_A f_{X,Y}(x,y)\,dx\,dy$$

Two random variables are independent if and only if their joint PDF factors: $f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)$ for all $(x,y)$.


Marginal and conditional distributions

Marginal distributions recover the distribution of a single variable by integrating out the other:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx$$

Conditional distributions describe one variable given a fixed value of the other:

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}, \quad \text{provided } f_X(x) > 0$$

This is the continuous analog of conditional probability. The conditional expectation $E[Y \mid X = x] = \int_{-\infty}^{\infty} y\,f_{Y|X}(y|x)\,dy$ is particularly important in stochastic processes, where it forms the basis of filtering and prediction.

Covariance and correlation

Covariance measures the linear co-movement of two random variables:

$$\text{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y]$$

The computational form $E[XY] - E[X]E[Y]$ is usually easier to evaluate than the definition.

The correlation coefficient normalizes covariance to the range $[-1, 1]$:

$$\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

  • $\rho = 1$ or $\rho = -1$: perfect linear relationship
  • $\rho = 0$: no linear relationship (but the variables may still be dependent in a nonlinear way)
  • If $X$ and $Y$ are independent, then $\text{Cov}(X,Y) = 0$. The converse is not true in general.
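The classic counterexample to the converse is $Y = X^2$ with $X$ symmetric about zero: $Y$ is completely determined by $X$, yet $\text{Cov}(X, X^2) = E[X^3] = 0$. A seeded simulation sketch:

```python
import random

random.seed(5)
xs = [random.gauss(0, 1) for _ in range(100_000)]
ys = [x * x for x in xs]             # Y = X^2: fully determined by X, yet...

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
# cov ≈ 0: zero covariance, but clearly dependent variables
```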

Order statistics

Order statistics deal with the sorted values of a random sample. If you draw $n$ independent observations from the same continuous distribution and sort them, the $k$th smallest value $X_{(k)}$ is the $k$th order statistic.

Distribution of the kth order statistic

Given $n$ i.i.d. continuous random variables with PDF $f_X(x)$ and CDF $F_X(x)$, the PDF of the $k$th order statistic $X_{(k)}$ is:

$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!(n-k)!}\,[F_X(x)]^{k-1}[1 - F_X(x)]^{n-k}f_X(x)$$

The intuition: for $X_{(k)}$ to have a density at $x$, exactly $k-1$ observations must fall below $x$, one observation must be at $x$, and $n-k$ must fall above. The combinatorial prefactor counts the ways to assign observations to these three groups.

Two special cases come up constantly:

  • Minimum ($k = 1$): $f_{X_{(1)}}(x) = n[1 - F_X(x)]^{n-1}f_X(x)$
  • Maximum ($k = n$): $f_{X_{(n)}}(x) = n[F_X(x)]^{n-1}f_X(x)$

The CDF of the $k$th order statistic is:

$$F_{X_{(k)}}(x) = \sum_{i=k}^{n} \binom{n}{i}[F_X(x)]^i[1 - F_X(x)]^{n-i}$$
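The CDF formula is just "at least $k$ of the $n$ draws land at or below $x$," which makes it easy to check by simulation. A seeded sketch for the sample median ($k = 3$) of $n = 5$ uniforms, where symmetry forces $P(X_{(3)} \leq 0.5) = 1/2$:

```python
import random
from math import comb

def order_stat_cdf(x, n, k, F):
    """P(X_(k) <= x): at least k of n i.i.d. draws fall at or below x."""
    p = F(x)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, k = 5, 3
F_unif = lambda x: x                            # CDF of Uniform(0, 1)
p_formula = order_stat_cdf(0.5, n, k, F_unif)   # = 16/32 = 0.5 by symmetry

random.seed(11)
trials = 100_000
hits = sum(sorted(random.random() for _ in range(n))[k - 1] <= 0.5
           for _ in range(trials))
p_sim = hits / trials
```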

Extreme value distributions

When $n$ grows large, the distribution of the sample maximum (after appropriate centering and scaling) converges to one of three extreme value distributions, classified by the tail behavior of the parent distribution:

  • Gumbel (Type I): For distributions with exponentially decaying tails (e.g., normal, exponential). The CDF is $\exp(-e^{-(x-\mu)/\beta})$.
  • Fréchet (Type II): For distributions with heavy (polynomial) tails (e.g., Pareto, Cauchy).
  • Weibull (Type III): For distributions with a finite upper endpoint (e.g., uniform, beta).

These three families are unified by the Generalized Extreme Value (GEV) distribution, parameterized by a shape parameter $\xi$ that determines which type applies. Extreme value theory is central to risk modeling in finance, hydrology, and engineering, where you need to estimate the probability of rare, large events.

Limit theorems

Limit theorems describe what happens to sums and averages of random variables as the sample size grows. They provide the theoretical justification for much of statistical inference.

Law of large numbers for continuous variables

The law of large numbers (LLN) says that the sample mean converges to the population mean as $n$ grows. Formally, for i.i.d. random variables $X_1, X_2, \ldots$ with mean $\mu$:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{P} \mu \quad \text{as } n \to \infty$$

The weak LLN gives convergence in probability (shown above). The strong LLN gives almost sure convergence, meaning $P(\lim_{n\to\infty} \bar{X}_n = \mu) = 1$. The strong version requires $E[|X|] < \infty$; the weak version can hold under slightly weaker conditions.

The LLN justifies using sample averages as estimators and underpins simulation methods like Monte Carlo estimation.
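A minimal Monte Carlo sketch of the LLN in action (seeded, with Exp(2) draws whose true mean is $0.5$): the running sample mean's error shrinks as $n$ grows, roughly like $1/\sqrt{n}$.

```python
import random

random.seed(2)
lam = 2.0
true_mean = 1 / lam                 # Exp(2) has mean 0.5

errors = {}
total, count = 0.0, 0
for n in (100, 10_000, 1_000_000):
    while count < n:                # extend the same running sum to n draws
        total += random.expovariate(lam)
        count += 1
    errors[n] = abs(total / count - true_mean)
# errors[n] typically shrinks toward 0 as n grows
```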

Central limit theorem for continuous variables

The central limit theorem (CLT) is arguably the most important result in probability. For i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2$:

$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty$$

Equivalently, for large $n$, the sum $S_n = \sum_{i=1}^n X_i$ is approximately $N(n\mu, n\sigma^2)$.

The CLT holds regardless of the shape of the original distribution, as long as the variance is finite. This is why normal-based confidence intervals and hypothesis tests work even when the underlying data aren't normal, provided the sample size is large enough. As a rough guideline, $n \geq 30$ is often sufficient for moderately skewed distributions, but highly skewed or heavy-tailed distributions may require larger samples.
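A seeded simulation sketch of the CLT: even though Exp(1) is heavily right-skewed, the standardized sample mean $Z_n$ of $n = 50$ draws already behaves almost like a standard normal, so the fraction of $|Z_n| \leq 1.96$ lands near the normal value $0.95$.

```python
import math, random

random.seed(9)
n, reps = 50, 20_000
mu, sigma = 1.0, 1.0                # Exp(1): mean 1, variance 1 — far from normal

def z_n():
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))   # standardized sample mean

inside = sum(abs(z_n()) <= 1.96 for _ in range(reps)) / reps
# for a standard normal, P(|Z| <= 1.96) ≈ 0.95; the simulated fraction sits nearby
```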