Continuous probability distributions model random variables that can take any value within a range, rather than just isolated points. They form the backbone of stochastic process modeling, where quantities like waiting times, signal noise, and particle positions vary continuously. This guide covers the key properties, common distributions, transformations, joint distributions, order statistics, and the limit theorems that tie everything together.
Properties of continuous distributions
A continuous random variable can take on uncountably many values across an interval (or the entire real line). Because of this, you can't assign nonzero probability to individual points. Instead, probabilities come from integrating over intervals, and the machinery for doing that involves PDFs, CDFs, and their associated summary statistics.
Probability density functions
A probability density function (PDF) $f_X(x)$ describes the relative likelihood of a continuous random variable $X$ near a particular value. Two properties must hold:

- $f_X(x) \ge 0$ for all $x$
- $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$

The probability that $X$ falls in an interval $[a, b]$ is the area under the curve:

$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$

Note that $f_X(x)$ itself is not a probability and can exceed 1 at specific points. Only the integral over an interval gives a probability.
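Both properties are easy to check numerically. The sketch below uses an Exponential(rate = 2) density as an assumed example; note that $f(0) = 2 > 1$, illustrating that a density value is not a probability.

```python
# Numerically verify the PDF properties and an interval probability
# for an Exponential(rate=2) density: f(x) = 2*exp(-2x) for x >= 0.
import numpy as np
from scipy import integrate

def f(x, lam=2.0):
    """Exponential PDF with rate lam (0 for x < 0)."""
    return np.where(x >= 0, lam * np.exp(-lam * x), 0.0)

# Property 2: the density integrates to 1 over its support.
total, _ = integrate.quad(f, 0, np.inf)

# Interval probability: P(0.5 <= X <= 1.5) is the area under f on [0.5, 1.5].
p_interval, _ = integrate.quad(f, 0.5, 1.5)

print(total, p_interval, float(f(np.array(0.0))))
```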
Cumulative distribution functions
The cumulative distribution function (CDF) gives the probability that $X$ is at most some value $x$:

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt$$
Key properties:
- $F_X$ is non-decreasing, right-continuous, with $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$
- The PDF is recovered by differentiation: $f_X(x) = F_X'(x)$ wherever the derivative exists
- Interval probabilities follow directly: $P(a < X \le b) = F_X(b) - F_X(a)$
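These properties can be checked with any distribution object; the sketch below assumes a standard normal purely for illustration.

```python
# Check the CDF properties above using scipy.stats (standard normal assumed).
from scipy import stats

dist = stats.norm(loc=0, scale=1)

# Limits: F(x) -> 0 as x -> -inf, and F(x) -> 1 as x -> +inf.
left, right = dist.cdf(-1e9), dist.cdf(1e9)

# Interval probability: P(a < X <= b) = F(b) - F(a).
a, b = -1.0, 1.0
p = dist.cdf(b) - dist.cdf(a)   # ~0.6827 for one sigma around the mean

# PDF recovered by (numerically) differentiating the CDF.
h = 1e-6
pdf_approx = (dist.cdf(0.5 + h) - dist.cdf(0.5 - h)) / (2 * h)
print(left, right, p, pdf_approx, dist.pdf(0.5))
```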
Expected value and variance
The expected value (mean) of a continuous random variable weights each value by its density:

$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$$

The variance measures spread around the mean:

$$\mathrm{Var}(X) = E\big[(X - \mu)^2\big] = \int_{-\infty}^{\infty} (x - \mu)^2\, f_X(x)\,dx$$

A useful computational shortcut is the alternate form $\mathrm{Var}(X) = E[X^2] - (E[X])^2$, which often simplifies integration.
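The shortcut is easy to exercise numerically. This sketch assumes a Uniform(0, 1) density, where $E[X] = 1/2$, $E[X^2] = 1/3$, and the variance is $1/12$.

```python
# Compute E[X], E[X^2], and the variance via Var(X) = E[X^2] - (E[X])^2
# for a Uniform(0, 1) density (assumed example).
from scipy import integrate

f = lambda x: 1.0                                        # Uniform(0,1) density on [0, 1]
ex, _  = integrate.quad(lambda x: x * f(x), 0, 1)        # E[X]   = 1/2
ex2, _ = integrate.quad(lambda x: x**2 * f(x), 0, 1)     # E[X^2] = 1/3
var = ex2 - ex**2                                        # 1/3 - 1/4 = 1/12
print(ex, ex2, var)
```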
Moment generating functions
The moment generating function (MGF) of $X$ is defined as:

$$M_X(t) = E\big[e^{tX}\big] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx$$

provided this expectation exists in a neighborhood of $t = 0$. MGFs are powerful for two reasons:

- Extracting moments: The $n$th moment is $E[X^n] = M_X^{(n)}(0)$, i.e., the $n$th derivative evaluated at $t = 0$. So $E[X] = M_X'(0)$ and $E[X^2] = M_X''(0)$.
- Sums of independent variables: If $X$ and $Y$ are independent, $M_{X+Y}(t) = M_X(t)\, M_Y(t)$. This makes MGFs a clean tool for finding the distribution of sums.
If two random variables share the same MGF (and it exists in a neighborhood of zero), they have the same distribution. This uniqueness property is what makes MGFs so useful for identification.
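As a concrete check, the exponential MGF $M(t) = \lambda/(\lambda - t)$ (valid for $t < \lambda$) can be differentiated numerically at $t = 0$; with the assumed rate $\lambda = 2$ the moments should come out to $1/\lambda = 0.5$ and $2/\lambda^2 = 0.5$.

```python
# Extract moments from the MGF of Exponential(lam): M(t) = lam / (lam - t).
# Central differences approximate M'(0) and M''(0).
lam = 2.0
M = lambda t: lam / (lam - t)

h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)             # ~E[X]   = 1/lam   = 0.5
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2   # ~E[X^2] = 2/lam^2 = 0.5
var = m2 - m1**2                          # ~1/lam^2 = 0.25
print(m1, m2, var)
```

Squaring this MGF gives $\lambda^2/(\lambda - t)^2$, which is the MGF of a Gamma(2, $\lambda$) variable — the sum-of-independents property in action.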
Common continuous distributions
Several distributions appear repeatedly in stochastic processes. Each one models a different type of random phenomenon, and knowing their parameters and properties is essential.
Uniform distribution
The uniform distribution on $[a, b]$ assigns equal density to every point in the interval:

$$f(x) = \frac{1}{b - a}, \quad a \le x \le b$$

- Mean: $E[X] = \frac{a + b}{2}$
- Variance: $\mathrm{Var}(X) = \frac{(b - a)^2}{12}$

This distribution is the natural model when you have no reason to favor any value over another within a range. A classic example: if a bus arrives every 20 minutes and you show up at a random time, your waiting time is $\mathrm{Uniform}(0, 20)$.
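A quick simulation of the bus example (seed and sample size chosen arbitrarily) should recover the mean $(0 + 20)/2 = 10$ and variance $20^2/12 \approx 33.3$.

```python
# Simulate the bus example: waiting time ~ Uniform(0, 20).
import numpy as np

rng = np.random.default_rng(0)
wait = rng.uniform(0, 20, size=200_000)
print(wait.mean(), wait.var())   # theory: 10 and 400/12
```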
Normal distribution
The normal (Gaussian) distribution with mean $\mu$ and variance $\sigma^2$ has PDF:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

The standard normal is the special case $\mu = 0$, $\sigma^2 = 1$, often denoted $Z$. Any normal variable can be standardized: $Z = \frac{X - \mu}{\sigma}$.
The normal distribution dominates applied probability for a deep reason: the central limit theorem guarantees that sums of many independent random variables converge to it, regardless of the original distribution. This is why measurement errors, aggregate biological traits, and financial log-returns are often modeled as normal.
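Standardization in practice: $P(X \le x)$ for any normal variable equals $P\big(Z \le (x - \mu)/\sigma\big)$ for the standard normal. The parameters below ($\mu = 100$, $\sigma = 15$) are just an assumed example.

```python
# P(X <= x) computed directly and via standardization should agree.
from scipy import stats

mu, sigma = 100.0, 15.0          # assumed example parameters
x = 130.0                        # two standard deviations above the mean
p_direct = stats.norm(mu, sigma).cdf(x)
p_standardized = stats.norm(0, 1).cdf((x - mu) / sigma)   # z = 2.0
print(p_direct, p_standardized)
```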
Exponential distribution
The exponential distribution with rate $\lambda > 0$ has PDF:

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

- Mean: $E[X] = \frac{1}{\lambda}$
- Variance: $\mathrm{Var}(X) = \frac{1}{\lambda^2}$

It models the waiting time between events in a Poisson process. Its defining property is memorylessness: $P(X > s + t \mid X > s) = P(X > t)$. The exponential distribution is the only continuous distribution with this property. This makes it the natural model for lifetimes of components that don't age or wear out.
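Memorylessness can be verified directly from survival functions, since $P(X > s + t \mid X > s) = P(X > s + t)/P(X > s)$. The rate and the values of $s$ and $t$ below are arbitrary choices.

```python
# Verify memorylessness for Exponential(rate=0.5): the conditional
# survival probability should equal the unconditional one.
from scipy import stats

X = stats.expon(scale=2.0)   # scipy parameterizes by scale = 1/rate
s, t = 3.0, 4.0
lhs = X.sf(s + t) / X.sf(s)  # P(X > s + t | X > s)
rhs = X.sf(t)                # P(X > t) = exp(-0.5 * 4) = exp(-2)
print(lhs, rhs)
```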

Gamma distribution
The gamma distribution generalizes the exponential by adding a shape parameter $\alpha > 0$ alongside the rate parameter $\lambda > 0$:

$$f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x}, \quad x > 0$$

- Mean: $E[X] = \frac{\alpha}{\lambda}$
- Variance: $\mathrm{Var}(X) = \frac{\alpha}{\lambda^2}$

When $\alpha$ is a positive integer $n$, the distribution is exactly the distribution of the sum of $n$ independent $\mathrm{Exponential}(\lambda)$ random variables. This makes it a natural model for the total waiting time until the $n$th event in a Poisson process. The special case $\alpha = 1$ recovers the exponential distribution.
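A simulation sketch of the sum-of-exponentials fact, with $n = 5$ and $\lambda = 2$ assumed: the summed draws should have mean $\alpha/\lambda = 2.5$ and variance $\alpha/\lambda^2 = 1.25$.

```python
# Sum n independent Exponential(lam) draws and compare the sample
# moments with the Gamma(n, lam) mean and variance.
import numpy as np

rng = np.random.default_rng(1)
n, lam = 5, 2.0
sums = rng.exponential(scale=1/lam, size=(100_000, n)).sum(axis=1)
print(sums.mean(), sums.var())   # theory: 5/2 = 2.5 and 5/4 = 1.25
```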
Beta distribution
The beta distribution is defined on $[0, 1]$ with PDF:

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 \le x \le 1$$

- Mean: $E[X] = \frac{\alpha}{\alpha + \beta}$
- Variance: $\mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$

The beta distribution is extremely flexible. Depending on $\alpha$ and $\beta$, it can be uniform ($\alpha = \beta = 1$), U-shaped, J-shaped, or bell-shaped. It's the standard choice for modeling random proportions or probabilities, and it serves as the conjugate prior for the binomial likelihood in Bayesian inference.
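The mean and variance formulas can be cross-checked against scipy for an arbitrary parameter choice ($\alpha = 2$, $\beta = 5$ assumed here):

```python
# Compare the closed-form beta mean/variance with scipy.stats.beta.
from scipy import stats

a, b = 2.0, 5.0
mean_formula = a / (a + b)                                # 2/7
var_formula = a * b / ((a + b)**2 * (a + b + 1))          # 10/392
mean_scipy, var_scipy = stats.beta(a, b).stats(moments='mv')
print(mean_formula, float(mean_scipy), var_formula, float(var_scipy))
```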
Transformations of random variables
Transformations let you derive the distribution of a new random variable defined as a function of an existing one. This is a core technique you'll use constantly in stochastic processes.
Distribution of functions of random variables
If $Y = g(X)$ where $g$ is a monotone, differentiable function with inverse $g^{-1}$, the PDF of $Y$ follows from the change-of-variables formula:

$$f_Y(y) = f_X\big(g^{-1}(y)\big)\, \left| \frac{d}{dy}\, g^{-1}(y) \right|$$

The absolute value of the derivative of the inverse function acts as a "Jacobian" that accounts for how $g$ stretches or compresses the probability density.

Steps for applying the formula:

- Write $y = g(x)$ and solve for $x = g^{-1}(y)$
- Compute $\frac{d}{dy}\, g^{-1}(y)$
- Take the absolute value
- Substitute into the formula, and determine the new support (range of valid $y$ values)

If $g$ is not monotone, you need to split the domain into regions where it is monotone and sum the contributions from each branch.
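The steps above can be sketched on a classic monotone example: $Y = e^X$ with $X \sim N(0, 1)$. Then $g^{-1}(y) = \ln y$, $\frac{d}{dy} g^{-1}(y) = 1/y$, and the formula yields the lognormal PDF $\varphi(\ln y)/y$, which should match scipy's lognormal directly.

```python
# Change-of-variables example: Y = exp(X), X ~ N(0, 1).
import numpy as np
from scipy import stats

def f_Y(y):
    """PDF of Y = exp(X) via f_Y(y) = f_X(g^{-1}(y)) * |d/dy g^{-1}(y)|."""
    return stats.norm.pdf(np.log(y)) * (1.0 / y)

y = np.array([0.5, 1.0, 2.0, 5.0])
print(f_Y(y))
print(stats.lognorm(s=1).pdf(y))   # scipy's lognormal with sigma = 1
```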
Convolutions and sums of random variables
When $X$ and $Y$ are independent continuous random variables, the PDF of $Z = X + Y$ is given by the convolution integral:

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\,dx$$

This integral "slides" one density across the other and accumulates the overlap. A few important results:

- The sum of two independent normals $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ is $N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$
- The sum of independent $\mathrm{Exponential}(\lambda)$ and $\mathrm{Exponential}(\lambda)$ (same rate) is $\mathrm{Gamma}(2, \lambda)$
In practice, MGFs often provide a faster route than direct convolution: multiply the MGFs, then identify the resulting distribution.
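The exponential case is easy to check by evaluating the convolution integral numerically (the rate $\lambda = 1.5$ and evaluation point are arbitrary assumptions); both densities are zero for negative arguments, so the integral collapses to $[0, z]$.

```python
# Convolve two Exponential(lam) densities numerically and compare with
# the Gamma(2, lam) PDF, lam^2 * z * exp(-lam * z).
import numpy as np
from scipy import integrate, stats

lam = 1.5
f = lambda x: lam * np.exp(-lam * x)   # Exponential(lam) PDF on x >= 0

def f_Z(z):
    # Supports restrict the integrand to [0, z].
    val, _ = integrate.quad(lambda x: f(x) * f(z - x), 0, z)
    return val

z = 2.0
print(f_Z(z), stats.gamma(a=2, scale=1/lam).pdf(z))
```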
Product distribution
For the product $Z = XY$ of two independent continuous random variables, the PDF is:

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y\!\left(\frac{z}{x}\right) \frac{1}{|x|}\,dx$$

This formula comes from the same change-of-variables logic, with the factor $\frac{1}{|x|}$ acting as the Jacobian. Product distributions arise in contexts like modeling the area of a rectangle with random dimensions, or in signal processing where a signal is multiplied by a random gain.
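For the rectangle-area example with two independent Uniform(0, 1) sides, integrating the product-distribution formula gives the CDF $P(Z \le z) = z - z\ln z$ for $0 < z < 1$, which a quick simulation can confirm:

```python
# Simulate Z = U1 * U2 for independent Uniform(0, 1) variables and
# compare the empirical CDF at one point with z - z*ln(z).
import numpy as np

rng = np.random.default_rng(2)
u1 = rng.uniform(size=200_000)
u2 = rng.uniform(size=200_000)
z = 0.3
empirical = np.mean(u1 * u2 <= z)
theoretical = z - z * np.log(z)     # ~0.6612
print(empirical, theoretical)
```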
Joint continuous distributions
Joint distributions describe the simultaneous behavior of two or more continuous random variables. They capture not just individual behavior but also the dependence structure between variables.
Joint probability density functions
The joint PDF $f_{X,Y}(x, y)$ of two continuous random variables must satisfy:

- $f_{X,Y}(x, y) \ge 0$ for all $(x, y)$
- $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$

The probability that $(X, Y)$ falls in a region $A$ is:

$$P\big((X, Y) \in A\big) = \iint_A f_{X,Y}(x, y)\,dx\,dy$$

Two random variables are independent if and only if their joint PDF factors: $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ for all $(x, y)$.

Marginal and conditional distributions
Marginal distributions recover the distribution of a single variable by integrating out the other:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$

Conditional distributions describe one variable given a fixed value of the other:

$$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}, \quad f_X(x) > 0$$

This is the continuous analog of conditional probability. The conditional expectation $E[Y \mid X = x]$ is particularly important in stochastic processes, where it forms the basis of filtering and prediction.
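Marginalization is just one-dimensional integration in practice. The joint density below (independent exponentials with rates 1 and 2) is an assumed example, so integrating out $y$ should recover the Exponential(1) density in $x$.

```python
# Recover a marginal by numerically integrating out one variable
# of a joint PDF (independent exponentials assumed).
import numpy as np
from scipy import integrate

f_joint = lambda x, y: np.exp(-x) * 2 * np.exp(-2 * y)   # x, y >= 0

x = 0.7
marginal, _ = integrate.quad(lambda y: f_joint(x, y), 0, np.inf)
print(marginal, np.exp(-x))   # both should equal the Exp(1) density at x
```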
Covariance and correlation
Covariance measures the linear co-movement of two random variables:

$$\mathrm{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big] = E[XY] - E[X]\,E[Y]$$

The computational form $E[XY] - E[X]\,E[Y]$ is usually easier to evaluate than the definition.

The correlation coefficient normalizes covariance to the range $[-1, 1]$:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\, \sigma_Y}$$

- $\rho = 1$ or $\rho = -1$: perfect linear relationship
- $\rho = 0$: no linear relationship (but the variables may still be dependent in a nonlinear way)
- If $X$ and $Y$ are independent, then $\rho = 0$. The converse is not true in general.
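The failure of the converse has a standard counterexample: with $X \sim N(0, 1)$ and $Y = X^2$, $\mathrm{Cov}(X, Y) = E[X^3] = 0$, yet $Y$ is completely determined by $X$.

```python
# Zero correlation without independence: X ~ N(0, 1), Y = X^2.
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(500_000)
y = x**2                         # totally dependent on x
rho = np.corrcoef(x, y)[0, 1]
print(rho)                       # near 0 despite the dependence
```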
Order statistics
Order statistics deal with the sorted values of a random sample. If you draw $n$ independent observations from the same continuous distribution and sort them, the $k$th smallest value is the $k$th order statistic, denoted $X_{(k)}$.
Distribution of the kth order statistic
Given $n$ i.i.d. continuous random variables with PDF $f$ and CDF $F$, the PDF of the $k$th order statistic is:

$$f_{X_{(k)}}(x) = \frac{n!}{(k - 1)!\,(n - k)!}\, F(x)^{k-1}\, \big[1 - F(x)\big]^{n-k}\, f(x)$$

The intuition: for $X_{(k)}$ to have a density at $x$, exactly $k - 1$ observations must fall below $x$, one observation must be at $x$, and $n - k$ must fall above. The combinatorial prefactor counts the ways to assign observations to these three groups.

Two special cases come up constantly:

- Minimum ($k = 1$): $f_{X_{(1)}}(x) = n\,\big[1 - F(x)\big]^{n-1}\, f(x)$
- Maximum ($k = n$): $f_{X_{(n)}}(x) = n\, F(x)^{n-1}\, f(x)$

The CDF of the $k$th order statistic is:

$$F_{X_{(k)}}(x) = \sum_{j=k}^{n} \binom{n}{j}\, F(x)^j\, \big[1 - F(x)\big]^{n-j}$$
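For Uniform(0, 1) samples the formula specializes nicely: the $k$th order statistic is Beta($k$, $n - k + 1$), with mean $k/(n + 1)$. A simulation with assumed $n = 10$, $k = 3$ can confirm this.

```python
# The kth order statistic of n Uniform(0,1) draws is Beta(k, n-k+1);
# its mean is k/(n+1).
import numpy as np

rng = np.random.default_rng(4)
n, k = 10, 3
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]
print(samples.mean())   # theory: 3/11 ~ 0.2727
```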
Extreme value distributions
When $n$ grows large, the distribution of the sample maximum (after appropriate centering and scaling) converges to one of three extreme value distributions, classified by the tail behavior of the parent distribution:
- Gumbel (Type I): For distributions with exponentially decaying tails (e.g., normal, exponential). The standard CDF is $F(x) = \exp\!\big(-e^{-x}\big)$.
- Fréchet (Type II): For distributions with heavy (polynomial) tails (e.g., Pareto, Cauchy).
- Weibull (Type III): For distributions with a finite upper endpoint (e.g., uniform, beta).
These three families are unified by the Generalized Extreme Value (GEV) distribution, parameterized by a shape parameter $\xi$ that determines which type applies ($\xi = 0$ gives Gumbel, $\xi > 0$ Fréchet, $\xi < 0$ Weibull). Extreme value theory is central to risk modeling in finance, hydrology, and engineering, where you need to estimate the probability of rare, large events.
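A small simulation of the Gumbel limit: for $n$ i.i.d. Exponential(1) variables, the centered maximum $M_n - \ln n$ converges to the standard Gumbel law, whose mean is the Euler–Mascheroni constant $\gamma \approx 0.5772$. The sample sizes below are arbitrary assumptions.

```python
# Gumbel limit: centered maxima of exponential samples.
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 20_000
centered_max = rng.exponential(size=(reps, n)).max(axis=1) - np.log(n)
print(centered_max.mean())   # should be near gamma ~ 0.5772
```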
Limit theorems
Limit theorems describe what happens to sums and averages of random variables as the sample size grows. They provide the theoretical justification for much of statistical inference.
Law of large numbers for continuous variables
The law of large numbers (LLN) says that the sample mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ converges to the population mean as $n$ grows. Formally, for i.i.d. random variables $X_1, X_2, \ldots$ with mean $\mu$:

$$\bar{X}_n \xrightarrow{P} \mu \quad \text{as } n \to \infty$$

The weak LLN gives convergence in probability (shown above). The strong LLN gives almost sure convergence, meaning $P\big(\lim_{n \to \infty} \bar{X}_n = \mu\big) = 1$. The strong version requires $E|X_1| < \infty$; the weak version can hold under slightly weaker conditions.
The LLN justifies using sample averages as estimators and underpins simulation methods like Monte Carlo estimation.
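A Monte Carlo sketch of the LLN (Exponential(1) draws assumed, so $\mu = 1$): the running average settles toward the true mean as $n$ grows.

```python
# LLN in action: running averages of Exponential(1) draws converge to 1.
import numpy as np

rng = np.random.default_rng(6)
draws = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)
print(running_mean[99], running_mean[-1])   # n = 100 vs n = 100000
```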
Central limit theorem for continuous variables
The central limit theorem (CLT) is arguably the most important result in probability. For i.i.d. random variables $X_1, X_2, \ldots$ with mean $\mu$ and finite variance $\sigma^2$:

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty$$

Equivalently, for large $n$, the sum $\sum_{i=1}^{n} X_i$ is approximately $N(n\mu,\ n\sigma^2)$.

The CLT holds regardless of the shape of the original distribution, as long as the variance is finite. This is why normal-based confidence intervals and hypothesis tests work even when the underlying data aren't normal, provided the sample size is large enough. As a rough guideline, $n \ge 30$ is often sufficient for moderately skewed distributions, but highly skewed or heavy-tailed distributions may require larger samples.
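A sketch of the CLT for a skewed parent distribution (Exponential(1), so $\mu = \sigma = 1$; sample sizes are arbitrary assumptions): standardized sample means should look approximately standard normal, e.g., about 95% of them should land in $[-1.96, 1.96]$.

```python
# CLT check: standardized means of skewed Exponential(1) samples
# should be close to N(0, 1).
import numpy as np

rng = np.random.default_rng(7)
n, reps = 50, 20_000
means = rng.exponential(size=(reps, n)).mean(axis=1)
z = (means - 1.0) / (1.0 / np.sqrt(n))     # standardize with mu = sigma = 1
coverage = np.mean(np.abs(z) <= 1.96)
print(z.mean(), z.std(), coverage)
```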