Random variables and probability distributions are the foundation of actuarial mathematics. They give you the tools to assign numbers to uncertain outcomes and then describe how likely those outcomes are, which is exactly what actuaries need to quantify risk in insurance and finance.
This section covers the types of random variables, how probability distributions work (including joint, marginal, and conditional distributions), the most common discrete and continuous distributions, moments, transformations, and limit theorems.
Types of random variables
A random variable assigns a numerical value to each outcome of a random experiment. The type of random variable you're working with determines which mathematical tools you'll use, so getting this distinction right matters.
Discrete vs continuous
Discrete random variables take on countable values, like integers or elements of a finite set. Think of things you can count: the number of claims filed in a month, the number of policies sold, or the number of defaults in a portfolio.
Continuous random variables can take any value within an interval. These describe things you measure rather than count: the dollar amount of a claim, the time until a policy lapses, or an insured's age at death.
The distinction matters because discrete and continuous variables use different mathematical machinery (summation vs. integration), and mixing them up will lead to errors.
Probability mass functions
A probability mass function (PMF) describes the probability distribution of a discrete random variable. It tells you the probability that the variable equals each possible value.
- Notation: $p_X(x) = P(X = x)$, where $X$ is the random variable and $x$ is a specific value
- Every probability must be non-negative, and the sum across all possible values must equal 1: $p_X(x) \ge 0$ and $\sum_x p_X(x) = 1$
- The binomial and Poisson distributions are common examples of distributions defined by PMFs
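A quick numerical check makes the two PMF requirements concrete. The sketch below (Python, standard library only; the parameters are arbitrary) evaluates the binomial PMF and verifies non-negativity and that the probabilities sum to 1.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a Binomial(n, p) random variable."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Check the two PMF requirements for arbitrary parameters.
n, p = 10, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

assert all(q >= 0 for q in pmf)      # every probability is non-negative
assert abs(sum(pmf) - 1.0) < 1e-12   # probabilities sum to 1
print(round(binomial_pmf(3, n, p), 4))  # P(X = 3) ≈ 0.2668
```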
Probability density functions
A probability density function (PDF) describes the distribution of a continuous random variable. Unlike a PMF, a PDF does not give you the probability of a single point. Instead, it gives the relative likelihood at each value.
- Notation: $f_X(x)$
- To find the probability that $X$ falls in an interval, you integrate: $P(a \le X \le b) = \int_a^b f_X(x)\,dx$
- The total area under the PDF must equal 1: $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
- The normal and exponential distributions are common examples
A key point that trips people up: for a continuous random variable, $P(X = x) = 0$ for any single value $x$. Probabilities only make sense over intervals.
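Because interval probabilities come from integrating the PDF, they can be approximated numerically. A minimal sketch (Python, standard library; the Exponential(2) density is just a convenient example) compares a midpoint Riemann sum with the closed-form answer:

```python
import math

LAM = 2.0  # rate of an Exponential(2) distribution, chosen for illustration

def exp_pdf(x):
    """PDF of an Exponential(LAM) random variable."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

# P(0.5 <= X <= 1.5) via a midpoint Riemann sum over the interval.
a, b, n = 0.5, 1.5, 100_000
dx = (b - a) / n
prob = sum(exp_pdf(a + (i + 0.5) * dx) * dx for i in range(n))

exact = math.exp(-LAM * a) - math.exp(-LAM * b)  # closed form: e^{-1} - e^{-3}
print(round(prob, 6), round(exact, 6))  # both ≈ 0.318092
```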
Probability distributions
Probability distributions are mathematical functions that describe how likely different outcomes are for a random variable. Actuaries rely on them to model everything from claim frequencies to loss severities.
Cumulative distribution functions
The cumulative distribution function (CDF) gives the probability that a random variable is less than or equal to a specific value: $F_X(x) = P(X \le x)$
CDFs work for both discrete and continuous random variables:
- Discrete: $F_X(x) = \sum_{t \le x} p_X(t)$
- Continuous: $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
Every CDF has three properties: it's non-decreasing, $\lim_{x \to -\infty} F_X(x) = 0$, and $\lim_{x \to \infty} F_X(x) = 1$. CDFs are especially useful for computing probabilities over intervals and for finding quantiles (percentiles) of a distribution.
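When the CDF inverts in closed form, both uses fall out directly: interval probabilities are CDF differences, and quantiles come from the inverse. A small sketch (Python, standard library; the Exponential(0.5) CDF is an illustrative choice):

```python
import math

LAM = 0.5  # illustrative exponential rate

def cdf(x):
    """F(x) = P(X <= x) for Exponential(LAM)."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0

def quantile(p):
    """Inverse CDF: the x with F(x) = p, from solving 1 - e^{-LAM x} = p."""
    return -math.log(1.0 - p) / LAM

interval_prob = cdf(2.0) - cdf(1.0)  # P(1 < X <= 2) as a difference of CDF values
median = quantile(0.5)               # 50th percentile
print(round(interval_prob, 6), round(median, 6), round(cdf(median), 6))
```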
Joint probability distributions
Joint probability distributions describe the probability of two or more random variables taking on values simultaneously.
- For discrete variables, the joint PMF is $p_{X,Y}(x, y) = P(X = x, Y = y)$
- For continuous variables, the joint PDF is $f_{X,Y}(x, y)$, and probabilities are found by integrating over a region
Joint distributions let you model dependent risks. For example, you might use a joint distribution to capture the relationship between claim frequency and claim severity for an insurance policy. The joint CDF is $F_{X,Y}(x, y) = P(X \le x, Y \le y)$.
Marginal probability distributions
Marginal distributions are obtained from a joint distribution by "summing out" or "integrating out" the other variables:
- Discrete: $p_X(x) = \sum_y p_{X,Y}(x, y)$
- Continuous: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$
The result is the distribution of a single variable on its own, ignoring the others. For instance, if you have the joint distribution of claim frequency and severity, the marginal distribution of frequency tells you about frequency alone.
Conditional probability distributions
Conditional distributions describe the probability of one variable given a known value of another:
- Discrete: $p_{X \mid Y}(x \mid y) = \dfrac{p_{X,Y}(x, y)}{p_Y(y)}$
- Continuous: $f_{X \mid Y}(x \mid y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$
These are essential for updating your model when new information arrives. For example, the distribution of claim severity given that a claim has occurred is a conditional distribution. Conditional distributions connect to joint and marginal distributions through Bayes' theorem.
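The joint, marginal, and conditional relationships fit together in a few lines of code. This sketch (Python; the joint PMF values for claim count N and a severity band S are invented purely for illustration) recovers a marginal by summing out S and a conditional by dividing joint by marginal:

```python
# Hypothetical joint PMF of claim count N and severity band S (numbers invented).
joint = {
    (0, "low"): 0.50, (0, "high"): 0.10,
    (1, "low"): 0.15, (1, "high"): 0.10,
    (2, "low"): 0.05, (2, "high"): 0.10,
}

# Marginal PMF of N: sum the joint over all values of S.
marginal_N = {}
for (n, s), p in joint.items():
    marginal_N[n] = marginal_N.get(n, 0.0) + p

# Conditional PMF of S given N = 1: joint divided by the marginal.
cond_S = {s: joint[(1, s)] / marginal_N[1] for s in ("low", "high")}

print({n: round(p, 4) for n, p in marginal_N.items()})  # {0: 0.6, 1: 0.25, 2: 0.15}
print({s: round(p, 4) for s, p in cond_S.items()})      # {'low': 0.6, 'high': 0.4}
```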
Common discrete distributions
Discrete distributions model counted quantities. Each distribution below has specific assumptions, so choosing the right one depends on the structure of the problem.
Bernoulli distribution
Models a single trial with two outcomes: success (1) or failure (0).
- Notation: $X \sim \text{Bernoulli}(p)$, where $p$ is the probability of success
- PMF: $p_X(x) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$
- Mean: $E[X] = p$, Variance: $\text{Var}(X) = p(1-p)$
Use this for binary events, such as whether a policyholder files a claim or not.
Binomial distribution
Models the number of successes in $n$ independent Bernoulli trials, each with the same success probability $p$.
- Notation: $X \sim \text{Binomial}(n, p)$
- PMF: $p_X(x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \dots, n$
- Mean: $E[X] = np$, Variance: $\text{Var}(X) = np(1-p)$
Use this when you have a fixed number of independent trials. For example, out of 100 policies, how many will result in a claim this year?

Poisson distribution
Models the count of events occurring in a fixed interval of time or space, where events happen independently at a constant average rate.
- Notation: $X \sim \text{Poisson}(\lambda)$, where $\lambda$ is the average number of events per interval
- PMF: $p_X(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$ for $x = 0, 1, 2, \dots$
- Mean: $E[X] = \lambda$, Variance: $\text{Var}(X) = \lambda$
The Poisson is the workhorse distribution for claim frequency modeling. A notable property: the mean equals the variance. If your data shows the variance significantly exceeding the mean, the Poisson may not be the right fit (consider the negative binomial instead).
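The mean-equals-variance property can be verified directly from the PMF. A short sketch (Python, standard library; λ = 4 is arbitrary, and the sum is truncated far into the tail, where the remaining mass is negligible):

```python
import math

LAM = 4.0  # arbitrary Poisson rate
pmf = {k: math.exp(-LAM) * LAM**k / math.factorial(k) for k in range(60)}

# First two moments computed directly from the (truncated) PMF.
mean = sum(k * p for k, p in pmf.items())
var = sum(k * k * p for k, p in pmf.items()) - mean**2

print(round(mean, 6), round(var, 6))  # 4.0 4.0 -- mean equals variance
```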
Geometric distribution
Models the number of trials until the first success in a sequence of independent Bernoulli trials.
- Notation: $X \sim \text{Geometric}(p)$
- PMF: $p_X(x) = (1-p)^{x-1} p$ for $x = 1, 2, 3, \dots$
- Mean: $E[X] = 1/p$, Variance: $\text{Var}(X) = (1-p)/p^2$
Watch out: some textbooks define the geometric as the number of failures before the first success, which shifts the support to $x = 0, 1, 2, \dots$ and changes the PMF to $p_X(x) = (1-p)^x p$. Always check which convention is being used.
Negative binomial distribution
Generalizes the geometric distribution. Models the number of failures before achieving $r$ successes.
- Notation: $X \sim \text{NegBinomial}(r, p)$
- PMF: $p_X(x) = \binom{x + r - 1}{x} p^r (1-p)^x$ for $x = 0, 1, 2, \dots$
- Mean: $E[X] = r(1-p)/p$, Variance: $\text{Var}(X) = r(1-p)/p^2$
In actuarial work, the negative binomial is often used as an alternative to the Poisson for claim counts when overdispersion is present (i.e., the variance exceeds the mean).
Hypergeometric distribution
Models the number of successes in $n$ draws from a finite population of size $N$ containing $K$ successes, without replacement.
- Notation: $X \sim \text{Hypergeometric}(N, K, n)$
- PMF: $p_X(x) = \dfrac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$ for $\max(0,\, n - (N - K)) \le x \le \min(n, K)$
The key difference from the binomial: sampling is done without replacement, so trials are not independent. Use this when the population is small enough that removing items changes the probabilities meaningfully. As $N$ grows large relative to $n$, the hypergeometric approaches the binomial.
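The convergence to the binomial is easy to see numerically. This sketch (Python, standard library; the population figures are made up) compares the two PMFs when the population dwarfs the sample:

```python
import math

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x successes in n draws, without replacement."""
    return math.comb(K, x) * math.comb(N - K, n - x) / math.comb(N, n)

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Large population (N >> n): sampling without replacement barely matters.
N, K, n = 100_000, 20_000, 10
for x in (0, 2, 5):
    print(x, round(hypergeom_pmf(x, N, K, n), 6), round(binomial_pmf(x, n, K / N), 6))
```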
Common continuous distributions
Continuous distributions model measured quantities like claim amounts, durations, and financial returns.
Uniform distribution
Every value in the interval $[a, b]$ is equally likely.
- Notation: $X \sim \text{Uniform}(a, b)$
- PDF: $f_X(x) = \dfrac{1}{b-a}$ for $a \le x \le b$
- Mean: $E[X] = \dfrac{a+b}{2}$, Variance: $\text{Var}(X) = \dfrac{(b-a)^2}{12}$
Often used when you have no prior information about which values are more likely, or as a building block in simulation (since many random number generators produce uniform variates that get transformed into other distributions).
Normal distribution
The symmetric, bell-shaped distribution that appears throughout statistics and actuarial science, largely because of the central limit theorem.
- Notation: $X \sim N(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance
- PDF: $f_X(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$ for $-\infty < x < \infty$
The standard normal has $\mu = 0$ and $\sigma = 1$. Any normal variable can be standardized: $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$. Used to model aggregate losses (via the CLT), financial returns, and as an approximation for many other distributions when sample sizes are large.
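Standardization is what makes a single implementation of the standard normal CDF sufficient for any normal variable. A sketch using Python's `statistics.NormalDist` (the claim-amount parameters are hypothetical):

```python
from statistics import NormalDist

mu, sigma = 1000.0, 200.0    # hypothetical claim-amount parameters
X = NormalDist(mu, sigma)    # X ~ N(1000, 200^2)
Z = NormalDist(0.0, 1.0)     # standard normal

# P(X <= 1300) equals P(Z <= (1300 - mu) / sigma) = P(Z <= 1.5).
p_direct = X.cdf(1300.0)
p_standardized = Z.cdf((1300.0 - mu) / sigma)
print(round(p_direct, 6), round(p_standardized, 6))  # both ≈ 0.933193
```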
Exponential distribution
Models the waiting time until the next event in a Poisson process.
- Notation: $X \sim \text{Exponential}(\lambda)$, where $\lambda$ is the rate parameter
- PDF: $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
- Mean: $E[X] = 1/\lambda$, Variance: $\text{Var}(X) = 1/\lambda^2$
The exponential distribution has the memoryless property: $P(X > s + t \mid X > s) = P(X > t)$. This means the probability of waiting an additional $t$ units doesn't depend on how long you've already waited. It's the only continuous distribution with this property.
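The memoryless property can be checked numerically from the survival function $P(X > x) = e^{-\lambda x}$. A minimal sketch (Python; the rate and the times $s$, $t$ are arbitrary):

```python
import math

LAM = 0.25  # arbitrary rate

def survival(x):
    """P(X > x) for an Exponential(LAM) random variable."""
    return math.exp(-LAM * x)

s, t = 4.0, 3.0
lhs = survival(s + t) / survival(s)  # P(X > s + t | X > s)
rhs = survival(t)                    # P(X > t)
print(round(lhs, 10), round(rhs, 10))  # equal: the past wait doesn't matter
```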
Gamma distribution
Generalizes the exponential. When the shape parameter $\alpha$ is a positive integer, it models the waiting time until the $\alpha$-th event in a Poisson process.
- Notation: $X \sim \text{Gamma}(\alpha, \lambda)$, where $\alpha$ is the shape and $\lambda$ is the rate
- PDF: $f_X(x) = \dfrac{\lambda^\alpha x^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)}$ for $x > 0$
- Mean: $E[X] = \alpha/\lambda$, Variance: $\text{Var}(X) = \alpha/\lambda^2$
Note that $\Gamma(\alpha)$ is the gamma function, which generalizes the factorial: $\Gamma(n) = (n-1)!$ for positive integers $n$. The gamma distribution is widely used for modeling claim amounts and in credibility theory. When $\alpha = 1$, it reduces to the exponential.
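Two of the facts above are one-liners to confirm with `math.gamma`: the factorial identity, and the collapse to the exponential when $\alpha = 1$ (the rate and evaluation point below are arbitrary):

```python
import math

# Gamma(n) = (n - 1)! for positive integers n.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# With shape alpha = 1, the gamma PDF reduces to the exponential PDF.
lam, x = 2.0, 0.7  # arbitrary rate and evaluation point
alpha = 1.0
gamma_pdf = lam**alpha * x**(alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)
exp_pdf = lam * math.exp(-lam * x)
print(round(gamma_pdf, 10), round(exp_pdf, 10))  # identical values
```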
Beta distribution
Models a random variable constrained to the interval $[0, 1]$, making it natural for proportions and probabilities.
- Notation: $X \sim \text{Beta}(\alpha, \beta)$
- PDF: $f_X(x) = \dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$ for $0 < x < 1$
- Mean: $E[X] = \dfrac{\alpha}{\alpha+\beta}$, Variance: $\text{Var}(X) = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
Here $B(\alpha, \beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ is the beta function. The beta distribution is very flexible: depending on the parameters, it can be symmetric, left-skewed, right-skewed, U-shaped, or uniform (when $\alpha = \beta = 1$). Actuaries use it for loss ratios and as a prior distribution in Bayesian analysis.
Lognormal distribution
If $\ln X$ follows a normal distribution, then $X$ follows a lognormal distribution. This means $X$ is always positive and right-skewed.
- Notation: $X \sim \text{Lognormal}(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are the mean and variance of $\ln X$
- PDF: $f_X(x) = \dfrac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\ln x - \mu)^2 / (2\sigma^2)}$ for $x > 0$
- Mean: $E[X] = e^{\mu + \sigma^2/2}$, Variance: $\text{Var}(X) = (e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}$
The lognormal is one of the most common distributions for modeling individual claim amounts because claims are positive and their distribution is typically right-skewed (many small claims, few very large ones).
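The mean formula $e^{\mu + \sigma^2/2}$ is a common trip-up (the naive guess $e^{\mu}$ is wrong). A quick Monte Carlo sketch (Python, standard library; parameters and seed are arbitrary) compares the sample mean of simulated lognormal claims with the formula:

```python
import math
import random

random.seed(42)
mu, sigma = 0.0, 0.5  # parameters of ln X, chosen arbitrarily

samples = [random.lognormvariate(mu, sigma) for _ in range(200_000)]
sample_mean = sum(samples) / len(samples)
theoretical = math.exp(mu + sigma**2 / 2)  # e^{mu + sigma^2/2}, not e^{mu}

print(round(sample_mean, 3), round(theoretical, 3))  # close agreement
```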

Moments of distributions
Moments are numerical summaries that capture the shape and behavior of a distribution. The first few moments tell you about the center, spread, asymmetry, and tail behavior.
Expected value
The expected value (mean) is the probability-weighted average of all possible values.
- Notation: $E[X]$ or $\mu$
- Discrete: $E[X] = \sum_x x\, p_X(x)$
- Continuous: $E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$
The expected value is linear: $E[aX + bY] = aE[X] + bE[Y]$. Actuaries use it to calculate fair premiums, expected claim costs, and expected profits.
Variance and standard deviation
Variance measures how spread out the distribution is around the mean.
- Notation: $\text{Var}(X)$ or $\sigma^2$
- Computed as: $\text{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$
The second formula ($E[X^2] - (E[X])^2$) is usually easier to compute. The standard deviation $\sigma = \sqrt{\text{Var}(X)}$ has the same units as $X$, making it more interpretable.
Variance is used to assess volatility of claim amounts, set risk margins, and determine capital requirements.
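Both variance formulas give the same answer, as a fair six-sided die illustrates ($E[X] = 3.5$, $\text{Var}(X) = 35/12$):

```python
# Fair six-sided die: compare the definition with the computational shortcut.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6

mean = sum(x * p for x in outcomes)                     # E[X], which is 3.5
var_def = sum((x - mean) ** 2 * p for x in outcomes)    # E[(X - mu)^2]
var_short = sum(x * x * p for x in outcomes) - mean**2  # E[X^2] - (E[X])^2

print(round(mean, 6), round(var_def, 6), round(var_short, 6))
```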
Skewness and kurtosis
Skewness measures asymmetry: $\gamma_1 = E\!\left[\left(\dfrac{X-\mu}{\sigma}\right)^{\!3}\right]$
- $\gamma_1 > 0$: right-skewed (long right tail, common for claim amounts)
- $\gamma_1 < 0$: left-skewed
- $\gamma_1 = 0$: symmetric
Excess kurtosis measures tail heaviness relative to the normal distribution: $\gamma_2 = E\!\left[\left(\dfrac{X-\mu}{\sigma}\right)^{\!4}\right] - 3$
The "$-3$" makes the normal distribution the baseline ($\gamma_2 = 0$). Positive excess kurtosis means heavier tails than normal, which signals a higher probability of extreme outcomes. This is critical for actuaries assessing catastrophic risk.
Moment-generating functions
The moment-generating function (MGF) encodes all the moments of a distribution in a single function: $M_X(t) = E[e^{tX}]$
To extract the $n$-th moment, take the $n$-th derivative and evaluate at $t = 0$: $E[X^n] = M_X^{(n)}(0)$
MGFs are powerful for three reasons:
- If two random variables have the same MGF (and it exists in a neighborhood of 0), they have the same distribution
- For independent random variables, $M_{X+Y}(t) = M_X(t)\, M_Y(t)$, which simplifies working with sums
- They provide a clean way to prove limit theorems
Not every distribution has an MGF (the Cauchy distribution, for example, does not). When the MGF doesn't exist, the characteristic function serves as an alternative.
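The derivative-at-zero property is easy to check numerically. This sketch uses the exponential MGF $M(t) = \lambda/(\lambda - t)$ (a standard result, valid for $t < \lambda$; the rate is arbitrary) and approximates $M'(0)$ with a central finite difference:

```python
LAM = 2.0  # arbitrary exponential rate

def mgf(t):
    """MGF of Exponential(LAM): LAM / (LAM - t), valid for t < LAM."""
    return LAM / (LAM - t)

# First moment = M'(0), approximated by a central finite difference.
h = 1e-6
first_moment = (mgf(h) - mgf(-h)) / (2 * h)
print(round(first_moment, 6), 1 / LAM)  # numeric M'(0) vs the exact mean 1/LAM
```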
Transformations of random variables
Transformations create new random variables by applying functions to existing ones. This is how you move between distributions and model more complex relationships.
Linear transformations
A linear transformation takes the form $Y = aX + b$, where $a$ and $b$ are constants.
- Mean: $E[Y] = aE[X] + b$
- Variance: $\text{Var}(Y) = a^2\, \text{Var}(X)$
Notice that adding a constant $b$ shifts the mean but doesn't affect the variance. Multiplying by $a$ scales the mean by $a$ and the standard deviation by $|a|$, and scales the variance by $a^2$.
Common uses: standardizing a variable ($Z = \frac{X - \mu}{\sigma}$), converting units, or adjusting claim amounts for inflation.
Functions of random variables
For a general transformation $Y = g(X)$, you need to find the distribution of $Y$.
The CDF method works as follows:
- Start with $F_Y(y) = P(Y \le y) = P(g(X) \le y)$
- Rewrite the event in terms of $X$
- If $g$ is monotonically increasing: $F_Y(y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y))$
- If $g$ is monotonically decreasing: $F_Y(y) = P(X \ge g^{-1}(y)) = 1 - F_X(g^{-1}(y))$
- Differentiate the CDF to get the PDF of $Y$
For a strictly monotonic, differentiable $g$, the PDF can be found directly: $f_Y(y) = f_X(g^{-1}(y)) \left| \dfrac{d}{dy} g^{-1}(y) \right|$
This technique is how you derive, for example, that exponentiating a normal variable gives a lognormal variable.
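The lognormal derivation is a direct application of the change-of-variables formula: with $Y = e^X$ and $X$ normal, $g^{-1}(y) = \ln y$ and $|d g^{-1}/dy| = 1/y$. The sketch below (Python, standard library; standard normal parameters for simplicity) confirms that $f_X(\ln y)/y$ matches the lognormal PDF:

```python
import math

MU, SIGMA = 0.0, 1.0  # X ~ N(0, 1) for simplicity

def normal_pdf(x):
    return math.exp(-((x - MU) ** 2) / (2 * SIGMA**2)) / (SIGMA * math.sqrt(2 * math.pi))

def lognormal_pdf(y):
    return math.exp(-((math.log(y) - MU) ** 2) / (2 * SIGMA**2)) / (y * SIGMA * math.sqrt(2 * math.pi))

# Change of variables: f_Y(y) = f_X(g^{-1}(y)) * |d/dy g^{-1}(y)| = f_X(ln y) / y.
y = 1.7  # arbitrary evaluation point
via_formula = normal_pdf(math.log(y)) / y
print(round(via_formula, 10), round(lognormal_pdf(y), 10))  # identical
```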
Convolutions and sums of random variables
When $X$ and $Y$ are independent, the distribution of $S = X + Y$ is found through convolution:
- Discrete: $p_S(s) = \sum_x p_X(x)\, p_Y(s - x)$
- Continuous: $f_S(s) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(s - x)\, dx$
In practice, convolution integrals can be difficult to compute directly. The MGF approach is often easier: since $M_{X+Y}(t) = M_X(t)\, M_Y(t)$ for independent variables, you can multiply the MGFs and then identify the resulting distribution.
For example, the sum of independent Poisson random variables with parameters $\lambda_1$ and $\lambda_2$ is Poisson with parameter $\lambda_1 + \lambda_2$. You can verify this quickly using MGFs.
Convolutions are central to modeling aggregate losses, where total claims equal the sum of individual claim amounts.
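The Poisson closure under addition can also be confirmed by brute-force convolution of the PMFs (the rates below are arbitrary):

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam1, lam2 = 2.0, 3.0  # arbitrary rates

def conv_pmf(s):
    """P(X + Y = s) by discrete convolution of two independent Poisson PMFs."""
    return sum(poisson_pmf(x, lam1) * poisson_pmf(s - x, lam2) for x in range(s + 1))

# The convolution matches the Poisson(lam1 + lam2) PMF value for value.
for s in (0, 3, 7):
    print(s, round(conv_pmf(s), 8), round(poisson_pmf(s, lam1 + lam2), 8))
```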
Limit theorems
Limit theorems describe what happens as you work with larger and larger samples. They justify many of the approximations actuaries use in practice.
Law of large numbers
The law of large numbers (LLN) says that the sample mean converges to the population mean as the sample size grows.
For i.i.d. random variables $X_1, X_2, \dots$ with mean $\mu$, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ converges to $\mu$ as $n \to \infty$.
There are two versions:
- Weak LLN: $\bar{X}_n$ converges to $\mu$ in probability
- Strong LLN: $\bar{X}_n$ converges to $\mu$ almost surely (a stronger guarantee)
The LLN is the theoretical foundation of insurance: with a large enough pool of policyholders, the average claim per policy becomes predictable. This is why insurers can set stable premiums despite individual claims being random.
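A quick simulation shows the stabilization in action (Python; the claim probability and seed are arbitrary): the observed claim frequency drifts toward the true probability as the pool grows.

```python
import random

random.seed(7)
P_CLAIM = 0.3  # hypothetical per-policy claim probability

fractions = {}
for n in (100, 10_000, 200_000):
    claims = sum(random.random() < P_CLAIM for _ in range(n))
    fractions[n] = claims / n  # observed claim frequency in a pool of size n

print(fractions)  # the fraction approaches 0.3 as n grows
```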
Central limit theorem
The central limit theorem (CLT) states that the standardized sum (or average) of a large number of i.i.d. random variables is approximately normally distributed, regardless of the original distribution.
For i.i.d. random variables $X_1, \dots, X_n$ with mean $\mu$ and variance $\sigma^2$: $\dfrac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$
Equivalently, for the sum $S_n = X_1 + \dots + X_n$: $S_n$ is approximately $N(n\mu,\, n\sigma^2)$ for large $n$
The CLT is why the normal distribution appears so frequently in practice. Actuaries use it to approximate the distribution of aggregate claims, construct confidence intervals, and perform hypothesis tests. As a rough guideline, the approximation tends to work well for $n \ge 30$, though highly skewed distributions may require larger samples.
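A simulation makes the theorem tangible (Python; sample size, replication count, and seed are arbitrary): standardized means of $n = 30$ uniforms behave like standard normal draws, e.g. about 68.3% land within one standard deviation of zero.

```python
import math
import random

random.seed(1)
n, reps = 30, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and sd of Uniform(0, 1)

# Standardized sample means; by the CLT these are approximately N(0, 1).
zs = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    zs.append((xbar - mu) / (sigma / math.sqrt(n)))

within_one_sd = sum(abs(z) <= 1 for z in zs) / reps
print(round(within_one_sd, 3))  # close to 0.683, the standard normal value
```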