📊 Actuarial Mathematics Unit 1 Review

1.5 Discrete distributions (Bernoulli, binomial, Poisson)

Written by the Fiveable Content Team • Last updated August 2025
Discrete distributions like Bernoulli, binomial, and Poisson are foundational tools in actuarial math. They model events with countable outcomes, such as whether an insurance claim occurs, how many claims arise from a portfolio, or how often rare events happen in a given time window. Mastering these three distributions and their interrelationships is essential for probability exams and for real actuarial work in risk modeling.

Bernoulli distribution

The Bernoulli distribution models a single trial with exactly two outcomes: success (x = 1) or failure (x = 0). Think of it as the simplest possible random experiment. In actuarial contexts, this could be whether a single policyholder files a claim or not.

It also serves as the building block for the binomial distribution, since a binomial random variable is just a sum of independent Bernoulli trials.

Probability mass function

The PMF of a Bernoulli random variable X is:

P(X = x) = p^x (1-p)^{1-x} \quad \text{for } x \in \{0, 1\}

  • p is the probability of success
  • 1 − p (often written q) is the probability of failure

When x = 1, this simplifies to P(X = 1) = p. When x = 0, it gives P(X = 0) = 1 − p. The compact formula just combines both cases into one expression.

Mean and variance

  • Mean: E(X) = p
  • Variance: Var(X) = p(1 − p)
  • Standard deviation: σ = √(p(1 − p))

Notice the variance is maximized when p = 0.5 and equals zero when p = 0 or p = 1. This makes intuitive sense: there's no uncertainty if the outcome is guaranteed.
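The PMF and moments above are easy to verify directly. A minimal Python sketch (p = 0.3 is an arbitrary illustration):

```python
import math

def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("Bernoulli support is {0, 1}")
    return p**x * (1 - p)**(1 - x)

p = 0.3
mean = p                   # E(X) = p
variance = p * (1 - p)     # Var(X) = p(1 - p), maximized at p = 0.5
sd = math.sqrt(variance)

print(bernoulli_pmf(1, p))   # 0.3
print(bernoulli_pmf(0, p))   # 0.7
```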

Applications of Bernoulli distribution

  • Whether a single policyholder files a claim (claim/no claim)
  • Whether a manufactured item passes inspection (defective/non-defective)
  • Whether a single medical treatment succeeds or fails
  • Any binary outcome that feeds into a larger binomial model

Binomial distribution

The binomial distribution counts the number of successes in n independent Bernoulli trials, each with the same success probability p. For example, if you have 100 policyholders each with a 3% claim probability, the total number of claims follows a binomial distribution with n = 100 and p = 0.03.

Two conditions must hold for the binomial to apply: the trials must be independent, and p must be constant across all trials.

Probability mass function

The PMF of a binomial random variable X is:

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k = 0, 1, 2, \ldots, n

  • n is the number of trials
  • p is the probability of success on each trial
  • \binom{n}{k} = \frac{n!}{k!(n-k)!} is the binomial coefficient, counting the number of ways to arrange k successes among n trials

The logic: p^k (1-p)^{n-k} is the probability of one specific sequence with k successes, and \binom{n}{k} accounts for all possible orderings of those successes.
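The PMF translates directly into code. A short sketch using Python's math.comb, applied to the 100-policy example above:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    # comb(n, k) orderings, each occurring with probability p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 100 policies, each with a 3% claim probability: probability of exactly 3 claims
print(binom_pmf(3, 100, 0.03))
```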

Cumulative distribution function

The CDF of a binomial random variable X is:

F(x) = P(X \leq x) = \sum_{k=0}^{\lfloor x \rfloor} \binom{n}{k} p^k (1-p)^{n-k}

This gives the probability of observing x or fewer successes. To find the probability over a range, use:

P(a \leq X \leq b) = F(b) - F(a-1)

Note the a − 1, not a. Since X is discrete, you need to include the point a itself.
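One possible implementation of the CDF and the range formula (note how F(a − 1) becomes 0 when a = 0):

```python
from math import comb

def binom_cdf(x: int, n: int, p: float) -> float:
    """F(x) = P(X <= x): sum the PMF from k = 0 up to x."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def binom_range(a: int, b: int, n: int, p: float) -> float:
    """P(a <= X <= b) = F(b) - F(a - 1); the a - 1 keeps the point a included."""
    return binom_cdf(b, n, p) - (binom_cdf(a - 1, n, p) if a > 0 else 0.0)

print(binom_range(2, 4, 10, 0.3))   # P(2 <= X <= 4) for Bin(10, 0.3)
```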

Mean and variance

  • Mean: E(X) = np
  • Variance: Var(X) = np(1 − p)
  • Standard deviation: σ = √(np(1 − p))

These follow directly from the fact that X is a sum of n independent Bernoulli variables, each with mean p and variance p(1 − p).

Moment generating function

The MGF of a binomial random variable X is:

M_X(t) = (pe^t + 1 - p)^n

The MGF uniquely determines the distribution. You can extract the k-th moment by computing the k-th derivative of M_X(t) and evaluating at t = 0. The MGF is also useful for proving that sums of independent binomials (with the same p) remain binomial.
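One way to sanity-check the moment-extraction idea without symbolic algebra is to differentiate the MGF numerically at t = 0. A sketch using central differences (n = 20 and p = 0.4 are arbitrary choices):

```python
import math

def binom_mgf(t: float, n: int, p: float) -> float:
    return (p * math.exp(t) + 1 - p)**n

n, p, h = 20, 0.4, 1e-5

# Central differences approximate the first and second derivatives at t = 0
m1 = (binom_mgf(h, n, p) - binom_mgf(-h, n, p)) / (2 * h)               # ≈ E(X) = np = 8
m2 = (binom_mgf(h, n, p) - 2 * binom_mgf(0, n, p) + binom_mgf(-h, n, p)) / h**2
variance = m2 - m1**2                                                    # ≈ np(1-p) = 4.8
print(m1, variance)
```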

Properties of binomial distribution

  • Additivity: If X₁ ~ Bin(n₁, p) and X₂ ~ Bin(n₂, p) are independent, then X₁ + X₂ ~ Bin(n₁ + n₂, p). The success probabilities must be equal for this to work.
  • Normal approximation: As n grows, the binomial approaches a normal distribution. The standard rule of thumb is that the approximation is reasonable when both np ≥ 5 and n(1 − p) ≥ 5. A continuity correction (adjusting by ±0.5) improves accuracy.
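The normal approximation can be checked numerically with only the standard library. A sketch comparing the exact binomial CDF with the continuity-corrected normal value (n = 100 and p = 0.3 are chosen so both np and n(1 − p) exceed 5):

```python
import math
from math import comb

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 100, 0.3                        # np = 30 and n(1-p) = 70, both >= 5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(26))  # P(X <= 25)
approx = norm_cdf((25 + 0.5 - mu) / sigma)                            # continuity correction
print(exact, approx)
```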

Binomial approximation to hypergeometric

When sampling without replacement from a finite population, the exact distribution is hypergeometric. But if the sample size n is small relative to the population size N (the common guideline is n < 0.05N), the binomial with p = K/N provides a good approximation, where K is the number of "successes" in the population.

The reasoning: when the population is large enough, removing one item barely changes the composition, so sampling without replacement behaves almost like sampling with replacement.
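A quick numerical comparison illustrates the guideline (the population figures here are made up for illustration):

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """Exact distribution: k successes when drawing n without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

N, K, n = 10_000, 500, 100     # n = 0.01 * N, well under the 5% guideline
for k in range(4):
    print(k, hypergeom_pmf(k, N, K, n), binom_pmf(k, n, K / N))
```

The two columns should agree closely, which is the "sampling without replacement behaves almost like sampling with replacement" point in action.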

Applications of binomial distribution

  • Number of claims filed out of n policies in a portfolio
  • Number of defective items in a batch of n products
  • Number of successful treatments out of n patients
  • Number of wins in a fixed-length series of games

Poisson distribution

The Poisson distribution models the count of events occurring in a fixed interval of time or space, given a known average rate. Unlike the binomial, there's no fixed number of trials; the support is all non-negative integers {0, 1, 2, ...}. It's characterized by a single parameter λ > 0, the average number of events per interval.

In actuarial work, the Poisson is the go-to distribution for claim frequency modeling, especially when individual claim probabilities are small but the exposure is large.

Probability mass function

The PMF of a Poisson random variable X is:

P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!} \quad \text{for } k = 0, 1, 2, \ldots

  • λ is the average number of events per interval
  • e ≈ 2.71828 is Euler's number

For example, if an insurer expects λ = 3 claims per month, the probability of exactly 5 claims is:

P(X = 5) = \frac{e^{-3} \cdot 3^5}{5!} = \frac{0.04979 \cdot 243}{120} \approx 0.1008
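The same calculation in Python:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = e^(-lambda) * lambda^k / k!"""
    return math.exp(-lam) * lam**k / math.factorial(k)

print(poisson_pmf(5, 3.0))   # ≈ 0.1008
```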

Cumulative distribution function

The CDF of a Poisson random variable X is:

F(x) = P(X \leq x) = \sum_{k=0}^{\lfloor x \rfloor} \frac{e^{-\lambda} \lambda^k}{k!}

As with the binomial, for a range: P(a ≤ X ≤ b) = F(b) − F(a − 1).

Mean and variance

  • Mean: E(X) = λ
  • Variance: Var(X) = λ
  • Standard deviation: σ = √λ

The fact that the mean equals the variance is a defining characteristic of the Poisson distribution. In practice, if you observe data where the sample variance is much larger or smaller than the sample mean, a Poisson model may not be appropriate. This is called overdispersion (variance > mean) or underdispersion (variance < mean).

Moment generating function

The MGF of a Poisson random variable X is:

M_X(t) = e^{\lambda(e^t - 1)}

As with the binomial, the k-th moment is found by taking the k-th derivative of M_X(t) at t = 0. The MGF also makes it straightforward to prove the additivity property below.

Properties of Poisson distribution

  • Additivity: If X₁ ~ Poisson(λ₁) and X₂ ~ Poisson(λ₂) are independent, then X₁ + X₂ ~ Poisson(λ₁ + λ₂). You can verify this by multiplying the MGFs.
  • Limiting case of binomial: The Poisson distribution arises as the limit of Bin(n, p) when n → ∞, p → 0, and np = λ stays constant. This is why the Poisson works well for modeling many independent rare events.

Poisson approximation to binomial

When n is large and p is small, computing binomial probabilities directly can be cumbersome. The Poisson with λ = np provides a convenient approximation.

Common rules of thumb for when the approximation is adequate:

  • n ≥ 20 and p ≤ 0.05, or
  • n ≥ 100 and np ≤ 10

Example: Suppose 1,000 policies each have a 0.2% chance of a catastrophic claim. The exact distribution is Bin(1000, 0.002), but you can approximate it with Poisson(2) since λ = 1000 × 0.002 = 2.
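The example can be checked numerically by printing the two PMFs side by side:

```python
from math import comb, exp, factorial

n, p = 1000, 0.002
lam = n * p   # 2.0

for k in range(6):
    exact = comb(n, k) * p**k * (1 - p)**(n - k)      # Bin(1000, 0.002)
    approx = exp(-lam) * lam**k / factorial(k)        # Poisson(2)
    print(k, round(exact, 5), round(approx, 5))
```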


Applications of Poisson distribution

  • Number of insurance claims arriving per month or per year
  • Number of accidents at an intersection over a given period
  • Number of natural disasters in a region per decade
  • Number of customer arrivals at a service counter per hour

Relationships between distributions

Binomial as sum of Bernoullis

If X₁, X₂, ..., Xₙ are independent Bernoulli random variables each with success probability p, then:

Y = \sum_{i=1}^{n} X_i \sim \text{Bin}(n, p)

This is the formal connection: the binomial distribution is a generalization of the Bernoulli to multiple trials. It also explains why the binomial mean is np (sum of n means of p) and the variance is np(1 − p) (sum of n variances, using independence).
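A short simulation illustrates the connection (the sample size and seed are arbitrary):

```python
import random

random.seed(42)
n, p, trials = 10, 0.3, 100_000

# Y = X_1 + ... + X_n, each X_i ~ Bernoulli(p)
samples = [sum(1 for _ in range(n) if random.random() < p) for _ in range(trials)]

mean_hat = sum(samples) / trials
var_hat = sum((s - mean_hat)**2 for s in samples) / trials
print(mean_hat, var_hat)   # should land near np = 3.0 and np(1-p) = 2.1
```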

Poisson as limit of binomial

If Xₙ ~ Bin(n, λ/n), then as n → ∞:

\lim_{n \to \infty} P(X_n = k) = \frac{e^{-\lambda} \lambda^k}{k!}

This is the Poisson limit theorem. The proof involves substituting p = λ/n into the binomial PMF and taking the limit term by term. The key insight is that many independent trials, each with a tiny success probability, produce a count that's well-described by the Poisson.
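The convergence can be observed numerically by fixing λ = 2 and k = 3 and letting n grow:

```python
from math import comb, exp, factorial

lam, k = 2.0, 3
poisson = exp(-lam) * lam**k / factorial(k)   # Poisson(2) PMF at k = 3

for n in (10, 100, 1000, 10_000):
    p = lam / n
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, binom, abs(binom - poisson))     # the gap shrinks as n grows
```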

Fitting discrete distributions

Method of moments

The method of moments estimates parameters by setting sample moments equal to theoretical moments and solving.

Steps:

  1. Compute the sample moments from your data (sample mean x̄, sample variance s², etc.)
  2. Write out the theoretical moments as functions of the unknown parameter(s)
  3. Set sample moments equal to theoretical moments
  4. Solve for the parameter(s)

Bernoulli and Poisson examples: For a Bernoulli distribution, E(X) = p, so set x̄ = p, giving p̂ = x̄. For a Poisson, E(X) = λ, so λ̂ = x̄.

The method of moments is simple and intuitive, though it doesn't always produce the most efficient estimators.
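The steps above reduce to one line for the Poisson. A sketch with made-up weekly claim counts:

```python
# Hypothetical weekly claim counts (made-up data for illustration)
data = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]

# Poisson method of moments: set the sample mean equal to E(X) = lambda
lam_hat = sum(data) / len(data)
print(lam_hat)   # 3.8
```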

Maximum likelihood estimation

MLE finds the parameter value that makes the observed data most probable.

Steps:

  1. Write the likelihood function as the joint PMF of the observed data, treated as a function of θ: L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} P(X = x_i; \theta)
  2. Take the natural log to get the log-likelihood \ell(\theta) = \sum_{i=1}^{n} \ln P(X = x_i; \theta)
  3. Differentiate \ell(\theta) with respect to θ and set it equal to zero
  4. Solve for \hat{\theta}
  5. Verify it's a maximum (second derivative test)

MLEs have strong asymptotic properties: they are consistent (converge to the true value), asymptotically normal, and asymptotically efficient (attaining the Cramér-Rao lower bound in the limit). For the Poisson, the MLE of λ turns out to be λ̂ = x̄, which happens to coincide with the method of moments estimator.
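The Poisson MLE result can be checked numerically: with made-up claim counts, the log-likelihood evaluated at the sample mean beats nearby candidate values, as the concave shape guarantees:

```python
import math

data = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]   # hypothetical claim counts
xbar = sum(data) / len(data)

def log_likelihood(lam: float) -> float:
    """Poisson log-likelihood: sum over the data of -lam + x*ln(lam) - ln(x!)."""
    return sum(-lam + x * math.log(lam) - math.log(math.factorial(x)) for x in data)

# The log-likelihood is strictly concave with its peak at the sample mean
for lam in (xbar - 0.5, xbar + 0.5):
    assert log_likelihood(xbar) > log_likelihood(lam)
print(xbar)   # the MLE, 3.8
```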

Discrete distribution examples

Modeling claim frequency

The Poisson distribution is the standard model for claim frequency in actuarial science. If an insurer observes an average of λ = 4.2 claims per week from historical data, you can use Poisson(4.2) to calculate the probability of any specific claim count in a future week.

The parameter λ is typically estimated from historical data using either method of moments (λ̂ = x̄) or MLE. Once estimated, the model supports tasks like setting reserves, pricing premiums, and stress-testing under high-claim scenarios.

Modeling rare events

The Poisson distribution is particularly well-suited for rare events because it naturally arises from many exposures each with a small probability. Examples include:

  • Earthquakes in a region per year (λ might be 0.3)
  • Industrial accidents at a factory per quarter
  • Catastrophic insurance losses per decade

The small λ\lambda value concentrates most of the probability mass near zero, which matches the observed behavior of rare events.

Modeling success/failure experiments

The binomial distribution fits scenarios with a fixed number of independent trials, each having the same probability of success. For instance, if a batch of 50 items each has a 4% defect rate, the number of defectives follows Bin(50, 0.04).

From this you can calculate quantities like:

  • P(X = 0): probability the entire batch is defect-free
  • P(X ≥ 5): probability of 5 or more defectives (useful for quality control thresholds)
  • E(X) = 50 × 0.04 = 2: the expected number of defectives
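These quantities are straightforward to compute directly from the PMF:

```python
from math import comb

n, p = 50, 0.04

def pmf(k: int) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_zero = pmf(0)                                    # P(X = 0): defect-free batch
p_five_plus = 1 - sum(pmf(k) for k in range(5))    # P(X >= 5) via the complement
expected = n * p                                   # E(X) = np = 2 defectives

print(round(p_zero, 4), round(p_five_plus, 4), round(expected, 2))
```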