📊 Actuarial Mathematics Unit 1 Review

1.5 Discrete distributions (Bernoulli, binomial, Poisson)

Written by the Fiveable Content Team • Last updated August 2025
Discrete distributions like Bernoulli, binomial, and Poisson are foundational tools in actuarial math. They model events with countable outcomes, such as whether an insurance claim occurs, how many claims arise from a portfolio, or how often rare events happen in a given time window. Mastering these three distributions and their interrelationships is essential for probability exams and for real actuarial work in risk modeling.

Bernoulli distribution

The Bernoulli distribution models a single trial with exactly two outcomes: success (x = 1) or failure (x = 0). Think of it as the simplest possible random experiment. In actuarial contexts, this could be whether a single policyholder files a claim or not.

It also serves as the building block for the binomial distribution, since a binomial random variable is just a sum of independent Bernoulli trials.

Probability mass function

The PMF of a Bernoulli random variable X is:

P(X = x) = p^x (1-p)^{1-x} \quad \text{for } x \in \{0, 1\}

  • p is the probability of success
  • 1 − p (often written q) is the probability of failure

When x = 1, this simplifies to P(X = 1) = p. When x = 0, it gives P(X = 0) = 1 − p. The compact formula just combines both cases into one expression.

Mean and variance

  • Mean: E(X) = p
  • Variance: Var(X) = p(1 − p)
  • Standard deviation: σ = √(p(1 − p))

Notice the variance is maximized when p = 0.5 and equals zero when p = 0 or p = 1. This makes intuitive sense: there's no uncertainty if the outcome is guaranteed.
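The PMF and moments above are easy to verify directly. A minimal Python sketch (p = 0.3 is an arbitrary illustration):

```python
import math

def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("Bernoulli support is {0, 1}")
    return p**x * (1 - p)**(1 - x)

p = 0.3
mean = p                   # E(X) = p
variance = p * (1 - p)     # Var(X) = p(1 - p), maximized at p = 0.5
sd = math.sqrt(variance)

print(bernoulli_pmf(1, p))   # 0.3
print(bernoulli_pmf(0, p))   # 0.7
```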

Applications of Bernoulli distribution

  • Whether a single policyholder files a claim (claim/no claim)
  • Whether a manufactured item passes inspection (defective/non-defective)
  • Whether a single medical treatment succeeds or fails
  • Any binary outcome that feeds into a larger binomial model

Binomial distribution

The binomial distribution counts the number of successes in n independent Bernoulli trials, each with the same success probability p. For example, if you have 100 policyholders each with a 3% claim probability, the total number of claims follows a binomial distribution with n = 100 and p = 0.03.

Two conditions must hold for the binomial to apply: the trials must be independent, and p must be constant across all trials.

Probability mass function

The PMF of a binomial random variable X is:

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k = 0, 1, 2, \ldots, n

  • n is the number of trials
  • p is the probability of success on each trial
  • \binom{n}{k} = \frac{n!}{k!(n-k)!} is the binomial coefficient, counting the number of ways to arrange k successes among n trials

The logic: p^k (1-p)^{n-k} is the probability of one specific sequence with k successes, and \binom{n}{k} accounts for all possible orderings of those successes.
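The PMF translates directly into code. A short sketch using Python's math.comb, applied to the 100-policy example above:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    # comb(n, k) orderings, each occurring with probability p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 100 policies, each with a 3% claim probability: probability of exactly 3 claims
print(binom_pmf(3, 100, 0.03))
```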

Cumulative distribution function

The CDF of a binomial random variable X is:

F(x) = P(X \leq x) = \sum_{k=0}^{\lfloor x \rfloor} \binom{n}{k} p^k (1-p)^{n-k}

This gives the probability of observing x or fewer successes. To find the probability over a range, use:

P(a \leq X \leq b) = F(b) - F(a-1)

Note the a − 1, not a. Since X is discrete, you need to include the point a itself.
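One possible implementation of the CDF and the range formula (note how F(a − 1) becomes 0 when a = 0):

```python
from math import comb

def binom_cdf(x: int, n: int, p: float) -> float:
    """F(x) = P(X <= x): sum the PMF from k = 0 up to x."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def binom_range(a: int, b: int, n: int, p: float) -> float:
    """P(a <= X <= b) = F(b) - F(a - 1); the a - 1 keeps the point a included."""
    return binom_cdf(b, n, p) - (binom_cdf(a - 1, n, p) if a > 0 else 0.0)

print(binom_range(2, 4, 10, 0.3))   # P(2 <= X <= 4) for Bin(10, 0.3)
```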

Mean and variance

  • Mean: E(X) = np
  • Variance: Var(X) = np(1 − p)
  • Standard deviation: σ = √(np(1 − p))

These follow directly from the fact that X is a sum of n independent Bernoulli variables, each with mean p and variance p(1 − p).

Moment generating function

The MGF of a binomial random variable X is:

M_X(t) = (pe^t + 1 - p)^n

The MGF uniquely determines the distribution. You can extract the k-th moment by computing the k-th derivative of M_X(t) and evaluating at t = 0. The MGF is also useful for proving that sums of independent binomials (with the same p) remain binomial.
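One way to sanity-check the moment-extraction idea without symbolic algebra is to differentiate the MGF numerically at t = 0. A sketch using central differences (n = 20 and p = 0.4 are arbitrary choices):

```python
import math

def binom_mgf(t: float, n: int, p: float) -> float:
    return (p * math.exp(t) + 1 - p)**n

n, p, h = 20, 0.4, 1e-5

# Central differences approximate the first and second derivatives at t = 0
m1 = (binom_mgf(h, n, p) - binom_mgf(-h, n, p)) / (2 * h)               # ≈ E(X) = np = 8
m2 = (binom_mgf(h, n, p) - 2 * binom_mgf(0, n, p) + binom_mgf(-h, n, p)) / h**2
variance = m2 - m1**2                                                    # ≈ np(1-p) = 4.8
print(m1, variance)
```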

Properties of binomial distribution

  • Additivity: If X₁ ~ Bin(n₁, p) and X₂ ~ Bin(n₂, p) are independent, then X₁ + X₂ ~ Bin(n₁ + n₂, p). The success probabilities must be equal for this to work.
  • Normal approximation: As n grows, the binomial approaches a normal distribution. The standard rule of thumb is that the approximation is reasonable when both np ≥ 5 and n(1 − p) ≥ 5. A continuity correction (adjusting by ±0.5) improves accuracy.
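The normal approximation can be checked numerically with only the standard library. A sketch comparing the exact binomial CDF with the continuity-corrected normal value (n = 100 and p = 0.3 are chosen so both np and n(1 − p) exceed 5):

```python
import math
from math import comb

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 100, 0.3                        # np = 30 and n(1-p) = 70, both >= 5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(26))  # P(X <= 25)
approx = norm_cdf((25 + 0.5 - mu) / sigma)                            # continuity correction
print(exact, approx)
```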

Binomial approximation to hypergeometric

When sampling without replacement from a finite population, the exact distribution is hypergeometric. But if the sample size n is small relative to the population size N (the common guideline is n < 0.05N), the binomial with p = K/N provides a good approximation, where K is the number of "successes" in the population.

The reasoning: when the population is large enough, removing one item barely changes the composition, so sampling without replacement behaves almost like sampling with replacement.
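A quick numerical comparison illustrates the guideline (the population figures here are made up for illustration):

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """Exact distribution: k successes when drawing n without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

N, K, n = 10_000, 500, 100     # n = 0.01 * N, well under the 5% guideline
for k in range(4):
    print(k, hypergeom_pmf(k, N, K, n), binom_pmf(k, n, K / N))
```

The two columns should agree closely, which is the "sampling without replacement behaves almost like sampling with replacement" point in action.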

Applications of binomial distribution

  • Number of claims filed out of n policies in a portfolio
  • Number of defective items in a batch of n products
  • Number of successful treatments out of n patients
  • Number of wins in a fixed-length series of games

Poisson distribution

The Poisson distribution models the count of events occurring in a fixed interval of time or space, given a known average rate. Unlike the binomial, there's no fixed number of trials; the support is all non-negative integers {0, 1, 2, ...}. It's characterized by a single parameter λ > 0, the average number of events per interval.

In actuarial work, the Poisson is the go-to distribution for claim frequency modeling, especially when individual claim probabilities are small but the exposure is large.

Probability mass function

The PMF of a Poisson random variable X is:

P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!} \quad \text{for } k = 0, 1, 2, \ldots

  • λ is the average number of events per interval
  • e ≈ 2.71828 is Euler's number

For example, if an insurer expects λ = 3 claims per month, the probability of exactly 5 claims is:

P(X = 5) = \frac{e^{-3} \cdot 3^5}{5!} = \frac{0.04979 \cdot 243}{120} \approx 0.1008
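The same calculation in Python:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = e^(-lambda) * lambda^k / k!"""
    return math.exp(-lam) * lam**k / math.factorial(k)

print(poisson_pmf(5, 3.0))   # ≈ 0.1008
```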

Cumulative distribution function

The CDF of a Poisson random variable X is:

F(x) = P(X \leq x) = \sum_{k=0}^{\lfloor x \rfloor} \frac{e^{-\lambda} \lambda^k}{k!}

As with the binomial, for a range: P(a ≤ X ≤ b) = F(b) − F(a − 1).

Mean and variance

  • Mean: E(X) = λ
  • Variance: Var(X) = λ
  • Standard deviation: σ = √λ

The fact that the mean equals the variance is a defining characteristic of the Poisson distribution. In practice, if you observe data where the sample variance is much larger or smaller than the sample mean, a Poisson model may not be appropriate. This is called overdispersion (variance > mean) or underdispersion (variance < mean).

Moment generating function

The MGF of a Poisson random variable X is:

M_X(t) = e^{\lambda(e^t - 1)}

As with the binomial, the k-th moment is found by taking the k-th derivative of M_X(t) at t = 0. The MGF also makes it straightforward to prove the additivity property below.

Properties of Poisson distribution

  • Additivity: If X₁ ~ Poisson(λ₁) and X₂ ~ Poisson(λ₂) are independent, then X₁ + X₂ ~ Poisson(λ₁ + λ₂). You can verify this by multiplying the MGFs.
  • Limiting case of binomial: The Poisson distribution arises as the limit of Bin(n, p) when n → ∞, p → 0, and np = λ stays constant. This is why the Poisson works well for modeling many independent rare events.

Poisson approximation to binomial

When n is large and p is small, computing binomial probabilities directly can be cumbersome. The Poisson with λ = np provides a convenient approximation.

Common rules of thumb for when the approximation is adequate:

  • n ≥ 20 and p ≤ 0.05, or
  • n ≥ 100 and np ≤ 10

Example: Suppose 1,000 policies each have a 0.2% chance of a catastrophic claim. The exact distribution is Bin(1000, 0.002), but you can approximate it with Poisson(2) since λ = 1000 × 0.002 = 2.
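The example can be checked numerically by printing the two PMFs side by side:

```python
from math import comb, exp, factorial

n, p = 1000, 0.002
lam = n * p   # 2.0

for k in range(6):
    exact = comb(n, k) * p**k * (1 - p)**(n - k)      # Bin(1000, 0.002)
    approx = exp(-lam) * lam**k / factorial(k)        # Poisson(2)
    print(k, round(exact, 5), round(approx, 5))
```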


Applications of Poisson distribution

  • Number of insurance claims arriving per month or per year
  • Number of accidents at an intersection over a given period
  • Number of natural disasters in a region per decade
  • Number of customer arrivals at a service counter per hour

Relationships between distributions

Binomial as sum of Bernoullis

If X₁, X₂, ..., Xₙ are independent Bernoulli random variables each with success probability p, then:

Y = \sum_{i=1}^{n} X_i \sim \text{Bin}(n, p)

This is the formal connection: the binomial distribution is a generalization of the Bernoulli to multiple trials. It also explains why the binomial mean is np (sum of n means of p) and the variance is np(1 − p) (sum of n variances, using independence).
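A short simulation illustrates the connection (the sample size and seed are arbitrary):

```python
import random

random.seed(42)
n, p, trials = 10, 0.3, 100_000

# Y = X_1 + ... + X_n, each X_i ~ Bernoulli(p)
samples = [sum(1 for _ in range(n) if random.random() < p) for _ in range(trials)]

mean_hat = sum(samples) / trials
var_hat = sum((s - mean_hat)**2 for s in samples) / trials
print(mean_hat, var_hat)   # should land near np = 3.0 and np(1-p) = 2.1
```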

Poisson as limit of binomial

If Xₙ ~ Bin(n, λ/n), then as n → ∞:

\lim_{n \to \infty} P(X_n = k) = \frac{e^{-\lambda} \lambda^k}{k!}

This is the Poisson limit theorem. The proof involves substituting p = λ/n into the binomial PMF and taking the limit term by term. The key insight is that many independent trials, each with a tiny success probability, produce a count that's well-described by the Poisson.
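The convergence can be observed numerically by fixing λ = 2 and k = 3 and letting n grow:

```python
from math import comb, exp, factorial

lam, k = 2.0, 3
poisson = exp(-lam) * lam**k / factorial(k)   # Poisson(2) PMF at k = 3

for n in (10, 100, 1000, 10_000):
    p = lam / n
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, binom, abs(binom - poisson))     # the gap shrinks as n grows
```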

Fitting discrete distributions

Method of moments

The method of moments estimates parameters by setting sample moments equal to theoretical moments and solving.

Steps:

  1. Compute the sample moments from your data (sample mean x̄, sample variance s², etc.)
  2. Write out the theoretical moments as functions of the unknown parameter(s)
  3. Set sample moments equal to theoretical moments
  4. Solve for the parameter(s)

Bernoulli and Poisson examples: For a Bernoulli distribution, E(X) = p, so set x̄ = p, giving p̂ = x̄. For a Poisson, E(X) = λ, so λ̂ = x̄.

The method of moments is simple and intuitive, though it doesn't always produce the most efficient estimators.
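The steps above reduce to one line for the Poisson. A sketch with made-up weekly claim counts:

```python
# Hypothetical weekly claim counts (made-up data for illustration)
data = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]

# Poisson method of moments: set the sample mean equal to E(X) = lambda
lam_hat = sum(data) / len(data)
print(lam_hat)   # 3.8
```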

Maximum likelihood estimation

MLE finds the parameter value that makes the observed data most probable.

Steps:

  1. Write the likelihood function as the joint PMF of the observed data, treated as a function of θ: L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} P(X = x_i; \theta)
  2. Take the natural log to get the log-likelihood \ell(\theta) = \sum_{i=1}^{n} \ln P(X = x_i; \theta)
  3. Differentiate \ell(\theta) with respect to θ and set it equal to zero
  4. Solve for \hat{\theta}
  5. Verify it's a maximum (second derivative test)

MLEs have strong asymptotic properties: they are consistent (converge to the true value), asymptotically normal, and asymptotically efficient (attaining the Cramér-Rao lower bound in the limit). For the Poisson, the MLE of λ turns out to be λ̂ = x̄, which happens to coincide with the method of moments estimator.
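The Poisson MLE result can be checked numerically: with made-up claim counts, the log-likelihood evaluated at the sample mean beats nearby candidate values, as the concave shape guarantees:

```python
import math

data = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]   # hypothetical claim counts
xbar = sum(data) / len(data)

def log_likelihood(lam: float) -> float:
    """Poisson log-likelihood: sum over the data of -lam + x*ln(lam) - ln(x!)."""
    return sum(-lam + x * math.log(lam) - math.log(math.factorial(x)) for x in data)

# The log-likelihood is strictly concave with its peak at the sample mean
for lam in (xbar - 0.5, xbar + 0.5):
    assert log_likelihood(xbar) > log_likelihood(lam)
print(xbar)   # the MLE, 3.8
```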

Discrete distribution examples

Modeling claim frequency

The Poisson distribution is the standard model for claim frequency in actuarial science. If an insurer observes an average of λ = 4.2 claims per week from historical data, you can use Poisson(4.2) to calculate the probability of any specific claim count in a future week.

The parameter λ is typically estimated from historical data using either method of moments (λ̂ = x̄) or MLE. Once estimated, the model supports tasks like setting reserves, pricing premiums, and stress-testing under high-claim scenarios.

Modeling rare events

The Poisson distribution is particularly well-suited for rare events because it naturally arises from many exposures each with a small probability. Examples include:

  • Earthquakes in a region per year (λ might be 0.3)
  • Industrial accidents at a factory per quarter
  • Catastrophic insurance losses per decade

The small λ\lambda value concentrates most of the probability mass near zero, which matches the observed behavior of rare events.

Modeling success/failure experiments

The binomial distribution fits scenarios with a fixed number of independent trials, each having the same probability of success. For instance, if a batch of 50 items each has a 4% defect rate, the number of defectives follows Bin(50, 0.04).

From this you can calculate quantities like:

  • P(X = 0): probability the entire batch is defect-free
  • P(X ≥ 5): probability of 5 or more defectives (useful for quality control thresholds)
  • E(X) = 50 × 0.04 = 2: the expected number of defectives
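These quantities are straightforward to compute directly from the PMF:

```python
from math import comb

n, p = 50, 0.04

def pmf(k: int) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_zero = pmf(0)                                    # P(X = 0): defect-free batch
p_five_plus = 1 - sum(pmf(k) for k in range(5))    # P(X >= 5) via the complement
expected = n * p                                   # E(X) = np = 2 defectives

print(round(p_zero, 4), round(p_five_plus, 4), round(expected, 2))
```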