📊Actuarial Mathematics Unit 1 Review


1.6 Continuous distributions (normal, exponential, gamma)


Written by the Fiveable Content Team • Last updated August 2025

Properties of Continuous Distributions

Continuous distributions model random variables that can take any value within a range. They're fundamental in actuarial work because most real-world quantities you'll model (claim amounts, time until death, investment returns) aren't restricted to whole numbers.

One crucial distinction from discrete distributions: the probability of a continuous random variable taking on any exact value is zero. You can only calculate probabilities over intervals. Three tools define and characterize these distributions: the probability density function (PDF), the cumulative distribution function (CDF), and moments.

Probability Density Functions

The PDF, denoted f(x), describes the relative likelihood of a continuous random variable near a specific value. It must satisfy two conditions:

  • Non-negativity: f(x) ≥ 0 for all x
  • Total area equals 1: \int_{-\infty}^{\infty} f(x) \, dx = 1

To find the probability that X falls within a range [a, b], you integrate the PDF:

P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx

Note that f(x) itself is not a probability. It can even exceed 1 at certain points. Only the area under the curve over an interval gives you a probability.
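Since only areas give probabilities, an interval probability can be checked numerically. Below is a minimal Python sketch, assuming a hypothetical exponential PDF with rate λ = 2 and a plain trapezoidal rule (no particular library's integrator):

```python
import math

def exp_pdf(x, lam):
    """f(x) = lam * e^(-lam*x) for x >= 0, a valid PDF for any lam > 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def prob_interval(a, b, lam, n=100_000):
    """Approximate P(a <= X <= b) = integral of the PDF over [a, b] (trapezoidal rule)."""
    h = (b - a) / n
    total = 0.5 * (exp_pdf(a, lam) + exp_pdf(b, lam))
    total += sum(exp_pdf(a + i * h, lam) for i in range(1, n))
    return total * h

lam = 2.0  # hypothetical rate
approx = prob_interval(0.5, 1.5, lam)
# Closed-form answer F(1.5) - F(0.5), using the known exponential CDF F(x) = 1 - e^(-lam*x):
exact = math.exp(-lam * 0.5) - math.exp(-lam * 1.5)
```

The numerical integral agrees with the closed-form CDF difference to several decimal places, which is the point of the exercise: the PDF only yields probabilities through areas.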

Cumulative Distribution Functions

The CDF, denoted F(x), gives the probability that X is less than or equal to x:

F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t) \, dt

Key properties of the CDF:

  • Non-decreasing: if x₁ < x₂, then F(x₁) ≤ F(x₂)
  • Boundary behavior: F(x) → 0 as x → −∞, and F(x) → 1 as x → ∞
  • Relationship to PDF: f(x) = F′(x) wherever the derivative exists

The CDF is often more practical than the PDF for computing probabilities directly, since P(a ≤ X ≤ b) = F(b) − F(a).

Moments of Continuous Distributions

Moments characterize a distribution's central tendency, spread, and shape. The n-th moment of X about the origin is:

E[X^n] = \int_{-\infty}^{\infty} x^n f(x) \, dx

The two most important moments:

  • Mean (first moment): \mu = E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx
  • Variance (second central moment): \sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx

A useful computational shortcut for variance: σ² = E[X²] − (E[X])². This avoids expanding the squared term in the integral.
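The shortcut is easy to verify numerically. A small sketch, assuming a hypothetical Exp(2) distribution and approximating the moment integrals with Riemann sums (for Exp(λ), theory gives E[X] = 1/λ and E[X²] = 2/λ²):

```python
import math

lam = 2.0  # hypothetical exponential rate

def raw_moment(n, lam, upper=50.0, steps=200_000):
    """E[X^n] = integral of x^n * f(x) dx, approximated by a Riemann sum.
    The upper limit truncates the tail, which is negligible well before x = 50."""
    h = upper / steps
    return sum((i * h) ** n * lam * math.exp(-lam * i * h) * h for i in range(steps))

m1 = raw_moment(1, lam)      # theory: 1/lam = 0.5
m2 = raw_moment(2, lam)      # theory: 2/lam^2 = 0.5
var_shortcut = m2 - m1 ** 2  # sigma^2 = E[X^2] - (E[X])^2 = 0.25
```

The shortcut value matches the theoretical variance 1/λ² = 0.25 without ever expanding (x − μ)² inside an integral.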

Normal Distribution

The normal (Gaussian) distribution is symmetric and bell-shaped, and it shows up constantly in actuarial work. Its importance comes largely from the central limit theorem: sums and averages of many independent random variables tend toward a normal distribution regardless of the underlying distribution.

Probability Density Function of Normal Distribution

For a normal distribution with mean μ and standard deviation σ:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty

  • The curve is symmetric around μ, which is also the median and mode.
  • Smaller σ produces a taller, narrower bell; larger σ produces a flatter, wider one.
  • The variance is σ², and the mean is μ.

The empirical rule is worth memorizing: roughly 68% of values fall within ±1σ of the mean, 95% within ±2σ, and 99.7% within ±3σ.

Standard Normal Distribution

The standard normal distribution has μ = 0 and σ = 1. Its PDF and CDF have special notation:

\phi(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{z^2}{2}}

\Phi(z) = P(Z \leq z)

Any normal random variable X ~ N(μ, σ²) can be standardized by computing:

Z = \frac{X - \mu}{\sigma}

This lets you convert any normal probability into a standard normal lookup:

P(X \leq x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right)
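This lookup can be sketched in Python with the standard library's error function, since Φ(z) = (1 + erf(z/√2))/2. The N(1000, 200²) claim-amount figures below are hypothetical, chosen so that standardizing 1200 gives z = 1:

```python
import math

def phi_cdf(z):
    """Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), computed by standardizing first."""
    z = (x - mu) / sigma
    return phi_cdf(z)

# Hypothetical example: claim amounts ~ N(1000, 200^2); P(X <= 1200) = Phi(1.0)
p = normal_prob(1200, 1000, 200)
```

Φ(1) ≈ 0.8413, consistent with the empirical rule: about 68% of mass lies within ±1σ, so roughly 84% lies below +1σ.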

Applications of Normal Distribution

  • Modeling aggregate claims when the portfolio is large enough for the central limit theorem to apply
  • Approximating the distribution of sample means and sums in risk analysis
  • Serving as the basis for many statistical tests and confidence intervals used in actuarial practice

Exponential Distribution

The exponential distribution models the time between events in a Poisson process. If events occur at a constant average rate, the waiting time between consecutive events follows an exponential distribution.

Probability Density Function of Exponential Distribution

For rate parameter λ > 0:

f(x) = \lambda e^{-\lambda x}, \quad x \geq 0

F(x) = 1 - e^{-\lambda x}, \quad x \geq 0

Both the mean and standard deviation equal 1/λ, and the variance is 1/λ². So if claims arrive at a rate of λ = 5 per hour, the average waiting time between claims is 1/5 = 0.2 hours (12 minutes).

Memoryless Property

The exponential distribution is the only continuous distribution with the memoryless property:

P(X > s + t \mid X > s) = P(X > t) \quad \text{for all } s, t \geq 0

In plain terms: if you've already waited s units of time without an event, the probability of waiting an additional t units is the same as if you'd just started waiting. The past gives you no information about the future.

This property makes the exponential distribution appropriate when the failure rate (or hazard rate) is constant over time. It's a strong assumption, so always check whether it's reasonable for your data.
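You can confirm the memoryless property directly from the survival function P(X > x) = e^{−λx}, which follows from the CDF above. A quick sketch, reusing the hypothetical λ = 5 claims-per-hour rate from the earlier example:

```python
import math

lam = 5.0  # hypothetical rate: 5 claims per hour

def surv(x, lam):
    """Survival function P(X > x) = e^(-lam*x) for the exponential."""
    return math.exp(-lam * x)

s, t = 0.3, 0.2  # hours already waited, and additional hours to wait

# Conditional probability of waiting t more, given we've already waited s:
cond = surv(s + t, lam) / surv(s, lam)
# Probability of waiting t starting fresh:
fresh = surv(t, lam)
```

The two probabilities coincide (both equal e^{−λt}), which is exactly the memoryless property; for any other continuous lifetime distribution this equality would fail.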


Applications of Exponential Distribution

  • Modeling inter-arrival times of insurance claims under a Poisson process
  • Estimating the lifetime of components with a constant failure rate
  • Serving as a building block for more complex models (e.g., the gamma distribution)

Gamma Distribution

The gamma distribution generalizes the exponential distribution by adding a shape parameter. Where the exponential models the time until one event, the gamma can model the time until the α-th event.

Probability Density Function of Gamma Distribution

For shape parameter α > 0 and rate parameter β > 0:

f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} \, x^{\alpha-1} e^{-\beta x}, \quad x \geq 0

Here Γ(α) is the gamma function, which generalizes the factorial: Γ(n) = (n − 1)! for positive integers n.

  • Mean: α/β
  • Variance: α/β²

The CDF involves the lower incomplete gamma function and generally doesn't have a closed form:

F(x) = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)}

For integer values of α, you can express the CDF as a finite sum involving exponential and polynomial terms.
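For integer α = n, that finite sum is F(x) = 1 − Σ_{k=0}^{n−1} e^{−βx}(βx)^k / k!, the Erlang CDF (the Poisson probability of fewer than n events by time x). A short sketch; the parameter values below are hypothetical:

```python
import math

def erlang_cdf(x, n, beta):
    """Gamma CDF for integer shape n (Erlang):
    F(x) = 1 - sum_{k=0}^{n-1} e^(-beta*x) * (beta*x)^k / k!"""
    bx = beta * x
    tail = sum(math.exp(-bx) * bx ** k / math.factorial(k) for k in range(n))
    return 1.0 - tail

# n = 1 collapses to the exponential CDF 1 - e^(-beta*x):
p_exp = erlang_cdf(1.0, 1, 2.0)
# n = 3: probability the 3rd event (at rate beta = 2) occurs by time 1:
p_gamma = erlang_cdf(1.0, 3, 2.0)
```

As expected, waiting for the 3rd event by the same deadline is less likely than waiting for the 1st, so p_gamma < p_exp.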

Special Cases of Gamma Distribution

The gamma family includes several important distributions:

  • Exponential: Set α = 1, and you get the exponential distribution with rate β.
  • Chi-square: Set α = n/2 and β = 1/2, and you get the chi-square distribution with n degrees of freedom.
  • Erlang: Restrict α to positive integers. The Erlang distribution models the sum of α independent exponential random variables.

Recognizing these connections saves time on exams and in practice. If you see a sum of independent exponentials, think gamma.

Applications of Gamma Distribution

  • Modeling the waiting time until the α-th event in a Poisson process
  • Fitting claim severity distributions when the exponential is too restrictive
  • Modeling aggregate quantities like total rainfall over a fixed period

Relationship Between Distributions

Understanding how these distributions connect helps you choose the right model and simplify calculations.

Normal Distribution as Limiting Case

The central limit theorem (CLT) is the main link. It states that the sum of a large number of independent, identically distributed random variables is approximately normal, regardless of the original distribution.

Specific convergence results:

  • Binomial → Normal: As n → ∞ with fixed p, the standardized binomial approaches N(0, 1). A common rule of thumb is np ≥ 5 and n(1 − p) ≥ 5.
  • Poisson → Normal: For large λ, Poisson(λ) is well-approximated by N(λ, λ).
  • Gamma → Normal: As the shape parameter α → ∞, the standardized gamma distribution approaches normality.

Exponential Distribution vs. Gamma Distribution

  • The exponential is a gamma with α = 1.
  • The sum of n independent Exp(λ) random variables follows Gamma(n, λ).
  • The gamma offers more flexibility because its shape parameter controls skewness. When α is small, the distribution is heavily right-skewed (like the exponential). As α increases, it becomes more symmetric and bell-shaped.
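The sum-of-exponentials fact is easy to check by simulation. A sketch with hypothetical values n = 4 and λ = 2: each simulated sum should behave like Gamma(4, 2), whose mean is n/λ = 2 and variance is n/λ² = 1.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
n, lam, trials = 4, 2.0, 200_000

# Each trial: sum of n independent Exp(lam) draws -> should follow Gamma(n, lam)
sums = [sum(random.expovariate(lam) for _ in range(n)) for _ in range(trials)]

sample_mean = sum(sums) / trials                                  # theory: n/lam = 2.0
sample_var = sum((s - sample_mean) ** 2 for s in sums) / trials   # theory: n/lam^2 = 1.0
```

The sample mean and variance land close to the Gamma(4, 2) theoretical values, which is the connection the bullet list describes.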

Transformations of Continuous Distributions

Transforming random variables creates new distributions. This technique is essential for deriving the distribution of quantities like Y = g(X) when you know the distribution of X.

Linear Transformations

If Y = aX + b where a ≠ 0:

f_Y(y) = \frac{1}{|a|} \, f_X\!\left(\frac{y - b}{a}\right)

Linear transformations shift and scale the distribution but preserve its shape. For example, standardizing a normal variable (Z = (X − μ)/σ) is a linear transformation that converts N(μ, σ²) to N(0, 1).


Non-Linear Transformations

Non-linear transformations change the shape of the distribution. Common examples:

  • Exponential transformation: Y = e^X. If X ~ N(μ, σ²), then Y follows a lognormal distribution, which is widely used for claim severity modeling.
  • Logarithmic transformation: Y = ln(X). Often used to reduce right-skewness in data.
  • Power transformation: Y = X^p.

The general change-of-variables formula for a monotone transformation Y = g(X):

f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|

To apply this formula:

  1. Find the inverse function g⁻¹(y).
  2. Compute its derivative with respect to y.
  3. Take the absolute value of that derivative.
  4. Multiply by f_X evaluated at g⁻¹(y).
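Applying these steps to Y = e^X gives g⁻¹(y) = ln y and |d/dy ln y| = 1/y, so the lognormal density is f_Y(y) = f_X(ln y)/y. A quick numeric sanity check, assuming hypothetical parameters X ~ N(0, 0.5²), that the transformed function is still a valid density (integrates to 1):

```python
import math

mu, sigma = 0.0, 0.5  # hypothetical normal parameters for X

def f_X(x):
    """N(mu, sigma^2) density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def f_Y(y):
    """Lognormal density via change of variables: f_X(g^{-1}(y)) * |d/dy g^{-1}(y)|,
    with g^{-1}(y) = ln(y) and |d/dy ln(y)| = 1/y."""
    return f_X(math.log(y)) / y if y > 0 else 0.0

# Midpoint-rule integral of f_Y over (0, 20); the tail beyond 20 is negligible here.
h, upper = 0.001, 20.0
total = sum(f_Y(h * (i + 0.5)) * h for i in range(int(upper / h)))
```

The integral comes out at 1 to within the discretization error, confirming the Jacobian factor 1/y is exactly what keeps total probability conserved.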

Estimation and Inference

Estimating distribution parameters from data is a core actuarial task. Two standard approaches are maximum likelihood estimation (MLE) and method of moments estimation (MME).

Maximum Likelihood Estimation

MLE finds the parameter values that make the observed data most probable. Here's the process:

  1. Write the likelihood function as the product of the PDFs evaluated at each data point: L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)
  2. Take the natural log to get the log-likelihood: \ell(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta)
  3. Differentiate with respect to each parameter and set equal to zero.
  4. Solve for the parameter estimates.

MLE has strong theoretical properties: it's consistent (converges to the true value as n → ∞) and asymptotically efficient (achieves the lowest possible variance among consistent estimators, under regularity conditions).
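For the exponential distribution the MLE steps solve in closed form: the log-likelihood is ℓ(λ) = n ln λ − λ Σxᵢ, and setting its derivative to zero gives the estimate n/Σxᵢ, the reciprocal of the sample mean. A sketch with hypothetical data:

```python
# Hypothetical claim inter-arrival times (chosen so the sample mean is exactly 1.0)
data = [0.8, 1.2, 0.3, 2.1, 0.6, 1.0]

# For Exp(lam): l(lam) = n*ln(lam) - lam*sum(x).
# Setting l'(lam) = n/lam - sum(x) = 0 gives the MLE below.
n = len(data)
lam_hat = n / sum(data)  # = 1 / (sample mean)
```

With these numbers the sample mean is 1.0, so the fitted rate is 1 claim per unit time; in general the exponential MLE is just the inverse of the average waiting time.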

Method of Moments Estimation

MME is more straightforward. You set sample moments equal to theoretical moments and solve:

  1. Compute the k-th sample moment: m_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k
  2. Write the theoretical moments as functions of the unknown parameters.
  3. Set m_k equal to the corresponding theoretical moment for as many moments as you have parameters.
  4. Solve the resulting system of equations.

For example, to estimate the parameters of a gamma distribution (α and β), you'd set the sample mean equal to α/β and the sample variance equal to α/β², then solve for both parameters.

MME is consistent but generally less efficient than MLE. Its advantage is computational simplicity, especially when the likelihood equations are hard to solve.
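The gamma example above can be sketched directly: solving m = α/β and v = α/β² gives β = m/v and α = m²/v. The data below are hypothetical claim amounts:

```python
# Hypothetical claim amounts
data = [1.1, 2.3, 0.9, 3.0, 1.7, 2.0, 1.5, 2.5]

n = len(data)
m = sum(data) / n                         # first sample moment (sample mean)
v = sum((x - m) ** 2 for x in data) / n   # second central sample moment

# Solve the moment equations m = alpha/beta and v = alpha/beta^2:
beta_hat = m / v          # rate estimate
alpha_hat = m * beta_hat  # shape estimate, = m^2 / v
```

By construction the fitted gamma reproduces the sample mean and variance exactly, which is the defining feature (and limitation) of the method of moments.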

Confidence Intervals for Parameters

A (1 − α) × 100% confidence interval [L, U] provides a range of plausible values for a parameter θ, constructed so that P(L ≤ θ ≤ U) = 1 − α.

Common construction methods:

  • Wald interval: Uses the asymptotic normality of the MLE. The interval is \hat{\theta} \pm z_{\alpha/2} \cdot \text{SE}(\hat{\theta}), where SE is the standard error.
  • Likelihood ratio interval: Inverts the likelihood ratio test.
  • Bootstrap: Resamples the data to approximate the sampling distribution of the estimator.

Confidence intervals narrow as sample size increases, reflecting greater precision.

Goodness-of-Fit Tests

After fitting a distribution, you need to check whether it actually fits the data well. Goodness-of-fit tests formalize this comparison.

Chi-Square Test

The chi-square test bins the data and compares observed counts to expected counts under the hypothesized distribution.

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

where Oᵢ and Eᵢ are the observed and expected frequencies in bin i.

Under the null hypothesis, this statistic follows a chi-square distribution with k − 1 − m degrees of freedom, where m is the number of parameters estimated from the data.

Practical considerations:

  • Expected frequencies should generally be at least 5 per bin for the approximation to hold.
  • Results can be sensitive to how you choose the bins. Using equal-probability bins (each bin has the same expected count) tends to work better than equal-width bins.
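Computing the statistic itself is simple. A sketch with hypothetical observed counts over five equal-probability bins (n = 100, so each bin expects 20):

```python
observed = [18, 24, 22, 16, 20]  # hypothetical bin counts, n = 100
expected = [20.0] * 5            # equal-probability bins: same expected count each

# Chi-square goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i over bins
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

The resulting value would then be compared against a chi-square critical value with k − 1 − m degrees of freedom (here k = 5 bins, minus 1, minus however many parameters were estimated from the data).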

Kolmogorov-Smirnov Test

The KS test compares the empirical distribution function (EDF) directly to the theoretical CDF, with no binning required:

D_n = \sup_x |F_n(x) - F(x)|

where F_n(x) is the EDF (the proportion of data points ≤ x) and F(x) is the hypothesized CDF.

The KS test is distribution-free under the null hypothesis when parameters are fully specified (not estimated from data). If you estimate parameters from the same data, you need modified critical values (the Lilliefors correction for normality testing, for instance).

The KS test tends to be more powerful than the chi-square test for small samples because it doesn't lose information through binning.
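The D_n statistic takes only a few lines to sketch. The sample below is hypothetical and the null distribution is a fully specified Exp(1), so no parameter-estimation correction is needed; because the EDF is a step function, both one-sided gaps at each order statistic must be checked:

```python
import math

data = sorted([0.1, 0.4, 0.7, 1.2, 2.0])  # hypothetical sample, sorted
lam = 1.0                                  # fully specified null: Exp(1)
n = len(data)

def F(x):
    """Hypothesized CDF: exponential with rate lam."""
    return 1.0 - math.exp(-lam * x)

# The EDF jumps from i/n to (i+1)/n at the i-th order statistic,
# so the supremum gap occurs at one of these jump points:
d_plus = max((i + 1) / n - F(x) for i, x in enumerate(data))
d_minus = max(F(x) - i / n for i, x in enumerate(data))
D_n = max(d_plus, d_minus)
```

For a real test, D_n would then be compared to a KS critical value for sample size n; for this toy sample the largest gap happens to occur just after the biggest observation.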

Applications in Actuarial Science

Modeling Claim Severity

Claim severity (the dollar amount of individual claims) is typically right-skewed: most claims are small, but a few are very large. Distributions commonly used include:

  • Gamma: Flexible shape, works well for moderate-tailed data
  • Lognormal: Y = e^X where X is normal; captures heavy right skew
  • Pareto: Heavy-tailed, often used for large or catastrophic claims

The choice depends on the line of business and the tail behavior observed in historical data. Accurate severity modeling directly affects premium calculations and reserve estimates.

Modeling Lifetime Distributions

In life insurance and annuity pricing, you model the time until death or failure:

  • Exponential: Constant hazard rate (unrealistic for human mortality, but useful as a baseline)
  • Gamma: Allows the hazard rate to change with time, though only monotonically: increasing when α > 1, decreasing when α < 1
  • Weibull: Allows increasing, decreasing, or constant hazard rates depending on its shape parameter, making it popular in survival analysis

Estimating lifetime distribution parameters accurately is critical for pricing life insurance, calculating annuity values, and assessing insurer solvency.

Pricing Insurance Products

Pricing combines frequency and severity models to estimate expected losses. Continuous distributions enter at multiple stages:

  • Severity modeling: Fitting distributions to historical claim amounts
  • Aggregate loss modeling: Combining frequency and severity (often using normal approximations via the CLT for large portfolios)
  • Investment returns: Modeling the time value of money with continuous distributions

Actuaries refine these models using risk classification, credibility theory, and experience rating. Stochastic simulation (e.g., Monte Carlo methods) tests how sensitive premiums are to model assumptions and quantifies uncertainty in the estimates.