📊Actuarial Mathematics Unit 1 Review


1.6 Continuous distributions (normal, exponential, gamma)


Written by the Fiveable Content Team • Last updated August 2025

Properties of Continuous Distributions

Continuous distributions model random variables that can take any value within a range. They're fundamental in actuarial work because most real-world quantities you'll model (claim amounts, time until death, investment returns) aren't restricted to whole numbers.

One crucial distinction from discrete distributions: the probability of a continuous random variable taking on any exact value is zero. You can only calculate probabilities over intervals. Three tools define and characterize these distributions: the probability density function (PDF), the cumulative distribution function (CDF), and moments.

Probability Density Functions

The PDF, denoted f(x), describes the relative likelihood of a continuous random variable near a specific value. It must satisfy two conditions:

  • Non-negativity: f(x) ≥ 0 for all x
  • Total area equals 1: \int_{-\infty}^{\infty} f(x) \, dx = 1

To find the probability that X falls within a range [a, b], you integrate the PDF:

P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx

Note that f(x) itself is not a probability. It can even exceed 1 at certain points. Only the area under the curve over an interval gives you a probability.
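Since only areas give probabilities, an interval probability can be checked numerically. Below is a minimal Python sketch, assuming a hypothetical exponential PDF with rate λ = 2 and a plain trapezoidal rule (no particular library's integrator):

```python
import math

def exp_pdf(x, lam):
    """f(x) = lam * e^(-lam*x) for x >= 0, a valid PDF for any lam > 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def prob_interval(a, b, lam, n=100_000):
    """Approximate P(a <= X <= b) = integral of the PDF over [a, b] (trapezoidal rule)."""
    h = (b - a) / n
    total = 0.5 * (exp_pdf(a, lam) + exp_pdf(b, lam))
    total += sum(exp_pdf(a + i * h, lam) for i in range(1, n))
    return total * h

lam = 2.0  # hypothetical rate
approx = prob_interval(0.5, 1.5, lam)
# Closed-form answer F(1.5) - F(0.5), using the known exponential CDF F(x) = 1 - e^(-lam*x):
exact = math.exp(-lam * 0.5) - math.exp(-lam * 1.5)
```

The numerical integral agrees with the closed-form CDF difference to several decimal places, which is the point of the exercise: the PDF only yields probabilities through areas.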

Cumulative Distribution Functions

The CDF, denoted F(x), gives the probability that X is less than or equal to x:

F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t) \, dt

Key properties of the CDF:

  • Non-decreasing: if x₁ < x₂, then F(x₁) ≤ F(x₂)
  • Boundary behavior: F(x) → 0 as x → −∞, and F(x) → 1 as x → ∞
  • Relationship to PDF: f(x) = F′(x) wherever the derivative exists

The CDF is often more practical than the PDF for computing probabilities directly, since P(a ≤ X ≤ b) = F(b) − F(a).

Moments of Continuous Distributions

Moments characterize a distribution's central tendency, spread, and shape. The n-th moment of X about the origin is:

E[X^n] = \int_{-\infty}^{\infty} x^n f(x) \, dx

The two most important moments:

  • Mean (first moment): \mu = E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx
  • Variance (second central moment): \sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx

A useful computational shortcut for variance: σ² = E[X²] − (E[X])². This avoids expanding the squared term in the integral.
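The shortcut is easy to verify numerically. A small sketch, assuming a hypothetical Exp(2) distribution and approximating the moment integrals with Riemann sums (for Exp(λ), theory gives E[X] = 1/λ and E[X²] = 2/λ²):

```python
import math

lam = 2.0  # hypothetical exponential rate

def raw_moment(n, lam, upper=50.0, steps=200_000):
    """E[X^n] = integral of x^n * f(x) dx, approximated by a Riemann sum.
    The upper limit truncates the tail, which is negligible well before x = 50."""
    h = upper / steps
    return sum((i * h) ** n * lam * math.exp(-lam * i * h) * h for i in range(steps))

m1 = raw_moment(1, lam)      # theory: 1/lam = 0.5
m2 = raw_moment(2, lam)      # theory: 2/lam^2 = 0.5
var_shortcut = m2 - m1 ** 2  # sigma^2 = E[X^2] - (E[X])^2 = 0.25
```

The shortcut value matches the theoretical variance 1/λ² = 0.25 without ever expanding (x − μ)² inside an integral.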

Normal Distribution

The normal (Gaussian) distribution is symmetric and bell-shaped, and it shows up constantly in actuarial work. Its importance comes largely from the central limit theorem: sums and averages of many independent random variables tend toward a normal distribution regardless of the underlying distribution.

Probability Density Function of Normal Distribution

For a normal distribution with mean μ and standard deviation σ:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty

  • The curve is symmetric around μ, which is also the median and mode.
  • Smaller σ produces a taller, narrower bell; larger σ produces a flatter, wider one.
  • The variance is σ², and the mean is μ.

The empirical rule is worth memorizing: roughly 68% of values fall within ±1σ of the mean, 95% within ±2σ, and 99.7% within ±3σ.

Standard Normal Distribution

The standard normal distribution has μ = 0 and σ = 1. Its PDF and CDF have special notation:

\phi(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{z^2}{2}}

\Phi(z) = P(Z \leq z)

Any normal random variable X ~ N(μ, σ²) can be standardized by computing:

Z = \frac{X - \mu}{\sigma}

This lets you convert any normal probability into a standard normal lookup:

P(X \leq x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right)
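This lookup can be sketched in Python with the standard library's error function, since Φ(z) = (1 + erf(z/√2))/2. The N(1000, 200²) claim-amount figures below are hypothetical, chosen so that standardizing 1200 gives z = 1:

```python
import math

def phi_cdf(z):
    """Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), computed by standardizing first."""
    z = (x - mu) / sigma
    return phi_cdf(z)

# Hypothetical example: claim amounts ~ N(1000, 200^2); P(X <= 1200) = Phi(1.0)
p = normal_prob(1200, 1000, 200)
```

Φ(1) ≈ 0.8413, consistent with the empirical rule: about 68% of mass lies within ±1σ, so roughly 84% lies below +1σ.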

Applications of Normal Distribution

  • Modeling aggregate claims when the portfolio is large enough for the central limit theorem to apply
  • Approximating the distribution of sample means and sums in risk analysis
  • Serving as the basis for many statistical tests and confidence intervals used in actuarial practice

Exponential Distribution

The exponential distribution models the time between events in a Poisson process. If events occur at a constant average rate, the waiting time between consecutive events follows an exponential distribution.

Probability Density Function of Exponential Distribution

For rate parameter λ > 0:

f(x) = \lambda e^{-\lambda x}, \quad x \geq 0

F(x) = 1 - e^{-\lambda x}, \quad x \geq 0

Both the mean and standard deviation equal 1/λ, and the variance is 1/λ². So if claims arrive at a rate of λ = 5 per hour, the average waiting time between claims is 1/5 = 0.2 hours (12 minutes).

Memoryless Property

The exponential distribution is the only continuous distribution with the memoryless property:

P(X > s + t \mid X > s) = P(X > t) \quad \text{for all } s, t \geq 0

In plain terms: if you've already waited s units of time without an event, the probability of waiting an additional t units is the same as if you'd just started waiting. The past gives you no information about the future.

This property makes the exponential distribution appropriate when the failure rate (or hazard rate) is constant over time. It's a strong assumption, so always check whether it's reasonable for your data.
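You can confirm the memoryless property directly from the survival function P(X > x) = e^{−λx}, which follows from the CDF above. A quick sketch, reusing the hypothetical λ = 5 claims-per-hour rate from the earlier example:

```python
import math

lam = 5.0  # hypothetical rate: 5 claims per hour

def surv(x, lam):
    """Survival function P(X > x) = e^(-lam*x) for the exponential."""
    return math.exp(-lam * x)

s, t = 0.3, 0.2  # hours already waited, and additional hours to wait

# Conditional probability of waiting t more, given we've already waited s:
cond = surv(s + t, lam) / surv(s, lam)
# Probability of waiting t starting fresh:
fresh = surv(t, lam)
```

The two probabilities coincide (both equal e^{−λt}), which is exactly the memoryless property; for any other continuous lifetime distribution this equality would fail.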


Applications of Exponential Distribution

  • Modeling inter-arrival times of insurance claims under a Poisson process
  • Estimating the lifetime of components with a constant failure rate
  • Serving as a building block for more complex models (e.g., the gamma distribution)

Gamma Distribution

The gamma distribution generalizes the exponential distribution by adding a shape parameter. Where the exponential models the time until one event, the gamma can model the time until the α-th event.

Probability Density Function of Gamma Distribution

For shape parameter α > 0 and rate parameter β > 0:

f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} \, x^{\alpha-1} e^{-\beta x}, \quad x \geq 0

Here Γ(α) is the gamma function, which generalizes the factorial: Γ(n) = (n − 1)! for positive integers n.

  • Mean: α/β
  • Variance: α/β²

The CDF involves the lower incomplete gamma function and generally doesn't have a closed form:

F(x) = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)}

For integer values of α, you can express the CDF as a finite sum involving exponential and polynomial terms.
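For integer α = n, that finite sum is F(x) = 1 − Σ_{k=0}^{n−1} e^{−βx}(βx)^k / k!, the Erlang CDF (the Poisson probability of fewer than n events by time x). A short sketch; the parameter values below are hypothetical:

```python
import math

def erlang_cdf(x, n, beta):
    """Gamma CDF for integer shape n (Erlang):
    F(x) = 1 - sum_{k=0}^{n-1} e^(-beta*x) * (beta*x)^k / k!"""
    bx = beta * x
    tail = sum(math.exp(-bx) * bx ** k / math.factorial(k) for k in range(n))
    return 1.0 - tail

# n = 1 collapses to the exponential CDF 1 - e^(-beta*x):
p_exp = erlang_cdf(1.0, 1, 2.0)
# n = 3: probability the 3rd event (at rate beta = 2) occurs by time 1:
p_gamma = erlang_cdf(1.0, 3, 2.0)
```

As expected, waiting for the 3rd event by the same deadline is less likely than waiting for the 1st, so p_gamma < p_exp.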

Special Cases of Gamma Distribution

The gamma family includes several important distributions:

  • Exponential: Set α = 1, and you get the exponential distribution with rate β.
  • Chi-square: Set α = n/2 and β = 1/2, and you get the chi-square distribution with n degrees of freedom.
  • Erlang: Restrict α to positive integers. The Erlang distribution models the sum of α independent exponential random variables.

Recognizing these connections saves time on exams and in practice. If you see a sum of independent exponentials, think gamma.

Applications of Gamma Distribution

  • Modeling the waiting time until the α-th event in a Poisson process
  • Fitting claim severity distributions when the exponential is too restrictive
  • Modeling aggregate quantities like total rainfall over a fixed period

Relationship Between Distributions

Understanding how these distributions connect helps you choose the right model and simplify calculations.

Normal Distribution as Limiting Case

The central limit theorem (CLT) is the main link. It states that the sum of a large number of independent, identically distributed random variables is approximately normal, regardless of the original distribution.

Specific convergence results:

  • Binomial → Normal: As n → ∞ with fixed p, the standardized binomial approaches N(0, 1). A common rule of thumb is np ≥ 5 and n(1 − p) ≥ 5.
  • Poisson → Normal: For large λ, Poisson(λ) is well-approximated by N(λ, λ).
  • Gamma → Normal: As the shape parameter α → ∞, the standardized gamma distribution approaches normality.

Exponential Distribution vs. Gamma Distribution

  • The exponential is a gamma with α = 1.
  • The sum of n independent Exp(λ) random variables follows Gamma(n, λ).
  • The gamma offers more flexibility because its shape parameter controls skewness. When α is small, the distribution is heavily right-skewed (like the exponential). As α increases, it becomes more symmetric and bell-shaped.
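The sum-of-exponentials fact is easy to check by simulation. A sketch with hypothetical values n = 4 and λ = 2: each simulated sum should behave like Gamma(4, 2), whose mean is n/λ = 2 and variance is n/λ² = 1.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
n, lam, trials = 4, 2.0, 200_000

# Each trial: sum of n independent Exp(lam) draws -> should follow Gamma(n, lam)
sums = [sum(random.expovariate(lam) for _ in range(n)) for _ in range(trials)]

sample_mean = sum(sums) / trials                                  # theory: n/lam = 2.0
sample_var = sum((s - sample_mean) ** 2 for s in sums) / trials   # theory: n/lam^2 = 1.0
```

The sample mean and variance land close to the Gamma(4, 2) theoretical values, which is the connection the bullet list describes.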

Transformations of Continuous Distributions

Transforming random variables creates new distributions. This technique is essential for deriving the distribution of quantities like Y = g(X) when you know the distribution of X.

Linear Transformations

If Y = aX + b where a ≠ 0:

f_Y(y) = \frac{1}{|a|} \, f_X\!\left(\frac{y - b}{a}\right)

Linear transformations shift and scale the distribution but preserve its shape. For example, standardizing a normal variable (Z = (X − μ)/σ) is a linear transformation that converts N(μ, σ²) to N(0, 1).


Non-Linear Transformations

Non-linear transformations change the shape of the distribution. Common examples:

  • Exponential transformation: Y = e^X. If X ~ N(μ, σ²), then Y follows a lognormal distribution, which is widely used for claim severity modeling.
  • Logarithmic transformation: Y = ln(X). Often used to reduce right-skewness in data.
  • Power transformation: Y = X^p.

The general change-of-variables formula for a monotone transformation Y = g(X):

f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|

To apply this formula:

  1. Find the inverse function g⁻¹(y).
  2. Compute its derivative with respect to y.
  3. Take the absolute value of that derivative.
  4. Multiply by f_X evaluated at g⁻¹(y).
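Applying these steps to Y = e^X gives g⁻¹(y) = ln y and |d/dy ln y| = 1/y, so the lognormal density is f_Y(y) = f_X(ln y)/y. A quick numeric sanity check, assuming hypothetical parameters X ~ N(0, 0.5²), that the transformed function is still a valid density (integrates to 1):

```python
import math

mu, sigma = 0.0, 0.5  # hypothetical normal parameters for X

def f_X(x):
    """N(mu, sigma^2) density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def f_Y(y):
    """Lognormal density via change of variables: f_X(g^{-1}(y)) * |d/dy g^{-1}(y)|,
    with g^{-1}(y) = ln(y) and |d/dy ln(y)| = 1/y."""
    return f_X(math.log(y)) / y if y > 0 else 0.0

# Midpoint-rule integral of f_Y over (0, 20); the tail beyond 20 is negligible here.
h, upper = 0.001, 20.0
total = sum(f_Y(h * (i + 0.5)) * h for i in range(int(upper / h)))
```

The integral comes out at 1 to within the discretization error, confirming the Jacobian factor 1/y is exactly what keeps total probability conserved.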

Estimation and Inference

Estimating distribution parameters from data is a core actuarial task. Two standard approaches are maximum likelihood estimation (MLE) and method of moments estimation (MME).

Maximum Likelihood Estimation

MLE finds the parameter values that make the observed data most probable. Here's the process:

  1. Write the likelihood function as the product of the PDFs evaluated at each data point: L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)
  2. Take the natural log to get the log-likelihood: \ell(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta)
  3. Differentiate with respect to each parameter and set equal to zero.
  4. Solve for the parameter estimates.

MLE has strong theoretical properties: it's consistent (converges to the true value as n → ∞) and asymptotically efficient (achieves the lowest possible variance among consistent estimators, under regularity conditions).
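For the exponential distribution the MLE steps solve in closed form: the log-likelihood is ℓ(λ) = n ln λ − λ Σxᵢ, and setting its derivative to zero gives the estimate n/Σxᵢ, the reciprocal of the sample mean. A sketch with hypothetical data:

```python
# Hypothetical claim inter-arrival times (chosen so the sample mean is exactly 1.0)
data = [0.8, 1.2, 0.3, 2.1, 0.6, 1.0]

# For Exp(lam): l(lam) = n*ln(lam) - lam*sum(x).
# Setting l'(lam) = n/lam - sum(x) = 0 gives the MLE below.
n = len(data)
lam_hat = n / sum(data)  # = 1 / (sample mean)
```

With these numbers the sample mean is 1.0, so the fitted rate is 1 claim per unit time; in general the exponential MLE is just the inverse of the average waiting time.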

Method of Moments Estimation

MME is more straightforward. You set sample moments equal to theoretical moments and solve:

  1. Compute the k-th sample moment: m_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k
  2. Write the theoretical moments as functions of the unknown parameters.
  3. Set m_k equal to the corresponding theoretical moment for as many moments as you have parameters.
  4. Solve the resulting system of equations.

For example, to estimate the parameters of a gamma distribution (α and β), you'd set the sample mean equal to α/β and the sample variance equal to α/β², then solve for both parameters.

MME is consistent but generally less efficient than MLE. Its advantage is computational simplicity, especially when the likelihood equations are hard to solve.
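The gamma example above can be sketched directly: solving m = α/β and v = α/β² gives β = m/v and α = m²/v. The data below are hypothetical claim amounts:

```python
# Hypothetical claim amounts
data = [1.1, 2.3, 0.9, 3.0, 1.7, 2.0, 1.5, 2.5]

n = len(data)
m = sum(data) / n                         # first sample moment (sample mean)
v = sum((x - m) ** 2 for x in data) / n   # second central sample moment

# Solve the moment equations m = alpha/beta and v = alpha/beta^2:
beta_hat = m / v          # rate estimate
alpha_hat = m * beta_hat  # shape estimate, = m^2 / v
```

By construction the fitted gamma reproduces the sample mean and variance exactly, which is the defining feature (and limitation) of the method of moments.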

Confidence Intervals for Parameters

A (1 − α) × 100% confidence interval [L, U] provides a range of plausible values for a parameter θ, constructed so that P(L ≤ θ ≤ U) = 1 − α.

Common construction methods:

  • Wald interval: Uses the asymptotic normality of the MLE. The interval is \hat{\theta} \pm z_{\alpha/2} \cdot \text{SE}(\hat{\theta}), where SE is the standard error.
  • Likelihood ratio interval: Inverts the likelihood ratio test.
  • Bootstrap: Resamples the data to approximate the sampling distribution of the estimator.

Confidence intervals narrow as sample size increases, reflecting greater precision.

Goodness-of-Fit Tests

After fitting a distribution, you need to check whether it actually fits the data well. Goodness-of-fit tests formalize this comparison.

Chi-Square Test

The chi-square test bins the data and compares observed counts to expected counts under the hypothesized distribution.

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

where Oᵢ and Eᵢ are the observed and expected frequencies in bin i.

Under the null hypothesis, this statistic follows a chi-square distribution with k − 1 − m degrees of freedom, where m is the number of parameters estimated from the data.

Practical considerations:

  • Expected frequencies should generally be at least 5 per bin for the approximation to hold.
  • Results can be sensitive to how you choose the bins. Using equal-probability bins (each bin has the same expected count) tends to work better than equal-width bins.
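Computing the statistic itself is simple. A sketch with hypothetical observed counts over five equal-probability bins (n = 100, so each bin expects 20):

```python
observed = [18, 24, 22, 16, 20]  # hypothetical bin counts, n = 100
expected = [20.0] * 5            # equal-probability bins: same expected count each

# Chi-square goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i over bins
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

The resulting value would then be compared against a chi-square critical value with k − 1 − m degrees of freedom (here k = 5 bins, minus 1, minus however many parameters were estimated from the data).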

Kolmogorov-Smirnov Test

The KS test compares the empirical distribution function (EDF) directly to the theoretical CDF, with no binning required:

D_n = \sup_x |F_n(x) - F(x)|

where F_n(x) is the EDF (the proportion of data points ≤ x) and F(x) is the hypothesized CDF.

The KS test is distribution-free under the null hypothesis when parameters are fully specified (not estimated from data). If you estimate parameters from the same data, you need modified critical values (the Lilliefors correction for normality testing, for instance).

The KS test tends to be more powerful than the chi-square test for small samples because it doesn't lose information through binning.
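The D_n statistic takes only a few lines to sketch. The sample below is hypothetical and the null distribution is a fully specified Exp(1), so no parameter-estimation correction is needed; because the EDF is a step function, both one-sided gaps at each order statistic must be checked:

```python
import math

data = sorted([0.1, 0.4, 0.7, 1.2, 2.0])  # hypothetical sample, sorted
lam = 1.0                                  # fully specified null: Exp(1)
n = len(data)

def F(x):
    """Hypothesized CDF: exponential with rate lam."""
    return 1.0 - math.exp(-lam * x)

# The EDF jumps from i/n to (i+1)/n at the i-th order statistic,
# so the supremum gap occurs at one of these jump points:
d_plus = max((i + 1) / n - F(x) for i, x in enumerate(data))
d_minus = max(F(x) - i / n for i, x in enumerate(data))
D_n = max(d_plus, d_minus)
```

For a real test, D_n would then be compared to a KS critical value for sample size n; for this toy sample the largest gap happens to occur just after the biggest observation.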

Applications in Actuarial Science

Modeling Claim Severity

Claim severity (the dollar amount of individual claims) is typically right-skewed: most claims are small, but a few are very large. Distributions commonly used include:

  • Gamma: Flexible shape, works well for moderate-tailed data
  • Lognormal: Y = e^X where X is normal; captures heavy right skew
  • Pareto: Heavy-tailed, often used for large or catastrophic claims

The choice depends on the line of business and the tail behavior observed in historical data. Accurate severity modeling directly affects premium calculations and reserve estimates.

Modeling Lifetime Distributions

In life insurance and annuity pricing, you model the time until death or failure:

  • Exponential: Constant hazard rate (unrealistic for human mortality, but useful as a baseline)
  • Gamma: Allows the hazard rate to change with time, though only monotonically: increasing when α > 1, decreasing when α < 1
  • Weibull: Allows increasing, decreasing, or constant hazard rates depending on its shape parameter, making it popular in survival analysis

Estimating lifetime distribution parameters accurately is critical for pricing life insurance, calculating annuity values, and assessing insurer solvency.

Pricing Insurance Products

Pricing combines frequency and severity models to estimate expected losses. Continuous distributions enter at multiple stages:

  • Severity modeling: Fitting distributions to historical claim amounts
  • Aggregate loss modeling: Combining frequency and severity (often using normal approximations via the CLT for large portfolios)
  • Investment returns: Modeling the time value of money with continuous distributions

Actuaries refine these models using risk classification, credibility theory, and experience rating. Stochastic simulation (e.g., Monte Carlo methods) tests how sensitive premiums are to model assumptions and quantifies uncertainty in the estimates.