Parametric distributions give actuaries a way to describe claim severity using mathematical functions with a small number of parameters. Choosing the right distribution matters because it directly affects how accurately you can price policies, set reserves, and measure risk.

This section covers the main distribution families used for severity modeling, how to fit them to data, their key properties, and how they're applied in practice.

Types of Parametric Distributions

Parametric distributions define the shape of claim data through specific functional forms and parameters. By understanding their characteristics, you can select the model that best fits your data and use it to estimate quantities like expected losses, Value at Risk (VaR), and Tail Value at Risk (TVaR).

Continuous vs. Discrete Distributions

Continuous distributions take values over an unbroken range and are the primary tool for modeling claim severity (the dollar amount of each claim). The exponential, gamma, lognormal, Weibull, and Pareto distributions all fall into this category.

Discrete distributions take countable values and are used to model claim frequency (how many claims occur). The Poisson and negative binomial are the most common examples.

The distinction matters because severity and frequency are modeled separately and then combined to get aggregate loss distributions. This unit focuses on the continuous (severity) side.

Light-Tailed vs. Heavy-Tailed Distributions

Tail behavior describes how quickly the probability of very large values drops off.

Light-tailed distributions (e.g., exponential, gamma) have tails that decay exponentially. Large claims are possible but become extremely unlikely very fast. These distributions always have finite moments of all orders.
Heavy-tailed distributions (e.g., Pareto, lognormal) have tails that decay more slowly than any exponential, often following a power law. They assign meaningfully higher probability to extreme claims. For the Pareto, if the shape parameter $\alpha \leq 2$, the variance is infinite.

Correctly identifying tail behavior is critical. If you fit a light-tailed model to data that's actually heavy-tailed, you'll systematically underestimate the probability of large losses and underprice the risk.
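To make the difference concrete, the sketch below compares exceedance probabilities for an exponential and a Pareto that share the same mean. The parameter values (mean 10,000; Pareto $\alpha = 2$, $x_m = 5{,}000$) are illustrative assumptions, not fitted values.

```python
import math

def exponential_survival(x, rate):
    """P(X > x) for an exponential with the given rate."""
    return math.exp(-rate * x)

def pareto_survival(x, alpha, x_m):
    """P(X > x) for a Pareto with shape alpha and minimum value x_m."""
    return 1.0 if x < x_m else (x_m / x) ** alpha

# Illustrative parameters: both distributions have mean 10,000.
rate = 1 / 10_000            # exponential mean = 1 / rate
alpha, x_m = 2.0, 5_000.0    # Pareto mean = alpha * x_m / (alpha - 1)

for threshold in (20_000, 50_000, 100_000):
    p_exp = exponential_survival(threshold, rate)
    p_par = pareto_survival(threshold, alpha, x_m)
    print(f"{threshold:>7,}: exponential {p_exp:.2e}, Pareto {p_par:.2e}")
```

At a threshold of 100,000 the exponential gives $e^{-10} \approx 4.5 \times 10^{-5}$ while the Pareto gives $0.0025$, roughly 55 times larger, even though the two models agree on the average claim.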

Common Parametric Distributions

Each distribution below has a different shape, number of parameters, and range of behaviors. The right choice depends on the line of business, the data you observe, and the tail behavior you need to capture.

Exponential Distribution

The exponential distribution is the simplest continuous severity model. It has a single parameter, the rate $\lambda > 0$.

PDF: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$
Mean: $1/\lambda$
Variance: $1/\lambda^2$

Its defining feature is a constant hazard rate: the probability of a claim exceeding $x + d$ given it already exceeds $x$ doesn't depend on $x$. This "memoryless" property makes it mathematically convenient but often too restrictive for real claim data, where large claims tend to behave differently from small ones. It works best for small-to-moderate, relatively homogeneous claims.
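The memoryless property follows in one line from the exponential survival function $S(x) = P(X > x) = e^{-\lambda x}$:

$$
P(X > x + d \mid X > x) = \frac{S(x + d)}{S(x)} = \frac{e^{-\lambda (x + d)}}{e^{-\lambda x}} = e^{-\lambda d},
$$

which does not involve $x$.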

Gamma Distribution

The gamma distribution generalizes the exponential by adding a shape parameter, giving it much more flexibility.

Parameters: shape $\alpha > 0$, scale $\theta > 0$
PDF: $f(x) = \frac{1}{\Gamma(\alpha)\,\theta^\alpha}\,x^{\alpha-1}\,e^{-x/\theta}$ for $x \geq 0$
Mean: $\alpha\theta$
Variance: $\alpha\theta^2$

When $\alpha = 1$, it reduces to the exponential. As $\alpha$ increases, the distribution becomes more symmetric and bell-shaped, and its mode shifts rightward. The gamma's tail is light for every parameter choice (it decays exponentially), but its skewness, $2/\sqrt{\alpha}$, varies widely with the shape parameter, making it a versatile first choice for moderately skewed severity data.

Weibull Distribution

The Weibull is particularly useful when the hazard rate isn't constant.

Parameters: shape $\beta > 0$, scale $\eta > 0$
PDF: $f(x) = \frac{\beta}{\eta}\left(\frac{x}{\eta}\right)^{\beta-1}e^{-(x/\eta)^\beta}$ for $x \geq 0$

The shape parameter $\beta$ controls the hazard rate behavior:

$\beta < 1$: decreasing hazard rate
$\beta = 1$: constant hazard rate (reduces to exponential with $\lambda = 1/\eta$)
$\beta > 1$: increasing hazard rate

This flexibility in hazard rate shape makes the Weibull common in property and casualty lines where the conditional probability of exceeding a loss threshold changes with loss size.
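The three regimes can be checked numerically. Dividing the Weibull PDF above by its survival function $e^{-(x/\eta)^\beta}$ gives the hazard rate $h(x) = \frac{\beta}{\eta}\left(\frac{x}{\eta}\right)^{\beta-1}$. The sketch below evaluates it at a few loss sizes; the scale of 1,000 and the three shape values are illustrative assumptions.

```python
def weibull_hazard(x, beta, eta):
    """Hazard rate f(x) / S(x) of a Weibull with shape beta and scale eta."""
    return (beta / eta) * (x / eta) ** (beta - 1)

eta = 1_000.0  # illustrative scale
for beta in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(x, beta, eta) for x in (500.0, 1_000.0, 2_000.0)]
    if rates[0] > rates[-1]:
        trend = "decreasing"
    elif rates[0] < rates[-1]:
        trend = "increasing"
    else:
        trend = "constant"
    formatted = ", ".join(f"{r:.5f}" for r in rates)
    print(f"beta = {beta}: {trend} hazard ({formatted})")
```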

Pareto Distribution

The Pareto is the go-to distribution for modeling large claims, excess layers, and catastrophe losses.

Parameters: shape $\alpha > 0$, scale (minimum value) $x_m > 0$
PDF: $f(x) = \frac{\alpha\, x_m^\alpha}{x^{\alpha+1}}$ for $x \geq x_m$
Mean: $\frac{\alpha\, x_m}{\alpha - 1}$ (exists only when $\alpha > 1$)
Variance: $\frac{\alpha\, x_m^2}{(\alpha-1)^2(\alpha-2)}$ (exists only when $\alpha > 2$)

The density's power-law tail $\sim x^{-(\alpha+1)}$ (equivalently, a survival function $P(X > x) = (x_m/x)^\alpha$) decays much more slowly than any exponential tail. This is why the Pareto assigns substantial probability to extreme values. Note that for $\alpha \leq 2$ the variance is infinite, and for $\alpha \leq 1$ even the mean is infinite, which has real consequences for pricing and reserving. In reinsurance, the Pareto is frequently used to model claims that exceed a high retention.

Lognormal Distribution

If $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, then $Y = e^X$ follows a lognormal distribution.

Parameters: $\mu$ (location) and $\sigma > 0$ (scale) of the underlying normal
PDF: $f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\,e^{-(\ln x - \mu)^2/(2\sigma^2)}$ for $x > 0$
Mean: $e^{\mu + \sigma^2/2}$
Variance: $e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$

The lognormal is always positively skewed and heavy-tailed (though lighter-tailed than the Pareto). It's a natural model when you believe claim sizes arise from the multiplicative interaction of many factors. It's widely used for liability and medical malpractice claims.
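As a quick sanity check on the moment formulas, the sketch below compares the closed-form mean with a simulated sample. It uses the standard library's `random.lognormvariate`, and the values $\mu = 9$, $\sigma = 0.8$ are illustrative assumptions.

```python
import math
import random

def lognormal_mean(mu, sigma):
    """Mean of a lognormal with underlying normal parameters mu and sigma."""
    return math.exp(mu + sigma ** 2 / 2)

def lognormal_variance(mu, sigma):
    """Variance of a lognormal with underlying normal parameters mu and sigma."""
    return math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)

mu, sigma = 9.0, 0.8          # illustrative parameters
rng = random.Random(42)       # fixed seed so the run is repeatable
sample = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)

print(f"theoretical mean: {lognormal_mean(mu, sigma):,.0f}")
print(f"simulated mean:   {sample_mean:,.0f}")
print(f"median e^mu:      {math.exp(mu):,.0f}")
```

The mean sits above the median $e^\mu$ for any $\sigma > 0$, which is the positive skew showing up in the summary statistics.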


Fitting Parametric Distributions

Once you've chosen a candidate distribution, you need to estimate its parameters from data and then check whether the fit is adequate.

Method of Moments

The idea is simple: set the theoretical moments equal to the sample moments and solve for the parameters.

Steps for a two-parameter distribution (e.g., gamma):

Compute the sample mean $\bar{x}$ and sample variance $s^2$ from your data.
Set $\bar{x} = \alpha\theta$ and $s^2 = \alpha\theta^2$.
Solve: $\hat{\theta} = s^2 / \bar{x}$ and $\hat{\alpha} = \bar{x} / \hat{\theta} = \bar{x}^2 / s^2$.

Method of moments is easy to compute and gives you a quick starting estimate. However, it can be inefficient (high variance estimates) for small samples and doesn't use all the information in the data. It is also sensitive to outliers, because a few extreme claims can move the sample moments substantially.
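The gamma steps above can be sketched in a few lines. The function name `gamma_method_of_moments` and the simulated claims are illustrative assumptions; the sample is drawn with the standard library's `random.gammavariate`, whose two arguments are the shape and the scale.

```python
import random
import statistics

def gamma_method_of_moments(claims):
    """Return (alpha_hat, theta_hat) by matching the sample mean and variance."""
    mean = statistics.fmean(claims)
    var = statistics.variance(claims)   # sample variance, divisor n - 1
    theta_hat = var / mean
    alpha_hat = mean / theta_hat        # equivalently mean**2 / var
    return alpha_hat, theta_hat

# Simulated claims from a gamma with shape 2.5 and scale 4,000.
rng = random.Random(0)
claims = [rng.gammavariate(2.5, 4_000.0) for _ in range(10_000)]

alpha_hat, theta_hat = gamma_method_of_moments(claims)
print(f"alpha_hat = {alpha_hat:.3f}, theta_hat = {theta_hat:,.1f}")
```

With 10,000 simulated claims the estimates should land close to the true values; rerunning with a few hundred claims shows how much noisier they become.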

Maximum Likelihood Estimation

MLE finds the parameter values that make the observed data most probable under the assumed model.

Steps:

Write the likelihood function: $L(\boldsymbol{\theta}) = \prod_{i=1}^{n} f(x_i \mid \boldsymbol{\theta})$, where $f$ is the PDF and $\boldsymbol{\theta}$ is the parameter vector.
Take the log to get the log-likelihood: $\ell(\boldsymbol{\theta}) = \sum_{i=1}^{n} \ln f(x_i \mid \boldsymbol{\theta})$.
Differentiate with respect to each parameter, set the derivatives equal to zero, and solve.
For distributions where closed-form solutions don't exist (e.g., Weibull, lognormal with censored data), use numerical optimization.

MLE is the standard in actuarial practice because, under regularity conditions, the estimates are consistent (converge to true values as $n \to \infty$), asymptotically efficient (their variance approaches the Cramér-Rao lower bound, the smallest achievable by an unbiased estimator), and asymptotically normal (which gives you confidence intervals). The main drawback is that MLE can be sensitive to model misspecification and may require iterative numerical methods.
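For the Weibull, the numerical step might look like the sketch below. Setting the two partial derivatives of the log-likelihood to zero and eliminating $\eta$ leaves a single equation in $\beta$, which the code solves by bisection. This is one simple approach using only the standard library, not the only one; in practice a general-purpose optimizer is common. The function name `weibull_mle` and the simulated data are assumptions for illustration.

```python
import math
import random

def weibull_mle(claims, tol=1e-10):
    """Return (beta_hat, eta_hat) maximizing the Weibull likelihood.

    Assumes all claims are positive and not all identical.
    """
    if min(claims) <= 0 or min(claims) == max(claims):
        raise ValueError("claims must be positive and not all equal")
    x_max = max(claims)
    z = [x / x_max for x in claims]       # rescale so powers cannot overflow
    log_z = [math.log(v) for v in z]
    mean_log = sum(log_z) / len(z)

    def score(beta):
        # Profile likelihood equation in beta; it is increasing in beta.
        weights = [v ** beta for v in z]
        weighted_log = sum(w * lv for w, lv in zip(weights, log_z))
        return weighted_log / sum(weights) - 1 / beta - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:                  # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol:                  # bisection
        mid = (lo + hi) / 2
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta_hat = (lo + hi) / 2
    mean_power = sum(v ** beta_hat for v in z) / len(z)
    eta_hat = x_max * mean_power ** (1 / beta_hat)
    return beta_hat, eta_hat

# random.weibullvariate takes the scale first, then the shape.
rng = random.Random(1)
claims = [rng.weibullvariate(2_000.0, 1.5) for _ in range(5_000)]
beta_hat, eta_hat = weibull_mle(claims)
print(f"beta_hat = {beta_hat:.3f}, eta_hat = {eta_hat:,.1f}")
```

Rescaling by the largest claim leaves the equation for $\beta$ unchanged (the scale cancels) while keeping every power between 0 and 1.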

Goodness-of-Fit Tests

After fitting, you need to verify that the distribution actually describes your data well.

Chi-square test: Groups data into bins, compares observed vs. expected frequencies. Sensitive to how you choose the bins.
Kolmogorov-Smirnov (K-S) test: Measures the maximum absolute difference between the empirical CDF and the fitted CDF. Works with ungrouped data but has less power in the tails.
Anderson-Darling test: Similar to K-S but gives more weight to the tails of the distribution. This makes it particularly valuable for actuarial work where tail fit matters most.

A significant p-value (typically $p < 0.05$) suggests the fitted distribution doesn't adequately capture the data. But don't rely on a single test. Use Q-Q plots and P-P plots alongside formal tests to visually assess fit, especially in the tails.
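As an illustration of the K-S idea, the sketch below computes the statistic by hand for two candidate fits to the same simulated claims. It deliberately stops at the statistic: the usual K-S p-value tables assume the parameters were not estimated from the data being tested, so a p-value computed naively here would be too optimistic. The helper names and the simulated lognormal data are assumptions.

```python
import math
import random

def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of `data` and a fitted `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

rng = random.Random(7)
claims = [rng.lognormvariate(8.0, 1.5) for _ in range(2_000)]

# Candidate 1: exponential fitted by MLE (rate = 1 / sample mean).
rate = len(claims) / sum(claims)
d_exp = ks_statistic(claims, lambda x: 1 - math.exp(-rate * x))

# Candidate 2: lognormal fitted by MLE on the log-claims.
logs = [math.log(x) for x in claims]
mu = sum(logs) / len(logs)
sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
d_logn = ks_statistic(claims, lambda x: normal_cdf((math.log(x) - mu) / sigma))

print(f"K-S statistic, exponential fit: {d_exp:.3f}")
print(f"K-S statistic, lognormal fit:   {d_logn:.3f}")
```

A smaller statistic means the fitted CDF tracks the empirical CDF more closely; here the lognormal, which generated the data, should come out well ahead.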

Properties of Parametric Distributions

Moments of Distributions

Moments quantify the shape of a distribution:

First moment (mean): The expected value, representing the center of the distribution.
Second central moment (variance): Measures spread around the mean. Its square root is the standard deviation.
Higher moments capture finer shape details like asymmetry and tail weight.

For severity modeling, the mean tells you the average claim size, and the variance tells you how much individual claims vary. Higher moments become important when you need to understand the likelihood of extreme outcomes.

Skewness and Kurtosis

Skewness measures asymmetry. Claim severity distributions are almost always positively skewed (right-skewed), meaning there's a long right tail of large claims.

Kurtosis measures tail weight. The normal distribution has a kurtosis of 3 (sometimes reported as "excess kurtosis" of 0). Distributions with kurtosis greater than 3 are called leptokurtic and have heavier tails than the normal.

For reference:

The lognormal is positively skewed, and both its skewness and kurtosis increase as $\sigma$ increases.
The exponential always has a skewness of 2 and kurtosis of 9.
The Pareto can have extremely high skewness and kurtosis, and for small $\alpha$ they are not even finite (skewness requires $\alpha > 3$, kurtosis requires $\alpha > 4$).
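The exponential values quoted above can be checked by simulation. The sketch below computes moment-based sample skewness and kurtosis (kurtosis, not excess kurtosis) for a simulated exponential sample; the function name and sample size are illustrative assumptions.

```python
import random

def skewness_and_kurtosis(data):
    """Return (skewness, kurtosis) from central moments; kurtosis is not excess."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

rng = random.Random(3)
sample = [rng.expovariate(1.0) for _ in range(200_000)]
skew, kurt = skewness_and_kurtosis(sample)
print(f"exponential sample: skewness {skew:.2f}, kurtosis {kurt:.2f}")
```

The estimates should fall near 2 and 9. Sample kurtosis converges slowly, and for heavy-tailed data it can be dominated by a handful of the largest claims, so treat it as a rough diagnostic.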

Tail Behavior and Extreme Values

The tail is where the money is in insurance. A small number of very large claims can dominate total losses.

Light tails decay like $e^{-cx}$ or faster. The moment generating function (MGF) exists for these distributions, which means all moments are finite.

Heavy tails decay more slowly than any exponential, for example like $x^{-a}$ (power law). The MGF does not exist for any positive argument, and depending on the distribution and its parameters, some moments may be infinite.

Extreme value theory (EVT) provides specialized tools for the tail:

The Generalized Pareto Distribution (GPD) models exceedances above a high threshold. If you're looking at claims above 1 million dollars, the GPD is the theoretically justified model for those exceedances.
The Generalized Extreme Value (GEV) distribution models the maximum of blocks of data (e.g., the largest claim each year).

EVT is especially useful when you have limited data in the tail and need to extrapolate beyond observed losses.
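One case where the GPD connection is exact rather than asymptotic: if severity is Pareto with shape $\alpha$, the excess $X - u$ over any threshold $u \geq x_m$ follows a GPD with shape $\xi = 1/\alpha$ and scale $\sigma = u/\alpha$. The sketch below checks this numerically. The GPD survival function used is $(1 + \xi y/\sigma)^{-1/\xi}$ for $\xi > 0$; the parameter values are illustrative assumptions.

```python
import math

def gpd_survival(y, xi, sigma):
    """P(Y > y) for a GPD with shape xi >= 0 and scale sigma, for y >= 0."""
    if xi == 0:
        return math.exp(-y / sigma)       # exponential limit
    return (1 + xi * y / sigma) ** (-1 / xi)

def pareto_survival(x, alpha, x_m):
    """P(X > x) for a Pareto with shape alpha and minimum value x_m."""
    return 1.0 if x < x_m else (x_m / x) ** alpha

alpha, x_m = 1.5, 100_000.0               # illustrative Pareto severity
u = 1_000_000.0                           # threshold
xi, sigma = 1 / alpha, u / alpha          # implied GPD parameters

for y in (250_000.0, 1_000_000.0, 4_000_000.0):
    # Conditional probability that the excess over u is larger than y.
    conditional = pareto_survival(u + y, alpha, x_m) / pareto_survival(u, alpha, x_m)
    print(f"excess > {y:>11,.0f}: Pareto {conditional:.6f}, GPD {gpd_survival(y, xi, sigma):.6f}")
```

For other severity distributions the GPD is only an approximation that improves as the threshold rises, which is why threshold choice is part of the modeling work.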

Applications in Insurance

Modeling Claim Severity

Claim severity is the dollar amount of individual claims. The modeling process typically follows these steps:

Collect and clean historical claim data, adjusting for inflation and development.
Explore the data using histograms, Q-Q plots, and summary statistics.
Select candidate distributions based on the data's shape and tail behavior.
Fit each candidate using MLE (or method of moments as a starting point).
Compare fits using goodness-of-fit tests, information criteria (AIC, BIC), and visual diagnostics.
Use the selected model to estimate risk measures like expected value, VaR, and TVaR.

The choice of distribution varies by line. Homeowners claims might fit a gamma or Weibull well, while liability or catastrophe claims often require a Pareto or lognormal.
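The fit-and-compare steps above can be sketched with two candidates whose MLEs have closed forms, the exponential and the lognormal, compared by AIC ($2k - 2\ell$, where $k$ is the number of parameters and $\ell$ the maximized log-likelihood). The function names and the simulated claims are assumptions for illustration.

```python
import math
import random

def exponential_loglik(claims):
    """Maximized log-likelihood of an exponential (MLE rate = 1 / sample mean)."""
    n = len(claims)
    rate = n / sum(claims)
    return n * math.log(rate) - rate * sum(claims)

def lognormal_loglik(claims):
    """Maximized log-likelihood of a lognormal (MLE from the log-claims)."""
    n = len(claims)
    logs = [math.log(x) for x in claims]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n    # MLE uses divisor n
    return -sum(logs) - n / 2 * math.log(2 * math.pi * var) - n / 2

def aic(loglik, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * loglik

rng = random.Random(11)
claims = [rng.lognormvariate(8.0, 1.2) for _ in range(1_000)]

aic_exp = aic(exponential_loglik(claims), n_params=1)
aic_logn = aic(lognormal_loglik(claims), n_params=2)
print(f"AIC exponential: {aic_exp:,.1f}")
print(f"AIC lognormal:   {aic_logn:,.1f}")
```

BIC works the same way with the penalty $k \ln n$ in place of $2k$. Neither criterion says whether the best candidate fits well in absolute terms, so the goodness-of-fit tests and plots still matter.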

Pricing Insurance Products

Parametric severity distributions feed directly into pricing:

The pure premium (expected loss per exposure) equals the product of expected claim frequency and expected claim severity.
The fitted severity distribution lets you compute not just the mean but also the variability of losses, which affects risk loadings.
Sensitivity analysis using the fitted distribution shows how changes in parameters (e.g., a shift in the scale parameter due to inflation) affect the premium.
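A minimal sketch of the pure premium calculation and an inflation sensitivity follows. The frequency of 0.08 claims per exposure and the 5% inflation rate are hypothetical; the gamma severity parameters are chosen only for illustration. It assumes inflation scales every claim by the same factor, which multiplies the gamma scale $\theta$ and leaves the shape $\alpha$ unchanged.

```python
def pure_premium(expected_frequency, expected_severity):
    """Expected loss per exposure: frequency times severity."""
    return expected_frequency * expected_severity

frequency = 0.08                 # hypothetical expected claims per exposure
alpha, theta = 2.5, 4_000.0      # illustrative gamma severity; mean = alpha * theta
inflation = 0.05                 # hypothetical uniform claim inflation

base = pure_premium(frequency, alpha * theta)
inflated = pure_premium(frequency, alpha * theta * (1 + inflation))

print(f"base pure premium:     {base:,.2f}")
print(f"after 5% inflation:    {inflated:,.2f}")
print(f"relative change:       {inflated / base - 1:.1%}")
```

Because the mean is linear in the scale parameter, the pure premium moves one-for-one with uniform inflation. Layered or capped coverages do not behave this simply, since the share of losses falling in each layer shifts as claims grow.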

Estimating Reserves

Reserves are funds set aside for future claim payments on policies already written. Parametric distributions help by:

Modeling the ultimate size of claims that are still developing (especially for long-tailed lines like workers' compensation).
Projecting the timing of future payments.
Quantifying reserve uncertainty through stochastic methods like bootstrap simulation or MCMC, which draw from the fitted parametric model to generate thousands of scenarios.

Reinsurance and Risk Management

Reinsurance structures are designed around the tail of the severity distribution.

In an excess-of-loss treaty, the reinsurer pays the portion of each claim above a retention $d$. The expected cost to the reinsurer is $E[\max(X - d, 0)]$, which depends heavily on the tail of the fitted distribution.
In a quota share treaty, the reinsurer takes a fixed percentage of every claim, so the full distribution matters.
The Pareto distribution is especially common in reinsurance pricing because its power-law tail matches the empirical behavior of large losses.
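For a Pareto severity the expected excess has a closed form. Integrating the survival function from $d$ to infinity gives $E[\max(X - d, 0)] = \frac{x_m^\alpha\, d^{1-\alpha}}{\alpha - 1}$ for $\alpha > 1$ and $d \geq x_m$. The sketch below evaluates it alongside a quota share; the parameter values, retention, and 30% cession are illustrative assumptions.

```python
import math

def pareto_expected_excess(d, alpha, x_m):
    """E[max(X - d, 0)] per claim for a Pareto(alpha, x_m), with d >= x_m."""
    if d < x_m:
        raise ValueError("this formula assumes the retention d is at least x_m")
    if alpha <= 1:
        return math.inf                    # the mean itself is infinite
    return x_m ** alpha * d ** (1 - alpha) / (alpha - 1)

def pareto_mean(alpha, x_m):
    """Mean of a Pareto(alpha, x_m); infinite when alpha <= 1."""
    return math.inf if alpha <= 1 else alpha * x_m / (alpha - 1)

alpha, x_m = 2.5, 50_000.0                 # illustrative severity
retention = 500_000.0                      # illustrative excess-of-loss retention
share = 0.30                               # illustrative quota share cession

xol_cost = pareto_expected_excess(retention, alpha, x_m)
qs_cost = share * pareto_mean(alpha, x_m)
print(f"expected excess-of-loss cost per claim: {xol_cost:,.2f}")
print(f"expected quota share cost per claim:    {qs_cost:,.2f}")
```

Lowering $\alpha$ slightly raises the excess-of-loss figure far more than the quota share figure, which is the tail dependence described above.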

Parametric models also support capital allocation by quantifying tail risk through measures like TVaR at the 99th or 99.5th percentile.
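For the Pareto both measures are available in closed form: inverting the survival function gives $\text{VaR}_p = x_m (1 - p)^{-1/\alpha}$, and because the loss beyond any threshold is again Pareto, $\text{TVaR}_p = \frac{\alpha}{\alpha - 1}\,\text{VaR}_p$ when $\alpha > 1$. The sketch below evaluates both at the two percentiles mentioned; the severity parameters are illustrative assumptions.

```python
def pareto_var(p, alpha, x_m):
    """Value at Risk: the p-quantile of a Pareto(alpha, x_m)."""
    return x_m * (1 - p) ** (-1 / alpha)

def pareto_tvar(p, alpha, x_m):
    """Tail Value at Risk E[X | X > VaR_p]; infinite when alpha <= 1."""
    if alpha <= 1:
        return float("inf")
    return alpha / (alpha - 1) * pareto_var(p, alpha, x_m)

alpha, x_m = 2.5, 50_000.0     # illustrative severity
for p in (0.99, 0.995):
    var_p = pareto_var(p, alpha, x_m)
    tvar_p = pareto_tvar(p, alpha, x_m)
    print(f"p = {p}: VaR {var_p:,.0f}, TVaR {tvar_p:,.0f}")
```

The ratio of TVaR to VaR, $\alpha/(\alpha - 1)$, grows without bound as $\alpha$ approaches 1, so small changes in the estimated shape can move capital requirements substantially.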

Advantages and Limitations

Simplicity and Interpretability

Parametric distributions compress complex data into a few meaningful parameters. You can describe an entire claim severity distribution by saying "gamma with $\alpha = 2.5$ and $\theta = 4{,}000$," which immediately tells a fellow actuary the mean ($\alpha\theta = 10{,}000$), variance ($\alpha\theta^2 = 40{,}000{,}000$), and general shape. This makes communication with stakeholders straightforward.

Flexibility and Parsimony

With the right choice of distribution family, a two- or three-parameter model can capture a wide range of data patterns. Fewer parameters means less overfitting risk and more stable estimates compared to non-parametric approaches, especially with limited data.

Extrapolation and Robustness

A key advantage of parametric models is the ability to extrapolate into the tail beyond observed data. If you've fitted a Pareto with $\alpha = 1.5$ and $x_m = 100{,}000$, you can estimate the probability of a claim exceeding 10 million dollars even if your data only goes up to 5 million.

The risk is obvious: if the distribution is wrong, the extrapolation is wrong. Misspecification in the tail can lead to severe under- or over-estimation of extreme losses. Always validate tail fit carefully and consider the sensitivity of your conclusions to the distributional assumption.
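As a worked version of the Pareto extrapolation, the survival function $P(X > x) = (x_m / x)^\alpha$ (the integral of the Pareto PDF) with $\alpha = 1.5$ and $x_m = 100{,}000$ gives

$$
P(X > 10{,}000{,}000) = \left(\frac{100{,}000}{10{,}000{,}000}\right)^{1.5} = 0.01^{1.5} = 0.001 .
$$

The answer is sensitive to the shape parameter. If the true value were $\alpha = 2$ instead (an illustrative alternative, not an estimate), the same calculation gives $0.01^{2} = 0.0001$, ten times smaller.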

Alternative Approaches

When a single parametric distribution doesn't fit well, consider:

Mixture models: Combine two or more distributions (e.g., a gamma for small claims and a Pareto for large claims) to handle heterogeneous populations.
Kernel density estimation (KDE): A non-parametric method that estimates the PDF directly from data. Flexible but can't extrapolate beyond the data range.
Spliced models: Use one distribution below a threshold and another above it, which is common in practice for severity modeling.
Machine learning methods: Neural networks and gradient boosting can model complex patterns but sacrifice interpretability and the ability to extrapolate into the tail.