Components of GLMs
Generalized Linear Models (GLMs) extend ordinary linear regression so that the response variable doesn't have to follow a normal distribution. This makes them well-suited for insurance data, where you're often modeling things like claim counts (discrete, non-negative) or claim amounts (continuous, positive, skewed). A GLM has three components that work together: the response variable distribution, the linear predictor, and the link function.
Response variable distribution
The response variable distribution specifies what probability distribution your data follows. Your choice here should reflect the actual nature of the data:
- Poisson for count data (e.g., number of claims reported per period)
- Gamma for continuous positive data (e.g., average claim amounts, which are right-skewed)
- Binomial for binary outcomes (e.g., whether a claim exceeds a threshold)
Getting this choice wrong undermines everything downstream, so always examine the data's characteristics before selecting a distribution.
Linear predictor function
The linear predictor is the systematic part of the model. It combines your explanatory variables into a single linear expression:
η = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ
You can include categorical predictors (like accident year), continuous predictors (like inflation rate), and interaction terms. The coefficients are what you estimate from the data.
Link function
The link function connects the linear predictor to the expected value of the response variable: g(μ) = η, where μ = E[Y]. It ensures predictions stay within the valid range for your chosen distribution.
- Log link (used with Poisson and Gamma): log(μ) = η, equivalently μ = exp(η), which guarantees positive predictions
- Logit link (used with Binomial): logit(μ) = log(μ / (1 − μ)) = η, which maps probabilities to the real line
- Identity link (used with Normal): μ = η directly
The link function also affects how you interpret coefficients. With a log link, for instance, coefficients represent multiplicative effects rather than additive ones.
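As a quick numerical check of the multiplicative interpretation, the sketch below uses illustrative, assumed coefficient values (b0 and b1 are not from any fitted model):

```python
import math

# Under a log link, log(mu) = b0 + b1*x, so a one-unit increase in x
# multiplies the predicted mean by exp(b1) instead of adding b1.
# b0 and b1 are illustrative, assumed coefficient values.
b0, b1 = 1.0, 0.2

def mu_at(x):
    return math.exp(b0 + b1 * x)

ratio = mu_at(3.0) / mu_at(2.0)   # effect of one extra unit of x
print(round(ratio, 4))            # exp(0.2) ~= 1.2214
```

A coefficient of 0.2 therefore corresponds to roughly a 22% increase in the predicted mean per unit of x, regardless of the baseline level.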
Model structure for reserving
GLMs provide a flexible framework for modeling how claims develop over time. The typical reserving model includes development periods and accident periods as factors, and potentially their interaction.
Development periods as factors
Development periods represent the time elapsed between a claim's occurrence and its reporting, payment, or settlement (e.g., 0–12 months, 12–24 months, etc.). In the GLM, these are treated as categorical variables, with each period getting its own factor level. This lets the model capture the typical pattern of claims development, where most claims are reported early and the rate tapers off in later periods.
Accident periods as factors
Accident periods represent when claims occurred, typically measured in years or quarters (e.g., accident years 2015, 2016, ..., 2023). These are also categorical variables. Including them allows the model to account for differences in claim frequency or severity across different origin periods, which might arise from changes in underwriting, policy terms, or external conditions.
Interaction between development and accident periods
An interaction term captures situations where the development pattern itself changes across accident periods. For example, claims from more recent accident years might settle faster due to process improvements. Including interactions lets you estimate development factors specific to each accident period, though this adds many parameters and can lead to overfitting if the data is sparse.
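One way to make the factor structure concrete is to build the design matrix by hand for a tiny triangle. The sketch below uses an illustrative 3×3 triangle and a corner-point constraint (first level of each factor as baseline); real reserving triangles are larger, and statistical software builds this matrix automatically:

```python
import numpy as np

# Sketch of a GLM design matrix for a small 3x3 incremental run-off
# triangle, with accident period and development period as categorical
# factors. Corner-point constraint: level 0 of each factor is baseline.
# The triangle size and factor levels here are illustrative.
cells = [(acc, dev) for acc in range(3) for dev in range(3) if acc + dev <= 2]

rows = []
for acc, dev in cells:
    row = [1.0]                                          # intercept
    row += [1.0 if acc == a else 0.0 for a in (1, 2)]    # accident dummies
    row += [1.0 if dev == d else 0.0 for d in (1, 2)]    # development dummies
    rows.append(row)
X = np.array(rows)

print(X.shape)   # 6 observed cells, 5 parameters -> (6, 5)
```

Adding the accident-by-development interaction would append one dummy column per extra combination, which is exactly where the parameter count starts to outgrow sparse data.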
Model fitting and estimation
Once you've specified the model structure, you need to estimate the parameters from the available data.
Maximum likelihood estimation
Maximum likelihood estimation (MLE) finds the parameter values that make the observed data most probable under your assumed model. The process works as follows:
- Write down the log-likelihood function based on your chosen distribution and link function.
- Take partial derivatives with respect to each parameter.
- Set these equal to zero and solve (usually iteratively, since closed-form solutions rarely exist for GLMs).
MLE produces estimates that are consistent (converge to the true values as sample size grows) and asymptotically efficient (achieve the lowest possible variance among unbiased estimators, given enough data).
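The three steps above can be sketched for the simplest possible case, an intercept-only Poisson model, where the score equation has a closed form. The claim counts below are illustrative:

```python
import math

# For an i.i.d. Poisson sample, the log-likelihood is
#   l(mu) = sum_i [ y_i * log(mu) - mu - log(y_i!) ].
# Setting dl/dmu = sum(y)/mu - n = 0 gives the closed form mu_hat = mean(y).
# The counts below are illustrative.
y = [2, 0, 3, 1, 4]

def loglik(mu):
    return sum(yi * math.log(mu) - mu - math.lgamma(yi + 1) for yi in y)

mu_hat = sum(y) / len(y)        # sample mean = 2.0
# The likelihood at mu_hat beats nearby values, confirming the maximum:
assert loglik(mu_hat) > loglik(mu_hat - 0.1)
assert loglik(mu_hat) > loglik(mu_hat + 0.1)
print(mu_hat)
```

Once covariates enter through a link function, this closed form disappears and the solution must be found iteratively, which is where IRLS comes in.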
Iterative weighted least squares
Iteratively reweighted least squares (IRLS) is the standard algorithm for fitting GLMs; at convergence it yields the maximum likelihood estimates. Here's how it works:
- Start with initial parameter estimates.
- Compute weights based on the current estimates and the variance function of the response distribution.
- Solve a weighted least squares problem using those weights.
- Update the parameter estimates and repeat from step 2.
- Stop when the estimates converge (i.e., change negligibly between iterations).
IRLS is computationally efficient and is what most statistical software uses behind the scenes when you fit a GLM.
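The five steps above can be sketched for a Poisson GLM with log link, where the working weights are wᵢ = μᵢ and the working response is zᵢ = ηᵢ + (yᵢ − μᵢ)/μᵢ. This is a minimal illustration, not production code; the data are illustrative:

```python
import numpy as np

# Minimal IRLS for a Poisson GLM with log link, following the steps above.
def irls_poisson(X, y, tol=1e-10, max_iter=100):
    beta = np.zeros(X.shape[1])                 # step 1: initial estimates
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        w = mu                                  # step 2: weights (V(mu) = mu)
        z = eta + (y - mu) / mu                 # working response
        # step 3: weighted least squares  (X'WX) beta = X'Wz
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:   # step 5: convergence
            return beta_new
        beta = beta_new                         # step 4: update and repeat
    return beta

# Intercept-only sanity check: the MLE is log(mean(y)).
y = np.array([2.0, 5.0, 3.0, 6.0])
X = np.ones((4, 1))
beta = irls_poisson(X, y)
print(beta[0])   # log(4.0) ~= 1.3863
```

With covariates the same loop applies unchanged; only the design matrix X grows.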
Deviance and goodness of fit
Deviance measures how far your fitted model is from a saturated model (one with a parameter for every observation, giving a perfect fit). It's calculated as:
D = 2(ℓ_saturated − ℓ_model)
where ℓ denotes the log-likelihood. Lower deviance means better fit. You can also use deviance to compare nested models: the difference in deviance between two nested models follows an approximate chi-square distribution, which lets you test whether the additional parameters in the larger model are statistically justified.
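For the Poisson case the deviance has a simple closed form, D = 2 Σ [ yᵢ log(yᵢ/μ̂ᵢ) − (yᵢ − μ̂ᵢ) ], which the sketch below computes on illustrative data:

```python
import math

# Poisson deviance: D = 2 * sum( y*log(y/mu) - (y - mu) ), with the
# convention y*log(y/mu) = 0 when y = 0. A saturated model (mu = y
# for every cell) gives D = 0. The data are illustrative.
def poisson_deviance(y, mu):
    d = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            d += yi * math.log(yi / mi)
        d -= (yi - mi)
    return 2.0 * d

y = [2, 0, 3, 5]
print(poisson_deviance(y, y))                    # saturated fit -> 0.0
print(round(poisson_deviance(y, [2.5] * 4), 4))  # cruder fit -> positive
```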

Over-dispersed Poisson model
The standard Poisson distribution assumes the variance equals the mean. In practice, insurance claim data often shows more variability than this, a phenomenon called over-dispersion. The over-dispersed Poisson model handles this by introducing a scale parameter.
Variance proportional to mean
In the over-dispersed Poisson model, the variance-mean relationship becomes:
Var(Y) = φμ
where φ is the scale parameter (also called the dispersion parameter). When φ = 1, you recover the standard Poisson model. When φ > 1, the data is over-dispersed. For example, if φ = 2, the variance is twice the mean.
Scale parameter estimation
A common way to estimate φ is via the Pearson chi-square statistic:
φ̂ = (1 / (n − p)) Σᵢ (yᵢ − μ̂ᵢ)² / μ̂ᵢ
where yᵢ is the observed value, μ̂ᵢ is the fitted mean, n is the number of observations, and p is the number of estimated parameters. Once you have φ̂, you use it to adjust the standard errors of your coefficient estimates (multiplying them by √φ̂) and to correct goodness-of-fit statistics. Ignoring over-dispersion leads to artificially narrow confidence intervals and overly optimistic significance tests.
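The estimate can be computed in a few lines. The observed counts, fitted means, and parameter count below are illustrative, assumed values:

```python
# Pearson estimate of the dispersion parameter for an over-dispersed
# Poisson fit: phi_hat = (1/(n-p)) * sum( (y_i - mu_i)^2 / mu_i ).
# Observed counts, fitted means, and p are illustrative assumptions.
y  = [4.0, 9.0, 1.0, 12.0, 3.0, 16.0]
mu = [5.0, 6.0, 3.0, 8.0, 6.0, 10.0]
p  = 2                       # number of estimated parameters

n = len(y)
pearson_chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
phi_hat = pearson_chi2 / (n - p)
print(round(phi_hat, 4))     # > 1, so this data is over-dispersed
# Standard errors from the unadjusted fit are then scaled by sqrt(phi_hat).
```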
Pearson and deviance residuals
GLM residuals differ from ordinary regression residuals because the variance isn't constant across observations.
- Pearson residuals standardize the raw residual by the estimated standard deviation: rᵢ = (yᵢ − μ̂ᵢ) / √V(μ̂ᵢ), where V is the variance function evaluated at the fitted mean.
- Deviance residuals are based on each observation's contribution to the total deviance (the signed square root of that contribution). They tend to be more nearly normally distributed than Pearson residuals, which makes them easier to judge against a normal reference when screening for unusual observations.
Both types are useful for diagnostic purposes: checking model fit, spotting patterns, and identifying influential observations.
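Both residual types can be sketched for the Poisson case, where V(μ) = μ. The observed values and fitted means below are illustrative:

```python
import math

# Pearson and deviance residuals for a Poisson fit (variance function
# V(mu) = mu). The y and mu values are illustrative.
y  = [3.0, 0.0, 6.0, 2.0]
mu = [2.5, 1.0, 4.0, 3.0]

def pearson_resid(yi, mi):
    return (yi - mi) / math.sqrt(mi)

def deviance_resid(yi, mi):
    # signed square root of the cell's contribution to the deviance
    term = yi * math.log(yi / mi) if yi > 0 else 0.0
    d2 = 2.0 * (term - (yi - mi))
    return math.copysign(math.sqrt(max(d2, 0.0)), yi - mi)

r_p = [pearson_resid(yi, mi) for yi, mi in zip(y, mu)]
r_d = [deviance_resid(yi, mi) for yi, mi in zip(y, mu)]

# The squared deviance residuals sum to the total deviance.
print(round(sum(r ** 2 for r in r_d), 4))
```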
Gamma model
The Gamma distribution is a natural choice for modeling continuous, positive response variables like average claim amounts. It's particularly appropriate when larger claims also tend to show greater variability.
Variance proportional to square of mean
The key feature of the Gamma model is its variance-mean relationship:
Var(Y) = μ² / α
where α is the shape parameter. This means the variance grows with the square of the mean, not linearly as in the Poisson case. If α = 4, the variance is one-quarter of the squared mean. This quadratic relationship makes the Gamma model suitable for data where variability increases substantially as the mean increases.
Shape and scale parameterization
The Gamma distribution is parameterized by shape (α) and scale (θ):
- Shape α: controls skewness and variability. Larger α produces a more symmetric, less variable distribution.
- Scale θ: controls spread. Larger θ means more dispersion.
The moments are:
- Mean: E[Y] = αθ
- Variance: Var(Y) = αθ²
For example, with α = 4 and θ = 0.5, the mean is 2 and the variance is 1.
Log link function
The Gamma GLM typically uses a log link:
log(μ) = η = β₀ + β₁x₁ + … + βₚxₚ
This guarantees that predicted means are always positive, which is necessary since claim amounts can't be negative. To recover the predicted mean from the linear predictor, you exponentiate: if η̂ is the fitted linear predictor, then μ̂ = exp(η̂). The log link also means that coefficients have a multiplicative interpretation on the original scale.
Model diagnostics
After fitting a GLM, you need to verify that the model is adequate and that its assumptions hold. Skipping diagnostics can lead to unreliable reserve estimates.
Residual plots
Plot residuals (Pearson or deviance) against fitted values and against each explanatory variable. What you're looking for:
- Random scatter around zero: the model is capturing the systematic patterns well.
- Funnel shape (increasing spread): suggests the variance function is misspecified.
- Curved pattern: suggests non-linearity that the model isn't capturing.
These plots are your first line of defense against model misspecification.
Q-Q plots
A Quantile-Quantile plot compares the distribution of your residuals to a theoretical reference distribution (typically standard normal). If the model assumptions hold, the points should fall roughly along a straight line. Common departures include:
- S-shaped curve: indicates tails that are heavier (or lighter) than the assumed distribution
- Systematic departure at the extremes: suggests the distribution choice may be wrong

Outlier detection
Outliers can distort parameter estimates and inflate or deflate reserve predictions. Two key diagnostic measures:
- Standardized residuals: values exceeding 2–3 in absolute value warrant investigation.
- Cook's distance: measures how much the fitted model changes when a single observation is removed. High Cook's distance points are influential observations.
Outliers should be investigated, not automatically deleted. They might represent data errors, or they might be legitimate large claims that require special modeling treatment.
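Both measures can be sketched for a fitted GLM using the weighted hat matrix H = W^(1/2) X (XᵀWX)⁻¹ Xᵀ W^(1/2), whose diagonal gives the leverages. The design matrix, observations, and fitted means below are illustrative, with one deliberately suspicious observation:

```python
import numpy as np

# Leverage and an approximate Cook's distance for a Poisson GLM
# (weights w_i = mu_i). X, y, and mu are illustrative assumed values;
# observation 3 is deliberately out of line with its fitted mean.
X  = np.column_stack([np.ones(5), np.array([0.0, 1.0, 2.0, 3.0, 4.0])])
y  = np.array([2.0, 3.0, 4.0, 30.0, 11.0])
mu = np.array([2.1, 3.2, 4.8, 7.2, 10.9])
p  = X.shape[1]

w = mu                                        # Poisson: V(mu) = mu
WX = X * np.sqrt(w)[:, None]                  # W^(1/2) X
H = WX @ np.linalg.inv(X.T @ (X * w[:, None])) @ WX.T
h = np.diag(H)                                # leverages; they sum to p

r_pearson = (y - mu) / np.sqrt(mu)
r_std = r_pearson / np.sqrt(1.0 - h)          # standardized residuals
cooks = r_std ** 2 * h / (p * (1.0 - h))      # approximate Cook's distance

print(int(np.argmax(cooks)))                  # flags observation 3
```

In practice you would pull these quantities from your fitting software's influence diagnostics rather than rebuilding the hat matrix by hand.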
Model selection
When you have several candidate GLMs, you need a principled way to choose among them.
Nested vs. non-nested models
Nested models are hierarchical: one model is a restricted version of the other (e.g., a model without an interaction term is nested within a model that includes it). You can compare these using likelihood ratio tests.
Non-nested models have fundamentally different structures (e.g., different distributions or different sets of predictors). These require information criteria like AIC for comparison.
Likelihood ratio test
The likelihood ratio test (LRT) compares two nested models:
- Fit both the simpler (reduced) and more complex (full) models.
- Compute the test statistic: Λ = 2(ℓ_full − ℓ_reduced).
- Under the null hypothesis (that the simpler model is adequate), Λ follows a chi-square distribution with degrees of freedom equal to the difference in the number of parameters.
- If the p-value is small, the additional parameters in the full model provide a statistically significant improvement.
For over-dispersed models, use an F-test version that accounts for the scale parameter.
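The test can be sketched with illustrative log-likelihoods for two nested models differing by one parameter (df = 1), where the chi-square survival function has a closed form via the complementary error function:

```python
import math

# Likelihood ratio test for two nested models. The log-likelihood
# values are illustrative; df = 1 because the full model has exactly
# one extra parameter.
ll_reduced = -214.3
ll_full    = -211.1
lam = 2.0 * (ll_full - ll_reduced)            # test statistic

# Chi-square survival function for df = 1: P(X > x) = erfc(sqrt(x/2))
p_value = math.erfc(math.sqrt(lam / 2.0))

print(round(lam, 2), round(p_value, 4))
# lam = 6.4, p ~= 0.011: the extra parameter is justified at the 5% level
```

For general degrees of freedom you would use a chi-square routine from a statistics library instead of the df = 1 shortcut.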
Akaike information criterion (AIC)
The AIC balances fit against complexity:
AIC = −2ℓ + 2k
where ℓ is the maximized log-likelihood and k is the number of estimated parameters. Lower AIC is better. The AIC penalizes model complexity, so it favors parsimonious models that still fit the data well. Unlike the LRT, AIC can compare non-nested models, making it a versatile selection tool. When comparing candidates, differences in AIC of less than about 2 suggest the models are roughly equivalent.
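A quick worked comparison, using the same illustrative log-likelihoods as above for a richer model (8 parameters) and a simpler one (5 parameters):

```python
# AIC = -2*loglik + 2*k for two candidate models (illustrative values).
def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

aic_a = aic(-211.1, 8)    # richer model
aic_b = aic(-214.3, 5)    # simpler model
print(aic_a, aic_b)
# aic_a = 438.2, aic_b = 438.6: within about 2 of each other, so the
# models are roughly equivalent and the simpler one is a defensible pick.
```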
Advantages of GLMs for reserving
Flexibility in modeling
GLMs accommodate different response distributions, so you can use a Poisson GLM for claim counts and a Gamma GLM for average claim amounts within the same reserving framework. The choice of link function adds further flexibility. You can also include any combination of categorical and continuous predictors, along with interactions.
Incorporation of external factors
Traditional methods like chain-ladder treat the run-off triangle in isolation. GLMs let you bring in external information as additional predictors: regulatory changes, inflation indices, shifts in claims handling procedures, or economic indicators. This can improve reserve accuracy when the claims environment is changing.
Ability to quantify uncertainty
GLMs produce standard errors for all parameter estimates, which you can use to build confidence intervals around predicted reserves. For a fuller picture of reserve variability, you can apply bootstrapping or simulation:
- Resample residuals or simulate new data from the fitted model.
- Refit the GLM to each resampled dataset.
- Generate a distribution of reserve estimates.
This gives you percentiles and tail measures that are valuable for risk management and regulatory reporting.
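The three bootstrap steps can be sketched end to end. To keep the resampling mechanics visible, the "model" below is deliberately trivial (an intercept-only fit whose refit is just the sample mean) and the claim amounts are illustrative; a real application refits the full reserving GLM to each pseudo-dataset:

```python
import random

# Minimal residual-bootstrap sketch for a reserve distribution,
# following the steps above. Intercept-only "model": fitted mean =
# sample mean, so each "refit" is a one-liner. Data are illustrative.
random.seed(42)

y = [4.0, 9.0, 1.0, 12.0, 3.0, 16.0]        # incremental claims
n = len(y)
mu = sum(y) / n                              # fitted mean
r = [(yi - mu) / mu ** 0.5 for yi in y]      # Pearson residuals

reserves = []
for _ in range(2000):
    # 1. resample residuals and build pseudo-data (floored at zero)
    y_star = [max(mu + random.choice(r) * mu ** 0.5, 0.0) for _ in range(n)]
    # 2. "refit" the model to the pseudo-data
    mu_star = sum(y_star) / n
    # 3. store the implied total reserve
    reserves.append(mu_star * n)

reserves.sort()
print(round(reserves[int(0.75 * 2000)], 2))  # 75th percentile of reserves
```

Production bootstraps for reserving also scale the residuals for the dispersion parameter and add process error to each simulated future cell; both refinements are omitted here for brevity.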
Limitations and considerations
Appropriateness of distributional assumptions
The entire GLM framework rests on your choice of distribution and link function. If these are wrong, parameter estimates can be biased and predictions unreliable. For example, using a Poisson distribution for claim amounts (which are continuous and can take any positive value) would be a fundamental misspecification. Always use diagnostic tools to verify your assumptions after fitting.
Sensitivity to outliers
A single unusually large claim can pull coefficient estimates substantially, leading to over- or under-estimation of reserves. Diagnostic tools like Cook's distance help identify these cases. Possible remedies include robust estimation methods, capping extreme values, or fitting separate models for large and attritional claims.
Computational complexity
As you add more predictors, interaction terms, and observations, fitting time increases. The iterative nature of IRLS means each additional parameter adds to every iteration's computation. For large-scale reserving problems with many accident periods, development periods, and interactions, you may need efficient implementations using sparse matrix algebra or parallel processing. That said, for most practical reserving applications, modern computing power handles GLMs without difficulty.