📊Actuarial Mathematics Unit 6 Review

6.5 Generalized linear models and rating factors

Written by the Fiveable Content Team • Last updated August 2025

Components of generalized linear models

Generalized linear models (GLMs) extend ordinary linear regression so you can work with response variables that aren't normally distributed. That flexibility makes them the workhorse of modern insurance pricing. A GLM has three components that work together: an exponential family distribution, a link function, and a linear predictor.

Exponential family of distributions

The exponential family is a broad class of distributions that share a common mathematical form. Members include the normal, binomial, Poisson, and gamma distributions.

Each distribution is characterized by a mean and variance that can be written as functions of a natural parameter and a dispersion parameter. Your choice of distribution depends on the nature of the response variable:

  • Normal for continuous responses (e.g., claim severity in dollars)
  • Poisson for count data (e.g., number of claims per policy)
  • Binomial for binary outcomes (e.g., claim vs. no claim)
  • Gamma for positive, right-skewed continuous data (e.g., individual loss amounts)

A link function connects the linear predictor to the expected value of the response variable. It maps E(Y) onto the whole real line, so the linear predictor is unconstrained while the model's predictions stay consistent with the chosen distribution.

Common link functions and their typical pairings:

| Distribution | Link function | Formula | Effect interpretation |
| --- | --- | --- | --- |
| Normal | Identity | g(\mu) = \mu | Additive effects |
| Binomial | Logit | g(\mu) = \ln\left(\frac{\mu}{1-\mu}\right) | Log-odds / odds ratios |
| Poisson | Log | g(\mu) = \ln(\mu) | Multiplicative effects |
| Gamma | Inverse | g(\mu) = 1/\mu | Reciprocal effects |

The link you choose shapes how you interpret the coefficients, so it's worth thinking carefully about whether additive or multiplicative effects make more sense for your problem.
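To make the table concrete, here is a small sketch of the logit and log links with their inverses. The functions and values are illustrative, not from any particular library:

```python
import math

# Common GLM link functions g(mu) and their inverses g^{-1}(eta).
def logit(mu):
    return math.log(mu / (1 - mu))

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

def log_link(mu):
    return math.log(mu)

def inv_log(eta):
    return math.exp(eta)

# Round-trip check: applying the inverse link recovers the mean.
mu = 0.25
print(round(inv_logit(logit(mu)), 10))   # 0.25
print(round(inv_log(log_link(mu)), 10))  # 0.25
```

Note how the inverse link guarantees valid predictions: `inv_logit` always returns a value in (0, 1), and `inv_log` always returns a positive number.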

Linear predictors

The linear predictor is the systematic part of the model. It's a linear combination of explanatory variables and their coefficients:

\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p

Each coefficient \beta_j quantifies the effect of variable x_j on the transformed expected response, holding the other variables constant. The linear predictor can also include interaction terms and polynomial terms, giving you flexibility to capture complex relationships.
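As a quick sketch, computing the linear predictor and then applying an inverse log link (the coefficient and covariate values here are made up for illustration):

```python
import math

# Hypothetical coefficients and covariates (illustrative values only).
beta = [0.5, 0.2, -0.1]   # beta_0 (intercept), beta_1, beta_2
x = [1.0, 3.0, 2.0]       # x_0 = 1 for the intercept, then x_1, x_2

# Linear predictor: eta = beta_0 + beta_1*x_1 + beta_2*x_2
eta = sum(b * xi for b, xi in zip(beta, x))
print(round(eta, 10))  # 0.5 + 0.6 - 0.2 = 0.9

# With a log link, the predicted mean is exp(eta).
mu = math.exp(eta)
```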

Model fitting and estimation

Fitting a GLM means finding the coefficient values that best describe the relationship between your explanatory variables and the response. This typically involves maximum likelihood estimation, carried out through an iterative algorithm, with goodness-of-fit measures used to evaluate the result.

Maximum likelihood estimation

Maximum likelihood estimation (MLE) finds the coefficients that make the observed data most probable under the assumed distribution and link function.

The process works like this:

  1. Write down the likelihood function L(\boldsymbol{\beta}), which gives the probability of the observed data as a function of the coefficients.
  2. Take the log-likelihood \ell(\boldsymbol{\beta}) = \ln L(\boldsymbol{\beta}) (easier to work with mathematically).
  3. Differentiate with respect to each \beta_j and set the resulting score equations to zero.
  4. Solve using an optimization algorithm such as Newton-Raphson or Fisher scoring.

Under standard regularity conditions, MLEs are asymptotically unbiased, consistent, and efficient.

Iterative weighted least squares

Because the score equations for GLMs generally don't have closed-form solutions, they're solved with iterative weighted least squares (IWLS):

  1. Start with initial coefficient estimates (often from an ordinary least squares fit or zeros).
  2. Compute a working response and working weights based on the current estimates of the mean and variance.
  3. Fit a weighted least squares regression using those working values.
  4. Update the coefficient estimates from the weighted regression.
  5. Repeat steps 2-4 until the estimates converge (changes fall below a tolerance threshold).

Convergence usually happens within a small number of iterations. IWLS is the standard algorithm behind GLM fitting in most statistical software.
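The steps above can be sketched in a few lines for a Poisson model with a log link. The data and tolerances below are made up for illustration; this is a sketch of the algorithm, not production code:

```python
import numpy as np

# Toy Poisson GLM with a log link, fit by iterative weighted least squares.
X = np.column_stack([np.ones(6), np.arange(6.0)])   # intercept + one covariate
y = np.array([1.0, 1.0, 2.0, 4.0, 5.0, 9.0])        # illustrative claim counts

# Step 1: initial estimates from an OLS fit to log(y) (y has no zeros here).
beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]

for _ in range(50):                         # steps 2-5: iterate to convergence
    eta = X @ beta                          # linear predictor
    mu = np.exp(eta)                        # inverse log link
    z = eta + (y - mu) / mu                 # working response
    W = np.diag(mu)                         # working weights (Poisson: w_i = mu_i)
    # Weighted least squares update: beta = (X'WX)^{-1} X'Wz
    beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ z)
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

# At convergence, the intercept's score equation forces the fitted and
# observed totals to match: sum(mu) == sum(y).
print(np.round(beta, 3))
```

The convergence check on the total fitted count is a useful sanity test: for a Poisson model with an intercept and log link, the MLE always reproduces the observed claim total.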

Deviance and goodness of fit

Deviance measures how far your fitted model is from a perfect (saturated) model that has one parameter per observation:

D = 2\bigl[\ell(\text{saturated model}) - \ell(\text{fitted model})\bigr]

Smaller deviance means a better fit. Under the null hypothesis that the fitted model is correct, the deviance approximately follows a chi-squared distribution with degrees of freedom equal to n - p, where n is the number of observations and p is the number of estimated parameters.

Other useful measures:

  • Pearson chi-square statistic: an alternative summary of residual discrepancy
  • AIC (Akaike Information Criterion): balances fit against model complexity; lower is better
  • BIC (Bayesian Information Criterion): similar to AIC but penalizes complexity more heavily for large samples
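For a Poisson model, the deviance has a closed form that you can compute directly from observed counts and fitted means. The values below are illustrative:

```python
import math

# Poisson deviance for hypothetical observed counts y and fitted means mu.
y  = [1.0, 1.0, 2.0, 4.0]
mu = [0.8, 1.3, 2.1, 3.8]

def poisson_deviance(y, mu):
    d = 0.0
    for yi, mi in zip(y, mu):
        # y*ln(y/mu) is taken as 0 when y == 0 (its limiting value)
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d += 2.0 * (term - (yi - mi))
    return d

D = poisson_deviance(y, mu)
print(round(D, 4))
```

A perfect fit (mu equal to y everywhere) gives a deviance of exactly zero, matching the definition above with the saturated model.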

Types of generalized linear models

The specific GLM you use depends on the nature of your response variable. Three types dominate actuarial practice.

Linear regression models

Linear regression assumes a continuous, normally distributed response and uses the identity link: E(Y) = \eta. Each coefficient represents the additive change in the expected response for a one-unit increase in the corresponding variable, holding everything else constant.

Actuarial uses include modeling claim severity or loss reserves where the response is a monetary amount.

Logistic regression models

Logistic regression handles binary outcomes (e.g., claim occurrence: yes/no) using the logit link:

\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p

Each coefficient represents the change in the log-odds of the event for a one-unit increase in the variable. Exponentiating gives you the odds ratio, which is often more intuitive.

Common actuarial applications: modeling claim occurrence, policy lapse, or fraud indicators.

Poisson regression models

Poisson regression is used for count responses (e.g., number of claims per exposure period) with the log link:

\ln\bigl(E(Y)\bigr) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p

Each coefficient represents the change in the log of the expected count for a one-unit increase in the variable. Exponentiating a coefficient gives a rate ratio, the multiplicative change in the expected count.

An important practical detail: you'll often include an offset term \ln(\text{exposure}) so the model predicts a rate (claims per unit of exposure) rather than a raw count.
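Here is how an offset works mechanically, with hypothetical fitted coefficients and a half-year exposure (all numbers illustrative):

```python
import math

# Offsets in a Poisson frequency model: the linear predictor includes
# ln(exposure) with a fixed coefficient of 1, so the model fits a rate.
beta0, beta1 = -2.0, 0.3      # hypothetical fitted coefficients
x = 1.0                       # a rating variable
exposure = 0.5                # half a policy-year

eta = math.log(exposure) + beta0 + beta1 * x
expected_count = math.exp(eta)             # claims expected over the exposure
expected_rate = expected_count / exposure  # claims per full unit of exposure

print(round(expected_rate, 6))  # equals exp(-2.0 + 0.3) = exp(-1.7)
```

Because the offset's coefficient is fixed at 1, doubling the exposure exactly doubles the expected claim count while leaving the rate unchanged.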

Interpretation of model coefficients

Understanding what the coefficients actually tell you is where GLMs become useful for decision-making. The interpretation depends on the link function.

Significance testing

Significance tests assess whether a coefficient is statistically distinguishable from zero. Two common approaches:

  • Wald test: compares the coefficient estimate to its standard error. The test statistic is z = \hat{\beta}/\text{SE}(\hat{\beta}).
  • Likelihood ratio test: compares the log-likelihood of the full model to a reduced model without the variable of interest. The test statistic is \Lambda = 2[\ell(\text{full}) - \ell(\text{reduced})], which follows an approximate chi-squared distribution.

A p-value below your chosen significance level (commonly 0.05) provides evidence that the variable has a real relationship with the response.

Confidence intervals

A confidence interval gives a range of plausible values for a coefficient. For a 95% interval:

\hat{\beta} \pm 1.96 \times \text{SE}(\hat{\beta})

If the interval excludes zero, the variable is significant at the 5% level, consistent with the hypothesis test. For exponentiated coefficients, you exponentiate the endpoints of the interval to get a confidence interval for the odds ratio or rate ratio.
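Putting the formula to work on a hypothetical logistic-regression coefficient (the estimate and standard error below are made up for illustration):

```python
import math

# Wald 95% confidence interval for a hypothetical coefficient, then
# exponentiated to get an odds-ratio interval.
beta_hat, se = 0.40, 0.15   # illustrative estimate and standard error

lo = beta_hat - 1.96 * se
hi = beta_hat + 1.96 * se
or_lo, or_hi = math.exp(lo), math.exp(hi)

print(round(lo, 3), round(hi, 3))        # coefficient interval: 0.106 to 0.694
print(round(or_lo, 3), round(or_hi, 3))  # odds-ratio interval
```

Since the coefficient interval excludes zero, the odds-ratio interval excludes one, and the two ways of stating significance agree.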

Exponentiated coefficients

With non-identity link functions, raw coefficients are on a transformed scale. Exponentiating them brings the interpretation back to the original scale:

  • Logistic regression: e^{\beta_j} is the odds ratio. A value of 1.25 means the odds of the event increase by 25% for a one-unit increase in x_j.
  • Poisson regression: e^{\beta_j} is the rate ratio. A value of 1.10 means the expected count increases by 10% for a one-unit increase in x_j.

These multiplicative interpretations are much easier to communicate to stakeholders than log-odds or log-rates.

Rating factors in insurance pricing

Rating factors are the explanatory variables in a pricing GLM. They capture policyholder characteristics, object attributes, and coverage details that drive claim risk. Selecting the right factors involves actuarial judgment, regulatory constraints, and statistical evidence.

Categorical vs. continuous factors

  • Categorical factors (e.g., vehicle type, occupation, territory) take a finite number of levels. In the GLM, each level beyond the reference level gets its own dummy variable and coefficient.
  • Continuous factors (e.g., driver age, years of experience, sum insured) enter the linear predictor directly. You can also transform them (e.g., splines, polynomials) if the relationship with the response is non-linear.

The decision to treat a variable as categorical or continuous depends on the shape of its relationship with the response and on how much data you have at each level. A continuous variable with a clearly non-linear effect is sometimes binned into categories for simplicity, though this sacrifices some information.

Interactions between factors

An interaction exists when the effect of one factor depends on the level of another. For example, the effect of driver age on claim frequency might differ between urban and rural territories.

In the GLM, an interaction between factors x_1 and x_2 is represented by including the product term x_1 \times x_2 in the linear predictor. The interaction coefficient captures the additional effect beyond what the two main effects predict on their own.

Including interactions improves accuracy but increases model complexity. You should test whether interactions are statistically significant before adding them, and watch for data sparsity in cross-classified cells.

Relativities and factor levels

Relativities are the exponentiated coefficients for each level of a categorical rating factor, expressed relative to a chosen base level (whose relativity is 1.00 by definition).

For example, in an auto insurance frequency model with vehicle type as a factor and "sedan" as the base:

| Vehicle type | Coefficient | Relativity e^{\beta} | Interpretation |
| --- | --- | --- | --- |
| Sedan (base) | 0.000 | 1.00 | Reference |
| Sports car | 0.336 | 1.40 | 40% higher frequency |
| SUV | -0.105 | 0.90 | 10% lower frequency |

The granularity of factor levels matters. Too many levels can lead to overfitting and sparse data; too few can mask meaningful risk differences.
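Relativities are just exponentiated coefficients, so they are easy to compute from the vehicle-type example above. The base frequency below is a made-up illustration:

```python
import math

# Relativities from the vehicle-type coefficients (log link), and a
# multiplicative frequency for a chosen profile.
coeffs = {"sedan": 0.000, "sports_car": 0.336, "suv": -0.105}
relativities = {k: math.exp(b) for k, b in coeffs.items()}

base_frequency = 0.08  # hypothetical base claim frequency for a sedan
freq_sports = base_frequency * relativities["sports_car"]

print({k: round(v, 2) for k, v in relativities.items()})
# {'sedan': 1.0, 'sports_car': 1.4, 'suv': 0.9}
```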

Model selection and validation

A good GLM balances accuracy with parsimony. Model selection and validation ensure you're not overfitting to noise or missing important structure.

Stepwise selection procedures

Stepwise methods automate variable selection by iteratively adding or removing variables based on statistical criteria:

  1. Forward selection: Start with no variables. At each step, add the variable that most improves the model (lowest p-value or greatest AIC reduction). Stop when no remaining variable meets the inclusion criterion.
  2. Backward elimination: Start with all candidate variables. At each step, remove the variable that contributes least. Stop when all remaining variables are significant.
  3. Stepwise (bidirectional): Combines both, allowing additions and removals at each step.

Stepwise methods are convenient but have well-known limitations: they can miss the globally best subset, and p-values from the final model are biased because of the selection process. Use them as a starting point, not a final answer.

Cross-validation techniques

Cross-validation estimates how well your model will perform on new data:

  1. K-fold cross-validation: Split the data into K equal folds (commonly K = 5 or K = 10). For each fold, fit the model on the other K-1 folds and measure prediction error on the held-out fold. Average the errors across all folds.
  2. Leave-one-out cross-validation (LOOCV): A special case where K = n. Each observation serves as its own validation set. Computationally expensive but low-bias.

The cross-validation error lets you compare competing models on predictive performance rather than in-sample fit alone.
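The splitting logic behind K-fold cross-validation can be sketched in a few lines. This assumes the data have already been shuffled, and the model fitting itself is left out:

```python
# A minimal K-fold index generator; model fitting is deliberately omitted.
def kfold_indices(n, k):
    """Yield (train, validate) index lists for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validate = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validate
        start += size

# Each observation appears in exactly one validation fold.
folds = list(kfold_indices(10, 5))
print([v for _, v in folds])  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```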

Akaike and Bayesian information criteria

Both AIC and BIC penalize model complexity to discourage overfitting:

  • \text{AIC} = -2\ell(\hat{\boldsymbol{\beta}}) + 2p
  • \text{BIC} = -2\ell(\hat{\boldsymbol{\beta}}) + p\ln(n)

where \ell is the maximized log-likelihood, p is the number of parameters, and n is the sample size.

BIC's penalty grows with sample size, so it tends to favor simpler models than AIC does, especially for large datasets. Lower values indicate a better fit-complexity trade-off. Both criteria can compare non-nested models (e.g., GLMs with different distributions or link functions).
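A quick numeric check of both formulas, using a hypothetical maximized log-likelihood (all numbers illustrative):

```python
import math

# AIC and BIC from a hypothetical fitted GLM.
loglik, p, n = -120.5, 4, 500   # illustrative log-likelihood, parameters, sample size

aic = -2 * loglik + 2 * p
bic = -2 * loglik + p * math.log(n)

print(aic)              # 249.0
print(round(bic, 2))    # larger than AIC because ln(500) > 2
```

With n = 500, ln(n) is about 6.2, so BIC charges each parameter roughly three times what AIC does, which is why it leans toward simpler models on large datasets.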

Assumptions and limitations

GLMs rely on assumptions that, if violated, can produce misleading results. Checking these assumptions is a routine part of model development.

Independence of observations

GLMs assume each observation is independent. When data are clustered (e.g., multiple policies per household) or longitudinal (repeated measures over time), standard errors will be underestimated and significance overstated.

Remedies include clustered standard errors, generalized estimating equations (GEEs), or mixed models with random effects.

Overdispersion and underdispersion

Overdispersion means the observed variance exceeds what the model distribution predicts. This is especially common with Poisson models, where the distribution assumes \text{Var}(Y) = E(Y) but the data show \text{Var}(Y) > E(Y).

Ignoring overdispersion leads to standard errors that are too small and p-values that are too optimistic. Common fixes:

  • Quasi-Poisson models: estimate a dispersion parameter \phi and scale standard errors by \sqrt{\phi}
  • Negative binomial regression: explicitly models extra-Poisson variation
  • Random effects: capture unobserved heterogeneity across groups

Underdispersion (variance less than expected) is rarer but can similarly distort inference.
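A common way to detect overdispersion is to estimate the dispersion parameter as the Pearson chi-square divided by the residual degrees of freedom. The counts and fitted means below are illustrative:

```python
# Estimating the dispersion parameter phi for a Poisson model.
y  = [0, 2, 1, 5, 3, 8]                # observed counts (made up)
mu = [0.9, 1.4, 1.8, 2.6, 3.1, 4.2]    # hypothetical fitted means
p = 2                                   # parameters estimated by the model

# Pearson chi-square: sum of (y - mu)^2 / V(mu), with V(mu) = mu for Poisson.
pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
phi = pearson / (len(y) - p)            # phi > 1 suggests overdispersion

print(round(phi, 3))
```

A value of phi well above 1, as here, is the signal to switch to a quasi-Poisson or negative binomial specification before trusting the standard errors.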

Residual diagnostics

Residuals are the gap between observed and predicted values. Examining them helps you spot model problems:

  • Deviance residuals and Pearson residuals are the most common types for GLMs, standardized so they're comparable across observations.
  • Plot residuals against fitted values to check for patterns. A well-specified model should show no systematic trend.
  • Plot residuals against each explanatory variable to detect non-linearity that the model hasn't captured.
  • Look for high-leverage points or influential observations using Cook's distance or DFBETAS.

Residual patterns guide model refinement: a curved pattern against a variable suggests you need a non-linear term; a fan shape suggests the variance function may be wrong.

Applications in actuarial practice

GLMs are central to modern actuarial work, spanning pricing, reserving, and capital modeling.

Pricing and ratemaking

The standard approach to insurance pricing fits separate GLMs for claim frequency and claim severity, then combines them:

\text{Pure premium} = E(\text{frequency}) \times E(\text{severity})

Each GLM includes rating factors such as age, territory, vehicle type, and coverage level. The exponentiated coefficients translate directly into the relativities that form the rating structure. This multiplicative framework makes it straightforward to calculate a premium for any combination of factor levels and to communicate the pricing logic to regulators and underwriters.
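A minimal sketch of combining the two components for one risk profile, assuming log-link GLMs for both frequency and severity (all coefficients and factor names below are hypothetical):

```python
import math

# Pure premium as frequency x severity for one profile, assuming log links.
freq_coeffs = {"intercept": -2.2, "urban": 0.25}   # hypothetical frequency model
sev_coeffs  = {"intercept": 7.1,  "urban": 0.10}   # hypothetical severity model

def predict(coeffs, urban):
    # Linear predictor, then inverse log link.
    eta = coeffs["intercept"] + (coeffs["urban"] if urban else 0.0)
    return math.exp(eta)

frequency = predict(freq_coeffs, urban=True)   # expected claims per year
severity  = predict(sev_coeffs, urban=True)    # expected cost per claim
pure_premium = frequency * severity

print(round(pure_premium, 2))
```

Because both components use log links, the pure premium is itself multiplicative in every rating factor: moving a policy from rural to urban scales the premium by exp(0.25 + 0.10) here, regardless of the other factor levels.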

Claim frequency and severity modeling

Frequency and severity are modeled separately because they typically follow different distributions and respond to different risk factors.

  • Frequency models commonly use Poisson or negative binomial regression with a log link and an exposure offset. The response is the number of claims per policy-period.
  • Severity models commonly use gamma or inverse Gaussian regression with a log link. The response is the average cost per claim, conditional on a claim having occurred.

Separating the two components gives actuaries more diagnostic power: you can see whether a rate change is driven by more claims, costlier claims, or both.