Survival analysis gives actuaries a rigorous framework for modeling time-to-event data: when does a policyholder die, become disabled, lapse a policy, or default on a loan? These questions sit at the heart of pricing, reserving, and risk management. The Cox proportional hazards model is the workhorse semi-parametric tool for this work, but it fits within a broader ecosystem of nonparametric estimators, parametric distributions, and extensions for competing risks and recurrent events.
Survival analysis fundamentals
Survival analysis studies the time until an event of interest occurs. In actuarial work, that event might be death, disability onset, policy lapse, or loan default. What makes survival analysis distinct from ordinary regression is its ability to handle incomplete observations through censoring and truncation.
Three building blocks define any survival analysis problem: the time origin and scale you choose, how you handle incomplete data, and the mathematical functions (survival and hazard) that describe the event process.
Time origin and scale
Time origin is the clock's starting point. Common choices include birth (for mortality studies), policy inception (for lapse analysis), or diagnosis date (for disability duration). Time scale is the unit you measure in: years, months, or days.
The choice matters more than it might seem. Modeling mortality from birth versus from policy inception can lead to different baseline hazard shapes and different covariate effects. Always match the origin and scale to the research question.
Censoring and truncation
Censoring means you know the individual is in the study, but you don't observe their exact event time:
- Right-censoring: The study ends (or the person drops out) before the event occurs. This is the most common type.
- Left-censoring: You know the event happened before a certain time, but not exactly when.
- Interval-censoring: The event occurred between two observation times, but you don't know exactly when within that interval.
Truncation means certain individuals are systematically excluded from your data:
- Left-truncation (delayed entry): Individuals enter observation only after surviving to a certain age or time. Pension studies often have this, since you only observe people who survived to plan entry.
- Right-truncation: Only individuals who have already experienced the event are included.
Ignoring censoring or truncation introduces bias. Every survival method you'll encounter has been designed specifically to handle these features correctly.
Survival and hazard functions
The survival function $S(t)$ gives the probability of surviving beyond time $t$:

$$S(t) = \Pr(T > t)$$

It starts at $S(0) = 1$ and decreases toward 0. A steeper drop means events are happening faster.
The hazard function $h(t)$ gives the instantaneous rate of the event at time $t$, conditional on having survived to $t$:

$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
The hazard is not a probability (it can exceed 1). Think of it as the "intensity" of the event at each moment.
These two functions are linked through the cumulative hazard function $H(t)$:

$$H(t) = \int_0^t h(u)\,du$$

which gives the equivalent relationship:

$$S(t) = \exp(-H(t))$$

and the derivative relationship stated more compactly:

$$h(t) = -\frac{d}{dt}\log S(t)$$

If you know any one of $S(t)$, $h(t)$, or $H(t)$, you can derive the other two.
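These identities can be checked numerically. The sketch below (pure Python, with an assumed Weibull hazard chosen only for illustration) integrates the hazard to get the cumulative hazard, then confirms that $\exp(-H(t))$ reproduces the survival function:

```python
import math

# Assumed Weibull example with shape k and rate lam (illustrative values):
# h(t) = k * lam * t**(k-1),  S(t) = exp(-lam * t**k)
k, lam = 1.5, 0.2

def hazard(t):
    return k * lam * t ** (k - 1)

def survival(t):
    return math.exp(-lam * t ** k)

# Numerically integrate the hazard to get the cumulative hazard H(t),
# then confirm S(t) = exp(-H(t)) to within discretization error.
t_max, n = 5.0, 100_000
dt = t_max / n
H = sum(hazard((i + 0.5) * dt) * dt for i in range(n))  # midpoint rule

print(survival(t_max), math.exp(-H))  # the two values should agree closely
```

The same check works for any hazard you can integrate, which is a handy sanity test when implementing a new parametric model.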
Nonparametric estimation
Nonparametric methods estimate survival and hazard functions directly from the data without assuming any particular distributional form. They're your first step in any analysis: look at the data before imposing structure on it.
Kaplan-Meier estimator
The Kaplan-Meier (KM) estimator (also called the product-limit estimator) estimates $S(t)$ from censored data. At each observed event time $t_i$:

$$\hat S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

where $d_i$ is the number of events at time $t_i$ and $n_i$ is the number at risk just before $t_i$.
The result is a step function that drops at each event time and stays flat between events. Censored observations reduce the risk set but don't cause a drop in the curve. This is what makes the KM estimator so useful: it extracts survival information even when many observations are censored.
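A minimal pure-Python implementation makes the product-limit mechanics concrete (toy data; a real analysis would use a survival library):

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times  : observed times (event or censoring)
    events : 1 if the event occurred, 0 if right-censored
    Returns a list of (event_time, S_hat) steps.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    s_hat, steps = 1.0, []
    for t in sorted(deaths):
        n_at_risk = sum(1 for u in times if u >= t)  # at risk just before t
        s_hat *= 1 - deaths[t] / n_at_risk
        steps.append((t, s_hat))
    return steps

# Toy data: times 2 and 5 are events, 3 and 7 are right-censored.
print(kaplan_meier([2, 3, 5, 7], [1, 0, 1, 0]))
# At t=2: S = 1 - 1/4 = 0.75; at t=5: S = 0.75 * (1 - 1/2) = 0.375
```

Note how the observation censored at time 3 shrinks the risk set at time 5 without producing a drop of its own.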
Nelson-Aalen estimator
The Nelson-Aalen estimator targets the cumulative hazard function $H(t)$ rather than $S(t)$ directly:

$$\hat H(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$$

You can convert this to a survival estimate via $\hat S(t) = \exp(-\hat H(t))$, which tends to perform slightly better than the KM estimator in small samples. The Nelson-Aalen estimator is also the foundation for several residual diagnostics used with the Cox model.
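The estimator is a running sum over event times, which a few lines of pure Python can illustrate (same kind of toy data as above):

```python
import math
from collections import Counter

def nelson_aalen(times, events):
    """Cumulative hazard H_hat at each event time, plus exp(-H_hat)."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    h_cum, steps = 0.0, []
    for t in sorted(deaths):
        n_at_risk = sum(1 for u in times if u >= t)
        h_cum += deaths[t] / n_at_risk
        steps.append((t, h_cum, math.exp(-h_cum)))
    return steps

# Toy data: events at 2 and 5, censoring at 3 and 7.
for t, H, s in nelson_aalen([2, 3, 5, 7], [1, 0, 1, 0]):
    print(t, round(H, 4), round(s, 4))
# H(2) = 1/4 = 0.25, H(5) = 0.25 + 1/2 = 0.75
# exp(-0.75) ≈ 0.472, versus 0.375 for the product-limit estimate at t = 5
```

The two survival estimates agree asymptotically but differ in small samples, as the comparison at $t = 5$ shows.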
Confidence intervals and bands
Pointwise confidence intervals quantify uncertainty at a single time point. For the KM estimator, Greenwood's formula gives the variance, and you typically compute intervals on the log or complementary log-log scale to keep them within $[0, 1]$.
Simultaneous confidence bands (such as the Hall-Wellner or EP bands) cover the entire survival curve with a specified probability. They're wider than pointwise intervals because they must hold at all time points simultaneously.
Both tools help you assess whether apparent differences between survival curves are real or just noise.
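As a sketch of the pointwise machinery, the function below tracks Greenwood's variance sum alongside the KM product and converts it to a log-log-scale interval (the 1.96 critical value and the toy data are illustrative):

```python
import math
from collections import Counter

def km_with_loglog_ci(times, events, z=1.96):
    """KM estimate with Greenwood variance and a log-log pointwise CI.

    Greenwood: Var(S_hat) = S_hat^2 * sum d_i / (n_i * (n_i - d_i)).
    The CI is computed on the log(-log S) scale, then back-transformed,
    which keeps both endpoints inside (0, 1).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    s_hat, var_sum, out = 1.0, 0.0, []
    for t in sorted(deaths):
        n = sum(1 for u in times if u >= t)
        d = deaths[t]
        s_hat *= 1 - d / n
        var_sum += d / (n * (n - d))
        if 0 < s_hat < 1:
            se = math.sqrt(var_sum) / abs(math.log(s_hat))
            lo = s_hat ** math.exp(z * se)   # lower bound (S < 1, so exponent > 1 shrinks it)
            hi = s_hat ** math.exp(-z * se)  # upper bound
            out.append((t, s_hat, lo, hi))
    return out

for row in km_with_loglog_ci([2, 3, 5, 7], [1, 0, 1, 0]):
    print([round(v, 3) for v in row])
```

With only four observations the intervals are extremely wide, which is exactly the warning such intervals are meant to deliver.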
Parametric survival models
Parametric models assume a specific functional form for $S(t)$ or $h(t)$, governed by a small number of parameters. The tradeoff: you gain efficiency and the ability to extrapolate, but you risk bias if the assumed form is wrong. Always check parametric fits against the nonparametric KM curve.
Exponential distribution
The simplest parametric model assumes a constant hazard rate $\lambda$:
- Hazard: $h(t) = \lambda$
- Survival: $S(t) = e^{-\lambda t}$
- Mean survival time: $1/\lambda$
The exponential distribution has the memoryless property: the probability of surviving an additional time $s$ doesn't depend on how long you've already survived, i.e., $\Pr(T > t + s \mid T > t) = S(s)$. This is rarely realistic for human mortality but can be reasonable for certain mechanical failures or short observation windows.
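The memoryless property falls straight out of the survival function, as this quick numerical check shows (the rate is illustrative):

```python
import math

lam = 0.3  # constant hazard rate; value is illustrative

def S(t):
    return math.exp(-lam * t)

# Memoryless property: P(T > t + s | T > t) = S(t + s) / S(t) = S(s),
# regardless of how long the individual has already survived.
t, s = 4.0, 2.0
conditional = S(t + s) / S(t)
print(conditional, S(s))  # identical for the exponential
```

Repeating this with any non-exponential survival function (e.g., Weibull with shape other than 1) breaks the equality, which is one way to see that the exponential is the only continuous memoryless distribution.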
Weibull distribution
The Weibull adds a shape parameter $k$ to the exponential's rate parameter $\lambda$:
- Hazard: $h(t) = k \lambda t^{k-1}$
- Survival: $S(t) = e^{-\lambda t^k}$
The shape parameter controls the hazard trajectory:
- $k < 1$: decreasing hazard (e.g., infant mortality, early policy lapse)
- $k = 1$: constant hazard (reduces to the exponential)
- $k > 1$: increasing hazard (e.g., wear-out failures, aging)
This flexibility makes the Weibull one of the most commonly used parametric models in actuarial work.
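Evaluating the hazard at a few time points makes the three shape regimes visible (parameter values are illustrative):

```python
def weibull_hazard(t, k, lam):
    # h(t) = k * lam * t**(k-1), rate parameterization assumed here
    return k * lam * t ** (k - 1)

# Evaluate the hazard at increasing times for each shape regime.
for k, label in [(0.5, "decreasing"), (1.0, "constant"), (2.0, "increasing")]:
    values = [round(weibull_hazard(t, k, 0.1), 4) for t in (1, 2, 4)]
    print(label, values)
```

The middle case prints the same value at every time, confirming that $k = 1$ collapses to the exponential.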
Gompertz distribution
The Gompertz model assumes an exponentially increasing hazard, which fits adult human mortality remarkably well:
- Hazard: $h(t) = \alpha e^{\beta t}$
- Survival: $S(t) = \exp\left(-\frac{\alpha}{\beta}\left(e^{\beta t} - 1\right)\right)$
Here $\alpha$ sets the initial level of mortality and $\beta$ controls how fast it increases with age.
The Gompertz-Makeham extension adds a constant term $\lambda$ to capture age-independent mortality (accidents, etc.):

$$h(t) = \lambda + \alpha e^{\beta t}$$

This is the standard model behind many actuarial life tables for ages roughly 30 to 90.
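A small script shows the characteristic pattern: a near-flat hazard at younger ages giving way to exponential growth. The parameter values below are purely illustrative, not a calibrated life table:

```python
import math

# Illustrative Gompertz-Makeham parameters (NOT calibrated to any table):
# h(t) = lam + alpha * exp(beta * t)
lam, alpha, beta = 0.0005, 0.00002, 0.11

def hazard(age):
    return lam + alpha * math.exp(beta * age)

def survival(age):
    # Closed form: H(t) = lam*t + (alpha/beta) * (exp(beta*t) - 1)
    H = lam * age + (alpha / beta) * (math.exp(beta * age) - 1)
    return math.exp(-H)

for age in (30, 50, 70, 90):
    print(age, round(hazard(age), 5), round(survival(age), 4))
```

The closed-form cumulative hazard is one practical advantage of the Gompertz family: survival probabilities need no numerical integration.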
Accelerated failure time models
Accelerated failure time (AFT) models offer a different regression framework from the Cox model. Instead of modeling the hazard multiplicatively, they model the log of survival time as a linear function of covariates:

$$\log T = \beta_0 + \beta^\top x + \sigma \varepsilon$$

where $\varepsilon$ follows a specified distribution (e.g., extreme value for Weibull, normal for log-normal, logistic for log-logistic).
Coefficients have an intuitive interpretation: $e^{\beta_j}$ is an acceleration factor (a time ratio). A value of 2 means the covariate doubles the expected survival time; a value of 0.5 means it halves it. AFT models are a natural alternative when the proportional hazards assumption fails.
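The time-ratio reading is just an exponentiation of the fitted coefficients. The coefficients and covariate names below are hypothetical, chosen to land near round numbers:

```python
import math

# Hypothetical AFT coefficients (illustrative, not fitted values):
coefs = {"treatment": 0.69, "age_decade": -0.10}

# exp(beta) is the time ratio: the multiplicative effect on survival time
# for a one-unit increase in the covariate.
for name, beta in coefs.items():
    print(name, "time ratio =", round(math.exp(beta), 3))

# exp(0.69) ≈ 2.0: treatment roughly doubles expected survival time.
# exp(-0.10) ≈ 0.9: each extra decade of age shortens it by about 10%.
```

Contrast this with the Cox model, where exponentiated coefficients scale the hazard rather than the time scale.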

Cox proportional hazards model
The Cox PH model is the most widely used regression model in survival analysis. It's semi-parametric: it models the effect of covariates parametrically while leaving the baseline hazard completely unspecified.
Model specification and assumptions
The Cox model specifies the hazard for an individual with covariate vector $x$ as:

$$h(t \mid x) = h_0(t) \exp(\beta^\top x)$$

where $h_0(t)$ is the baseline hazard (the hazard when all covariates equal zero) and $\beta$ is the vector of regression coefficients.
The critical assumption is proportional hazards: the hazard ratio between any two individuals is constant over time. Mathematically, for individuals with covariates $x_1$ and $x_2$:

$$\frac{h(t \mid x_1)}{h(t \mid x_2)} = \exp\left(\beta^\top (x_1 - x_2)\right)$$

The right side has no $t$ in it. If this ratio actually changes over time, the PH assumption is violated and you need to consider alternatives (stratification, time-dependent coefficients, or AFT models).
Partial likelihood estimation
Because $h_0(t)$ is unspecified, you can't write down a full likelihood. Cox's key insight was the partial likelihood, which eliminates $h_0(t)$ entirely.
At each event time $t_i$, the partial likelihood contribution is the probability that the individual who actually failed is the one who failed, out of everyone still at risk:

$$L_i(\beta) = \frac{\exp(\beta^\top x_i)}{\sum_{j \in R(t_i)} \exp(\beta^\top x_j)}$$

where $R(t_i)$ is the risk set at time $t_i$. Maximizing $L(\beta) = \prod_i L_i(\beta)$ with respect to $\beta$ yields consistent, asymptotically normal estimates. The resulting $\hat\beta$ values are interpreted as log-hazard ratios.
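To make the partial likelihood concrete, here is a minimal pure-Python version for a single covariate with no tied event times. The toy data are illustrative, and a crude grid search stands in for the Newton-Raphson step a real fitter would use:

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Partial log-likelihood for a single covariate, no ties (a sketch)."""
    ll = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored observations never contribute a numerator term
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk_set)
        ll += beta * x[i] - math.log(denom)
    return ll

# Toy data: event times, censoring flags, and a binary covariate.
times = [2, 4, 5, 7]
events = [1, 1, 0, 1]
x = [1, 0, 1, 0]

# Grid search over beta in [-3, 3] with step 0.1.
best = max((cox_partial_loglik(b / 10, times, events, x), b / 10)
           for b in range(-30, 31))
print("beta_hat ≈", best[1])
```

Note that the baseline hazard never appears anywhere in the code, which is the whole point of the partial likelihood.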
Hazard ratio interpretation
The hazard ratio (HR) for covariate $j$ is $e^{\beta_j}$:
- $\beta_j > 0$: higher hazard (shorter expected survival) for a one-unit increase in $x_j$
- $\beta_j < 0$: lower hazard (longer expected survival)
- $\beta_j = 0$: no effect
For example, if smoking status has $\hat\beta = 0.47$, then $\mathrm{HR} = e^{0.47} \approx 1.60$. Smokers face a 60% higher hazard of the event at any given time compared to non-smokers, holding other covariates constant.
For continuous covariates, the HR applies per unit increase. Rescaling the covariate (e.g., age in decades vs. years) changes the coefficient but not the model.
Tied event times
When multiple events share the same recorded time, the partial likelihood needs adjustment. Three common approaches, in order of increasing accuracy and computational cost:
- Breslow approximation: Simplest; treats the risk set as unchanged across tied events. Works well when ties are few.
- Efron approximation: Adjusts the risk set to account for ties more carefully. Generally preferred as a good balance of accuracy and speed.
- Exact partial likelihood: Considers all possible orderings of the tied events. Most accurate but computationally expensive with many ties.
In actuarial data with discrete time units (e.g., monthly records), ties can be frequent, making the choice of method consequential.
Time-dependent covariates
Some covariates change over time: a policyholder's age, cumulative claim amount, or current health status. The Cox model accommodates these by allowing the linear predictor to vary with time:

$$h(t \mid x(t)) = h_0(t) \exp\left(\beta^\top x(t)\right)$$

Including time-dependent covariates relaxes the proportional hazards assumption for those variables, since the hazard ratio now changes as $x(t)$ changes. Implementation requires restructuring the data so that each individual contributes multiple rows, one for each interval during which their covariates are constant.
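The restructuring is usually done in (start, stop] counting-process form. The sketch below splits one record at each covariate change; the field names are illustrative, not any particular package's schema:

```python
# One policyholder whose smoker status changes at t = 3 and who dies at t = 8.
# covariate_changes lists (time, value) pairs; field names are illustrative.
record = {"id": 17, "death_time": 8, "covariate_changes": [(0, 0), (3, 1)]}

def to_counting_process(rec):
    """Split a record into (start, stop] rows with constant covariates."""
    rows = []
    changes = rec["covariate_changes"]
    for k, (start, value) in enumerate(changes):
        stop = changes[k + 1][0] if k + 1 < len(changes) else rec["death_time"]
        event = 1 if stop == rec["death_time"] else 0  # event only in final row
        rows.append({"id": rec["id"], "start": start, "stop": stop,
                     "smoker": value, "event": event})
    return rows

for row in to_counting_process(record):
    print(row)
# Two rows: (0, 3] with smoker=0, event=0, then (3, 8] with smoker=1, event=1
```

Each row enters the risk set only for its own interval, so the partial likelihood automatically uses the covariate value in force at each event time.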
Model diagnostics and assessment
Fitting a model is only half the work. You need to verify that the model's assumptions hold and that it fits the data adequately.
Proportional hazards assumption
Several tools exist to check whether the PH assumption holds:
- Log-minus-log plot: Plot $\log(-\log \hat S(t))$ against $\log t$ for different groups. Under PH, these curves should be roughly parallel. Crossing or diverging curves suggest violation.
- Scaled Schoenfeld residuals: Plot these against time for each covariate. A non-zero slope indicates the covariate's effect changes over time.
- Grambsch-Therneau test: A formal statistical test based on the correlation between Schoenfeld residuals and time. A significant p-value for a covariate means its hazard ratio is not constant.
If PH is violated, common remedies include stratifying on the offending variable, adding an interaction with time, or switching to an AFT model.
Residual analysis
Three types of residuals are commonly used:
- Cox-Snell residuals: If the model fits well, these should follow a unit exponential distribution. Plot the Nelson-Aalen estimate of their cumulative hazard against a 45-degree line; departures indicate poor fit.
- Martingale residuals: Range from $-\infty$ to 1. Useful for detecting the correct functional form of covariates. Plot them against each covariate and look for nonlinear patterns.
- Deviance residuals: A transformation of martingale residuals that is more symmetrically distributed around zero, making it better suited to identifying outliers.
Goodness-of-fit tests
Three standard tests assess model fit and covariate significance:
- Likelihood ratio test: Compares nested models by examining twice the difference in log-likelihoods. Most reliable for testing whether a set of covariates improves the model.
- Wald test: Tests individual coefficients using $z = \hat\beta_j / \mathrm{se}(\hat\beta_j)$. Quick but can be unreliable for large coefficients.
- Score test: Tests the effect of adding covariates without refitting the model. Useful for screening candidate variables.
All three are asymptotically equivalent but can differ in finite samples. The likelihood ratio test is generally preferred.
Model selection criteria
When comparing non-nested models, use information criteria:
- AIC (Akaike Information Criterion): $\mathrm{AIC} = -2\ell + 2k$, where $\ell$ is the maximized log-likelihood and $k$ is the number of parameters.
- BIC (Bayesian Information Criterion): $\mathrm{BIC} = -2\ell + k \log n$, where $n$ is the sample size. BIC penalizes complexity more heavily than AIC once $n$ exceeds about 8.
Lower values indicate a better balance of fit and parsimony. Cross-validation (k-fold or leave-one-out) provides a more direct assessment of predictive performance on unseen data, though it's computationally heavier.
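The arithmetic is simple enough to sketch directly. The log-likelihoods and parameter counts below are hypothetical fits invented for illustration:

```python
import math

def aic(loglik, k):
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    return -2 * loglik + k * math.log(n)

# Hypothetical fits: model B adds two covariates for a small likelihood gain.
n = 500
model_a = {"loglik": -1204.3, "k": 3}
model_b = {"loglik": -1202.9, "k": 5}

for name, m in [("A", model_a), ("B", model_b)]:
    print(name, round(aic(m["loglik"], m["k"]), 1),
          round(bic(m["loglik"], m["k"], n), 1))
# The likelihood gain doesn't cover the two-parameter penalty here,
# so both criteria favor the simpler model A.
```

With $n = 500$, the BIC penalty per parameter is $\log 500 \approx 6.2$ versus AIC's flat 2, so BIC punishes the extra covariates roughly three times as hard.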
Competing risks and multistate models
Real actuarial problems rarely involve a single event type. A policyholder might die, lapse, or surrender. A borrower might default, prepay, or reach maturity. Competing risks and multistate models handle these richer event structures.

Cause-specific hazards
The cause-specific hazard for event type $j$ is:

$$h_j(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t,\ J = j \mid T \ge t)}{\Delta t}$$

To estimate it, treat all other event types as censored and apply standard methods (KM, Cox) to each cause separately. The overall hazard is the sum of all cause-specific hazards:

$$h(t) = \sum_j h_j(t)$$
Be careful with interpretation: a cause-specific hazard ratio from a Cox model tells you about the rate of that specific event, but it doesn't directly tell you about the probability of that event occurring, because competing events also affect that probability.
Cumulative incidence functions
The cumulative incidence function (CIF) for cause $j$ gives the probability of experiencing event $j$ by time $t$, accounting for competing risks:

$$F_j(t) = \Pr(T \le t,\ J = j) = \int_0^t h_j(u)\, S(u)\, du$$

where $S(u)$ is the overall survival function (probability of no event of any type by time $u$). The CIFs for all causes sum to $1 - S(t)$, the probability of any event by time $t$, not to 1 individually.
The Aalen-Johansen estimator generalizes the KM estimator to this setting. Using $1 - \hat S_{\mathrm{KM}}$ for each cause separately (treating other causes as censored) overestimates the cumulative incidence, which is a common and consequential mistake.
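A discrete-time toy calculation makes the overestimation visible. The per-period hazards below are illustrative, chosen only to exaggerate the gap:

```python
# Discrete-time sketch: two competing causes with constant per-period hazards.
h1, h2, periods = 0.02, 0.05, 40

surv, cif1, km1_surv = 1.0, 0.0, 1.0
for _ in range(periods):
    cif1 += surv * h1            # mass failing from cause 1 this period
    surv *= 1 - h1 - h2          # must escape BOTH causes to stay at risk
    km1_surv *= 1 - h1           # naive KM treating cause 2 as censoring

naive = 1 - km1_surv
print(round(cif1, 4), round(naive, 4))
# The naive 1 - KM figure exceeds the true cause-1 cumulative incidence
# because it pretends individuals removed by cause 2 remained at risk.
```

The gap widens as the competing hazard grows, which is why the error matters most exactly when competing risks are substantial.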
Multistate Markov models
Multistate models generalize competing risks by allowing transitions between multiple states, not just from "alive" to various "dead" states. A disability insurance model might have states: Active → Disabled → Recovered → Dead, with transitions possible between several pairs.
Key features:
- Transition intensities $\lambda_{ij}(t)$ describe the instantaneous rate of moving from state $i$ to state $j$.
- The Markov property assumes future transitions depend only on the current state, not on history.
- Outputs include state occupation probabilities (the probability of being in each state at time $t$), transition probabilities, and expected sojourn times (how long you stay in a state).
These models are central to disability insurance, long-term care, and pension valuation where individuals move between active, disabled, and deceased states.
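A discrete-time sketch shows how state occupation probabilities are propagated. The one-period transition matrix below is illustrative, not calibrated to any disability table:

```python
# Discrete-time sketch of a 3-state model: Active, Disabled, Dead.
# One-period transition probabilities (illustrative values only):
P = [
    [0.90, 0.07, 0.03],  # from Active
    [0.20, 0.70, 0.10],  # from Disabled
    [0.00, 0.00, 1.00],  # Dead is absorbing
]

def step(dist, P):
    """One period of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0, 0.0]  # everyone starts Active
for year in range(10):
    dist = step(dist, P)
print("state occupation after 10 years:", [round(p, 3) for p in dist])
```

Continuous-time multistate models replace this matrix multiplication with transition intensities and the Kolmogorov forward equations, but the bookkeeping idea is the same.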
Recurrent event analysis
Some events can happen more than once to the same individual: repeated hospitalizations, multiple insurance claims, or recurring disability episodes. Standard survival analysis assumes one event per person, so recurrent event methods are needed.
The core challenge is that events within the same individual are correlated. Someone who files one claim quickly is likely to file the next one quickly too. Three main approaches handle this dependence.
Intensity functions
The intensity function $\lambda(t)$ gives the instantaneous event rate at time $t$, conditional on the full event history up to that point.
Two extensions of the Cox model are commonly used:
- Andersen-Gill (AG) model: Treats each event interval as a separate observation in a single Cox model. Assumes a common baseline hazard across all events. Simple but assumes events within an individual are independent given covariates.
- Prentice-Williams-Peterson (PWP) model: Stratifies the baseline hazard by event number (first event, second event, etc.), allowing the baseline risk to differ for each recurrence. More realistic when the risk profile changes after each event.
Variance-corrected models
Rather than modeling the dependence structure explicitly, variance-corrected models use the standard Cox framework but fix the standard errors.
The robust (sandwich) variance estimator clusters observations by individual and produces standard errors that are valid even when within-individual events are correlated. The point estimates (hazard ratios) stay the same as in a naive analysis, but the confidence intervals and p-values are corrected.
This approach is practical when you care primarily about covariate effects and less about the dependence structure itself.
Frailty models
Frailty models add a random effect $z_i$ for each individual to capture unobserved heterogeneity:

$$h_i(t \mid x_i) = z_i\, h_0(t) \exp(\beta^\top x_i)$$

The frailty $z_i$ is typically assumed to follow a gamma or log-normal distribution with mean 1. Individuals with $z_i > 1$ are more "frail" (higher event rates); those with $z_i < 1$ are more robust.
Frailty models simultaneously account for within-individual correlation and provide individual-level risk estimates. The variance of the frailty distribution quantifies the degree of unobserved heterogeneity in the population.
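A simulation sketch (illustrative parameters throughout) shows the signature of frailty: a mean-1 gamma frailty leaves the average event rate unchanged but inflates the variance of event counts, i.e., overdispersion relative to a Poisson benchmark:

```python
import math
import random

random.seed(1)

# Constant baseline hazard over a fixed window; event counts per individual
# are Poisson with rate z_i * h0 * window. Parameters are illustrative.
h0, window, shape = 0.5, 4.0, 2.0  # gamma(shape, scale=1/shape) has mean 1

def poisson(rate):
    """Knuth's method; adequate for the small rates in this sketch."""
    L, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

counts_frail, counts_plain = [], []
for _ in range(20_000):
    z = random.gammavariate(shape, 1 / shape)  # mean-1 frailty draw
    counts_frail.append(poisson(z * h0 * window))
    counts_plain.append(poisson(h0 * window))

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Same mean rate in both populations, but frailty inflates the variance.
print(round(var(counts_plain), 2), round(var(counts_frail), 2))
```

The size of the variance inflation is governed by the frailty variance (here $1/\text{shape}$), which is exactly the quantity a fitted frailty model reports as its estimate of unobserved heterogeneity.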
Applications in actuarial science
Life insurance pricing and reserving
Mortality modeling is the foundation of life insurance. Actuaries fit parametric models (typically Gompertz-Makeham) to population or insured-lives data to construct life tables. The Cox model helps quantify how underwriting factors (smoking status, BMI, occupation) shift mortality risk. These survival estimates feed directly into premium calculations and reserve requirements.
Pension plan valuation
Pension liabilities depend heavily on how long retirees live. Multistate models capture the transitions between active employment, retirement, disability, and death. Longevity risk (the risk that people live longer than expected) is a major concern, and survival models that incorporate improving mortality trends help actuaries set appropriate funding levels.
Health insurance claims modeling
Health insurers use survival analysis to model both the incidence (when does a claim start?) and duration (how long does it last?) of claims. Recurrent event models handle policyholders with multiple hospitalizations or disability episodes. These models inform premium setting, claims reserving, and the design of benefit structures like elimination periods and maximum benefit durations.
Credit risk modeling
Survival analysis applies naturally to credit risk, where the "event" is default. The Cox PH model assesses how borrower characteristics (credit score, debt-to-income ratio) and macroeconomic conditions (unemployment rate, interest rates) affect default timing. Competing risks models distinguish between default, prepayment, and maturity. These survival-based estimates feed into expected loss calculations, credit scoring systems, and regulatory capital requirements.