Survival and hazard functions are the core tools actuaries use to model how long people live and how risk changes over time. The survival function tells you the probability someone is still alive at time $t$, while the hazard function captures the instantaneous risk of death at that moment. Together, they form the mathematical backbone for building mortality tables, pricing life insurance, and valuing annuities.
Definition of survival function
The survival function $S(t)$ gives the probability that an individual survives beyond time $t$. If $T$ is the random variable representing the time of death (or failure), then:

$$S(t) = P(T > t)$$
You can think of it as answering the question: "What's the chance this person is still alive after $t$ years?"
Relationship to the distribution function
The survival function is simply the complement of the cumulative distribution function $F(t)$:

$$S(t) = 1 - F(t)$$

where $F(t) = P(T \le t)$ is the probability of death occurring at or before time $t$. The probability density function $f(t)$ connects to both through differentiation:

$$f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}$$

This means the density function equals the negative derivative of the survival function. Knowing any one of $S(t)$, $F(t)$, or $f(t)$ lets you derive the other two.
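These identities can be sketched numerically. A minimal check, assuming a hypothetical exponential lifetime with rate 0.05 (the names `S`, `F`, `f` are illustrative only):

```python
import math

lam = 0.05  # hypothetical constant rate, for illustration only

def S(t):   # survival function: exp(-lam * t)
    return math.exp(-lam * t)

def F(t):   # cumulative distribution: the complement of S
    return 1.0 - S(t)

def f(t):   # density: lam * exp(-lam * t)
    return lam * math.exp(-lam * t)

t = 10.0
print(S(t) + F(t))  # S and F always sum to 1

# f(t) should match the negative derivative of S, estimated numerically:
dt = 1e-6
numeric_deriv = -(S(t + dt) - S(t - dt)) / (2 * dt)
print(abs(numeric_deriv - f(t)))  # ≈ 0
```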
Properties of survival function
Three properties define a valid survival function. If any of these fail, you don't have a legitimate model.
Non-increasing
$S(t)$ never increases over time. For any $t_1 < t_2$, we have $S(t_1) \ge S(t_2)$. This makes intuitive sense: the probability of surviving past 80 years can't be higher than the probability of surviving past 70 years.
Right-continuous
The survival function is right-continuous at every point:

$$\lim_{u \to t^{+}} S(u) = S(t)$$
This technical property ensures the function is well-defined, particularly when dealing with discrete jumps in the distribution (such as in empirical survival data).
Boundary conditions
- At time zero: $S(0) = 1$. Everyone is alive at the start.
- As time goes to infinity: $\lim_{t \to \infty} S(t) = 0$. Eventually, everyone dies.
These two conditions anchor the survival function between 1 and 0.
Definition of hazard function
The hazard function $h(t)$ measures the instantaneous rate of death at time $t$, given survival up to that point. Unlike $S(t)$, which gives a probability, $h(t)$ is a rate and can exceed 1.
Instantaneous failure rate
Formally, the hazard function is defined as:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
In words: take a tiny interval starting at $t$, find the conditional probability of dying in that interval (given you've survived to $t$), and divide by the interval width. The limit gives you the instantaneous failure rate.
An equivalent and often more useful formula is:

$$h(t) = \frac{f(t)}{S(t)}$$

This ratio of the density to the survival function is frequently the easiest way to compute $h(t)$ in practice.
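The two definitions agree, which can be checked numerically. A minimal sketch, assuming a hypothetical model $S(t) = e^{-0.01 t^2}$, whose true hazard is $h(t) = 0.02t$:

```python
import math

# Hypothetical model: S(t) = exp(-0.01 t^2), so f(t) = 0.02 t exp(-0.01 t^2)
# and the true hazard is h(t) = 0.02 t.
def S(t):
    return math.exp(-0.01 * t * t)

def f(t):
    return 0.02 * t * math.exp(-0.01 * t * t)

def hazard_from_limit(t, dt=1e-6):
    # The defining limit: P(die in [t, t + dt) | alive at t) / dt.
    return (S(t) - S(t + dt)) / (S(t) * dt)

def hazard_from_ratio(t):
    # The equivalent ratio form: f(t) / S(t).
    return f(t) / S(t)

print(hazard_from_limit(5.0))   # ≈ 0.1
print(hazard_from_ratio(5.0))   # = 0.02 * 5 = 0.1
```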
Relationship to survival function
The hazard and survival functions are linked by:

$$h(t) = -\frac{d}{dt} \ln S(t)$$

Integrating both sides from 0 to $t$ gives the cumulative hazard function:

$$H(t) = \int_0^t h(u)\,du = -\ln S(t)$$

And inverting that relationship recovers the survival function:

$$S(t) = e^{-H(t)} = \exp\!\left(-\int_0^t h(u)\,du\right)$$

These relationships are central. If you know any one of $S(t)$, $f(t)$, $h(t)$, or $H(t)$, you can derive all the others.
Properties of hazard function
Non-negativity
$h(t) \ge 0$ for all $t$. Since $h(t) = f(t)/S(t)$ and both the density and the survival function are non-negative, the hazard rate must be too.
Cumulative hazard function
The cumulative hazard $H(t) = \int_0^t h(u)\,du$ accumulates risk over time. It's non-decreasing (risk can only pile up, never reverse) and satisfies:

$$H(0) = 0, \qquad \lim_{t \to \infty} H(t) = \infty$$

The second condition follows from the requirement that $S(t) \to 0$ as $t \to \infty$.
Relationships between functions
Here's a summary of how to move between the key functions. Knowing one determines all the others:
| Starting from | To get $S(t)$ | To get $f(t)$ | To get $h(t)$ |
|---|---|---|---|
| $S(t)$ | — | $f(t) = -S'(t)$ | $h(t) = -S'(t)/S(t)$ |
| $f(t)$ | $S(t) = \int_t^\infty f(u)\,du$ | — | $h(t) = f(t) \big/ \int_t^\infty f(u)\,du$ |
| $h(t)$ | $S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)$ | $f(t) = h(t)\exp\!\left(-\int_0^t h(u)\,du\right)$ | — |

The key derivation steps:
- Hazard from survival: Differentiate $-\ln S(t)$ with respect to $t$.
- Survival from hazard: Integrate $h(u)$ from 0 to $t$, negate, and exponentiate.
- Density from survival: Take the negative derivative of $S(t)$.
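The survival-from-hazard step can be sketched numerically. Assuming a hypothetical linear hazard $h(t) = 0.02t$, the closed form is $S(t) = e^{-0.01 t^2}$, and numerical integration recovers it:

```python
import math

def h(t):
    # Hypothetical linear hazard; true survival is S(t) = exp(-0.01 t^2).
    return 0.02 * t

def survival_from_hazard(t, steps=10_000):
    # Trapezoidal rule for H(t) = integral of h from 0 to t, then exponentiate.
    du = t / steps
    H = sum(0.5 * (h(i * du) + h((i + 1) * du)) * du for i in range(steps))
    return math.exp(-H)

recovered = survival_from_hazard(5.0)
closed_form = math.exp(-0.01 * 5.0 ** 2)
print(abs(recovered - closed_form))  # ≈ 0
```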
Common survival distributions
Exponential distribution
The simplest survival model. It has a constant hazard rate:
- Hazard: $h(t) = \lambda$ for all $t$, where $\lambda > 0$
- Survival: $S(t) = e^{-\lambda t}$
- Mean lifetime: $E[T] = 1/\lambda$
The exponential distribution has the memoryless property: the probability of surviving an additional $t$ years doesn't depend on how long you've already survived. This makes it unrealistic for human mortality (older people clearly face higher risk), but it's useful as a baseline model and in contexts where aging effects are negligible.
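The memoryless property is easy to verify numerically (hypothetical rate $\lambda = 0.05$, so the mean lifetime is 20 years):

```python
import math

lam = 0.05   # hypothetical constant hazard rate; mean lifetime = 1/lam = 20

def S(t):
    return math.exp(-lam * t)

# Memoryless: P(T > s + t | T > s) = S(s + t) / S(s) = S(t), for any s.
s, t = 30.0, 10.0
conditional = S(s + t) / S(s)
print(conditional, S(t))  # both equal exp(-0.5)
```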
Weibull distribution
The Weibull generalizes the exponential by allowing the hazard to change over time:
- Hazard: $h(t) = \dfrac{k}{\lambda}\left(\dfrac{t}{\lambda}\right)^{k-1}$
- Survival: $S(t) = \exp\!\left(-\left(\dfrac{t}{\lambda}\right)^{k}\right)$

where $\lambda > 0$ is the scale parameter and $k > 0$ is the shape parameter. The shape parameter controls the hazard behavior:
- $k < 1$: decreasing hazard (e.g., infant mortality, early component failure)
- $k = 1$: constant hazard (reduces to exponential)
- $k > 1$: increasing hazard (e.g., wear-out, aging)
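A small sketch of how the shape parameter drives the hazard's direction (illustrative parameter values):

```python
def weibull_hazard(t, lam, k):
    # h(t) = (k / lam) * (t / lam) ** (k - 1)
    return (k / lam) * (t / lam) ** (k - 1)

lam = 10.0   # illustrative scale parameter
# Shape < 1: hazard falls with t; shape = 1: flat; shape > 1: hazard rises.
falling = weibull_hazard(1.0, lam, 0.5) > weibull_hazard(2.0, lam, 0.5)
flat = weibull_hazard(1.0, lam, 1.0) == weibull_hazard(2.0, lam, 1.0)
rising = weibull_hazard(1.0, lam, 2.0) < weibull_hazard(2.0, lam, 2.0)
print(falling, flat, rising)
```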
Gompertz-Makeham distribution
This is the go-to model for adult human mortality. The hazard function is:

$$h(x) = A + B c^{x}$$

where $A \ge 0$ represents age-independent (accidental) mortality, $B > 0$ is the baseline level of age-dependent mortality, and $c > 1$ controls how fast mortality increases with age. The corresponding survival function is:

$$S(x) = \exp\!\left(-Ax - \frac{B}{\ln c}\left(c^{x} - 1\right)\right)$$

When $A = 0$, this reduces to the pure Gompertz model. The exponential growth term $Bc^{x}$ captures the empirical observation that human mortality roughly doubles every 7-8 years in adulthood.
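A minimal sketch with illustrative (not calibrated) parameters, showing the age-dependent part of the hazard doubling every $\ln 2 / \ln c$ years:

```python
import math

# Illustrative Gompertz-Makeham parameters (not from a real mortality table):
A, B, c = 0.0005, 0.00003, 1.1

def hazard(x):
    return A + B * c ** x

def survival(x):
    # Closed form: S(x) = exp(-A x - B (c^x - 1) / ln c)
    return math.exp(-A * x - B * (c ** x - 1) / math.log(c))

# The age-dependent term B c^x doubles every ln 2 / ln c years:
doubling_years = math.log(2) / math.log(c)
ratio = (hazard(60 + doubling_years) - A) / (hazard(60) - A)
print(round(doubling_years, 1), round(ratio, 3))  # 7.3 2.0
```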
Estimating survival functions
Kaplan-Meier estimator
The Kaplan-Meier (product-limit) estimator is the standard nonparametric method for estimating $S(t)$ from observed data:

$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

where:
- $t_1 < t_2 < \cdots$ = observed failure times (in order)
- $d_i$ = number of deaths at time $t_i$
- $n_i$ = number of individuals still at risk just before $t_i$
The estimator produces a step function that drops at each observed death time. Its major strength is that it handles censored data (individuals who leave the study or are still alive at the end) by removing them from the risk set at the time of censoring rather than treating them as deaths.
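A minimal sketch of the product-limit computation on hypothetical toy data (not production code; real work would use a library such as `lifelines` or R's `survival`):

```python
def kaplan_meier(times, events):
    # Returns (t_i, S-hat(t_i)) at each observed death time.
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        if deaths:
            s *= 1.0 - deaths / n_at_risk   # multiply in (1 - d_i / n_i)
            curve.append((t, s))
        # Deaths and censorings at t both leave the risk set afterwards.
        removed = sum(1 for tt, _ in data if tt == t)
        n_at_risk -= removed
        i += removed
    return curve

times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 1, 0, 0]   # 1 = death, 0 = censored
km = kaplan_meier(times, events)
print(km)
```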
(Figure: example Kaplan-Meier survival curve.)
Nelson-Aalen estimator
The Nelson-Aalen estimator targets the cumulative hazard function instead:

$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$$

You can then estimate the survival function as $\hat{S}(t) = e^{-\hat{H}(t)}$. For large samples, the Kaplan-Meier and Nelson-Aalen-based survival estimates are very similar, but the Nelson-Aalen estimator can be more stable in small samples.
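A matching sketch for the Nelson-Aalen estimator on hypothetical toy data, converting the cumulative hazard back to a survival estimate:

```python
import math

def nelson_aalen(times, events):
    # Returns (t_i, H-hat(t_i)) at each observed death time.
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    H = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        if deaths:
            H += deaths / n_at_risk        # add d_i / n_i
            curve.append((t, H))
        removed = sum(1 for tt, _ in data if tt == t)
        n_at_risk -= removed               # drop deaths and censorings alike
        i += removed
    return curve

times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 1, 0, 0]   # 1 = death, 0 = censored
na = nelson_aalen(times, events)
surv_at_last_death = math.exp(-na[-1][1])   # survival via S = exp(-H)
print(na, surv_at_last_death)
```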
Confidence intervals
Confidence intervals quantify uncertainty in the estimated survival probabilities. Greenwood's formula estimates the variance of the Kaplan-Meier estimator:

$$\widehat{\operatorname{Var}}\!\left[\hat{S}(t)\right] = \hat{S}(t)^{2} \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$$

A common refinement is the log-log transformation, which constructs intervals on the scale of $\ln(-\ln \hat{S}(t))$ and then transforms back. This approach keeps the confidence limits within the valid range of $[0, 1]$.
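A sketch of Greenwood's formula on hypothetical risk-set counts $(n_i, d_i)$, producing a plain (untransformed) 95% interval:

```python
import math

# Hypothetical risk-set sizes n_i and death counts d_i at each death time:
steps = [(7, 1), (6, 1), (4, 1), (3, 1)]

s_hat = 1.0      # Kaplan-Meier estimate
gw_sum = 0.0     # Greenwood accumulator: sum of d_i / (n_i (n_i - d_i))
for n, d in steps:
    s_hat *= 1.0 - d / n
    gw_sum += d / (n * (n - d))

se = math.sqrt(s_hat ** 2 * gw_sum)
# Plain 95% interval; the log-log version would instead transform
# log(-log S) and back, guaranteeing limits inside [0, 1].
lower, upper = s_hat - 1.96 * se, s_hat + 1.96 * se
print(round(s_hat, 4), round(lower, 4), round(upper, 4))
```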
Comparing survival functions
Log-rank test
The log-rank test is the most widely used nonparametric test for comparing survival curves between two or more groups. It tests the null hypothesis that all groups have identical survival functions.
At each observed death time, the test compares the observed number of deaths in each group to the expected number (calculated assuming no difference between groups). The test statistic follows a chi-square distribution with degrees of freedom equal to the number of groups minus one. The log-rank test gives equal weight to all time points, making it most powerful when hazard ratios are roughly constant over time.
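A two-group log-rank sketch on hypothetical toy data; `O`, `E`, and `V` accumulate the observed deaths, expected deaths, and hypergeometric variance for group 0:

```python
# Each record: (time, event indicator, group). Hypothetical toy data.
data = [(2, 1, 0), (4, 1, 0), (6, 0, 0), (7, 1, 0), (9, 0, 0),
        (3, 1, 1), (5, 1, 1), (8, 1, 1), (10, 1, 1), (12, 0, 1)]

death_times = sorted({t for t, e, _ in data if e == 1})
O = E = V = 0.0
for t in death_times:
    groups_at_risk = [g for tt, _, g in data if tt >= t]
    n = len(groups_at_risk)                        # total at risk just before t
    n1 = sum(1 for g in groups_at_risk if g == 0)  # group-0 members at risk
    d = sum(1 for tt, e, _ in data if tt == t and e == 1)       # deaths at t
    d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 0)
    O += d1                      # observed group-0 deaths
    E += d * n1 / n              # expected deaths under the null
    if n > 1:                    # hypergeometric variance contribution
        V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

chi2 = (O - E) ** 2 / V          # compare to chi-square, 1 df (3.84 at 5%)
print(O, round(E, 3), round(chi2, 3))
```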
Wilcoxon (Breslow) test
The Wilcoxon test works like the log-rank test but weights each time point by the number of individuals at risk, $n_i$. This gives more weight to earlier time points, when the risk set is larger. Use the Wilcoxon test when you believe differences between groups are more pronounced early on. Its test statistic also follows a chi-square distribution under the null hypothesis.
Cox proportional hazards model
The Cox model is a semi-parametric regression approach that relates covariates to the hazard function:

$$h(t \mid \mathbf{x}) = h_0(t) \exp(\boldsymbol{\beta}^{\top} \mathbf{x})$$

where $h_0(t)$ is an unspecified baseline hazard and $\boldsymbol{\beta}$ is the vector of regression coefficients. The key assumption is proportional hazards: the ratio of hazard functions for any two individuals is constant over time.

The exponentiated coefficients $e^{\beta_j}$ are interpreted as hazard ratios. For example, $e^{\beta_j} = 1.5$ means a one-unit increase in covariate $x_j$ is associated with a 50% increase in the hazard of death. The Cox model is powerful because it estimates covariate effects without requiring you to specify the shape of the baseline hazard.
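The hazard-ratio reading can be illustrated directly (illustrative coefficient, not a fitted model):

```python
import math

beta = math.log(1.5)   # illustrative coefficient, not a fitted estimate

def relative_hazard(x):
    # h(t | x) / h0(t) = exp(beta * x); the baseline hazard h0(t) cancels
    # whenever two individuals are compared, so the ratio is time-constant.
    return math.exp(beta * x)

# A one-unit increase in x multiplies the hazard by exp(beta) = 1.5:
ratio = relative_hazard(2.0) / relative_hazard(1.0)
print(ratio)  # 1.5
```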
Applications in actuarial science
Life insurance pricing
Actuaries build mortality tables from estimated survival functions, then use those tables to price life insurance. The premium for a policy reflects the expected present value of the death benefit, weighted by the probability of death in each future year and discounted at an appropriate interest rate. More refined survival models (incorporating age, sex, health status, and smoking status) lead to more accurate risk classification and fairer premiums.
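A minimal pricing sketch with hypothetical one-year death probabilities and a 4% interest rate: the expected present value of a 3-year term insurance paying 1 at the end of the year of death.

```python
# Hypothetical one-year death probabilities q_{x+k} and 4% interest.
q = [0.010, 0.012, 0.014]   # P(death during year k+1 | alive at its start)
v = 1 / 1.04                # annual discount factor

epv = 0.0
p_alive = 1.0               # probability of surviving to the start of the year
for k, qk in enumerate(q):
    # Benefit of 1 paid at time k+1 if death occurs in year k+1.
    epv += v ** (k + 1) * p_alive * qk
    p_alive *= 1 - qk

print(round(epv, 5))
```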
Annuity valuation
An annuity pays a stream of income, often until the annuitant dies. Valuing an annuity requires estimating how long the annuitant will survive. The expected present value of an annuity equals the sum of each future payment multiplied by the probability of the annuitant being alive at that payment date, discounted back to the present. Longer expected survival means higher annuity values, which is why annuity pricing is so sensitive to the choice of survival model.
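A matching sketch for a 3-year annuity-due of 1 per year while alive (hypothetical mortality rates and 4% interest):

```python
q = [0.010, 0.012, 0.014]   # hypothetical one-year death probabilities
v = 1 / 1.04                # annual discount factor

epv = 0.0
p_alive = 1.0
for k in range(len(q)):
    epv += v ** k * p_alive   # payment of 1 at time k if alive
    p_alive *= 1 - q[k]

print(round(epv, 5))
```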
Pension plan funding
Pension plans promise future retirement benefits, and actuaries must ensure there are enough assets to cover those promises. Survival functions project how many retirees will be alive to collect benefits in each future year. Underestimating longevity leads to underfunding; overestimating it leads to unnecessarily high contributions. Regular actuarial valuations reassess mortality assumptions and adjust contribution rates to keep the plan solvent.