📊Actuarial Mathematics Unit 1 Review

1.1 Probability axioms and properties

Written by the Fiveable Content Team • Last updated August 2025

Probability axioms and properties form the mathematical foundation for everything in actuarial science. Before you can price insurance products, model mortality, or assess risk, you need a rigorous framework for quantifying uncertainty. This guide covers that framework, from the core axioms through conditional probability and independence.

Probability basics

Probability assigns a numerical value to how likely an event is to occur. In actuarial work, this translates directly into questions like "what's the chance a policyholder files a claim this year?" or "what's the likelihood a bond defaults?" Every model you'll build later rests on these fundamentals.

Sample space and events

The sample space ($\Omega$) is the set of all possible outcomes of a random process. For example, rolling a standard die gives $\Omega = \{1, 2, 3, 4, 5, 6\}$.

An event is any subset of the sample space. It can be:

  • Simple: a single outcome, like rolling a 4
  • Compound: multiple outcomes, like rolling an even number ($\{2, 4, 6\}$)

Two special events always exist: the empty set $\emptyset$ (no outcomes, the "impossible event") and $\Omega$ itself (all outcomes, the "certain event"). Both qualify as events in the formal framework.

Probability of an event

The probability $P(A)$ measures the likelihood of event $A$ occurring, always as a value between 0 and 1. For a finite sample space with equally likely outcomes:

$$P(A) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes in } \Omega}$$
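For equally likely outcomes, this ratio can be computed directly. A minimal Python sketch using the die example from earlier (the `classical_prob` helper is illustrative, not part of any library):

```python
from fractions import Fraction

def classical_prob(event, sample_space):
    """P(A) = (outcomes in A) / (total outcomes in the sample space)."""
    return Fraction(len(event & sample_space), len(sample_space))

omega = {1, 2, 3, 4, 5, 6}            # rolling a standard die
even = {2, 4, 6}                      # compound event: roll an even number
print(classical_prob(even, omega))    # 1/2
```

Using `Fraction` keeps the probability exact instead of introducing floating-point rounding.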

Three approaches to assigning probabilities come up in practice:

  • Classical: based on equally likely outcomes (coin flips, dice)
  • Empirical (frequentist): based on observed relative frequencies from data (claim rates from historical records)
  • Subjective: based on expert judgment when data is limited (emerging risk assessment)

Axioms of probability

These three axioms, formalized by Kolmogorov, are the bedrock of probability theory. Every result that follows in this guide (and in the rest of the course) derives from them.

Non-negativity

$P(A) \geq 0$ for any event $A$

Probabilities can never be negative. A negative likelihood has no meaningful interpretation.

Probability of sample space

$$P(\Omega) = 1$$

Since $\Omega$ contains every possible outcome, something in it must happen. The total probability is 1.

Countable additivity

For any countable sequence of mutually exclusive events $A_1, A_2, \ldots$ (meaning no two can occur simultaneously):

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

This is what lets you break a complex event into simpler, non-overlapping pieces and add their probabilities. Note that this axiom applies to countably infinite unions, not just finite ones, which is what gives probability theory its analytical power.

Consequences of axioms

Everything below follows logically from the three axioms. These aren't additional assumptions; they're provable results.

Probability of empty set

$$P(\emptyset) = 0$$

The impossible event has probability zero. You can prove this by writing $\Omega = \Omega \cup \emptyset \cup \emptyset \cup \ldots$ and applying countable additivity: $P(\Omega) = P(\Omega) + P(\emptyset) + P(\emptyset) + \ldots$, which only works if $P(\emptyset) = 0$.

Probability of complement

$$P(A^c) = 1 - P(A)$$

Since $A$ and $A^c$ are mutually exclusive and $A \cup A^c = \Omega$, additivity gives $P(A) + P(A^c) = 1$. This is extremely useful in practice. When computing $P(A)$ directly is hard, it's often easier to compute $P(A^c)$ and subtract from 1. For example, "probability that at least one claim occurs" $= 1 - P(\text{no claims occur})$.
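A quick sketch of the complement trick in Python. The per-policy claim probability and portfolio size here are made-up numbers, and the policies are assumed independent so that "no claims" factors as a product:

```python
# P(at least one claim) = 1 - P(no claims)
p = 0.03    # hypothetical per-policy claim probability
n = 10      # hypothetical number of independent policies

p_no_claims = (1 - p) ** n        # all n policies are claim-free
p_at_least_one = 1 - p_no_claims
print(round(p_at_least_one, 4))   # 0.2626
```

Computing "at least one" directly would mean summing over every nonempty subset of claiming policies; the complement collapses that to a single term.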

Monotonicity of probability

If $A \subseteq B$, then $P(A) \leq P(B)$

If every outcome in $A$ is also in $B$, then $A$ can't be more probable than $B$. You can see this by writing $B = A \cup (B \setminus A)$, where the two pieces are disjoint, so $P(B) = P(A) + P(B \setminus A) \geq P(A)$.

Properties of probability

Inclusion-exclusion for two events

For any two events $A$ and $B$:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Why subtract $P(A \cap B)$? When you add $P(A)$ and $P(B)$, outcomes in both $A$ and $B$ get counted twice. Subtracting the intersection corrects for that double-counting.

Example: Suppose 30% of policyholders have auto insurance, 50% have home insurance, and 10% have both. The probability a random policyholder has at least one of these is $0.30 + 0.50 - 0.10 = 0.70$.
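The arithmetic from the example, sketched in Python:

```python
p_auto, p_home, p_both = 0.30, 0.50, 0.10

# inclusion-exclusion: subtract the overlap that was counted twice
p_at_least_one = p_auto + p_home - p_both
print(round(p_at_least_one, 2))   # 0.7
```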

Inclusion-exclusion for multiple events

The principle generalizes to $n$ events $A_1, A_2, \ldots, A_n$:

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \ldots + (-1)^{n+1} P(A_1 \cap A_2 \cap \ldots \cap A_n)$$

The pattern alternates: add single-event probabilities, subtract pairwise intersections, add triple intersections, and so on. Each term corrects the over- or under-counting from the previous step.
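For a finite sample space with equally likely outcomes, the alternating sum can be checked mechanically. A sketch (the die events are illustrative, and `union_prob` is a hypothetical helper):

```python
from fractions import Fraction
from itertools import combinations

def union_prob(events, omega):
    """Inclusion-exclusion: alternating sum over all k-wise intersections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)                    # +, -, +, ...
        for combo in combinations(events, k):
            inter = set.intersection(*combo)
            total += sign * Fraction(len(inter), len(omega))
    return total

omega = set(range(1, 7))                          # fair die
events = [{2, 4, 6}, {1, 2, 3}, {3, 6}]           # even, low, multiple of 3
print(union_prob(events, omega))                  # 5/6
```

The result agrees with counting the union directly, which is exactly what the alternating corrections guarantee.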

Bonferroni inequalities

When exact intersection probabilities are unknown or expensive to compute, Bonferroni inequalities give you bounds:

  • Upper bound: $P\left(\bigcup_{i=1}^{n} A_i\right) \leq \sum_{i=1}^{n} P(A_i)$
  • Lower bound: $P\left(\bigcup_{i=1}^{n} A_i\right) \geq \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j)$

The upper bound is also called Boole's inequality (or the union bound). These are truncations of the inclusion-exclusion formula: stopping after an odd number of terms gives an upper bound, and stopping after an even number gives a lower bound.
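Numerically, the two bounds are just truncated sums. A sketch with made-up marginal and pairwise probabilities:

```python
# hypothetical marginals P(A_i) and pairwise intersections P(A_i ∩ A_j)
p_singles = [0.05, 0.08, 0.04]
p_pairs = [0.010, 0.005, 0.002]   # pairs (A1,A2), (A1,A3), (A2,A3)

upper = min(sum(p_singles), 1.0)                 # Boole's inequality / union bound
lower = max(sum(p_singles) - sum(p_pairs), 0.0)  # two-term Bonferroni bound
print(round(lower, 3), round(upper, 3))          # 0.153 0.17
```

The clamping to $[0, 1]$ matters: with many events the raw sums can leave the unit interval even though the true union probability cannot.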

Continuity of probability

If $A_1 \subseteq A_2 \subseteq A_3 \subseteq \ldots$ is an increasing sequence of events with $A = \bigcup_{i=1}^{\infty} A_i$, then:

$$\lim_{n \to \infty} P(A_n) = P(A)$$

Similarly, for a decreasing sequence $A_1 \supseteq A_2 \supseteq \ldots$ with $A = \bigcap_{i=1}^{\infty} A_i$:

$$\lim_{n \to \infty} P(A_n) = P(A)$$

This continuity property follows from countable additivity and is what allows you to use limiting arguments in probability. It's essential when working with sequences of events in survival analysis or ruin theory.

Conditional probability

Conditional probability captures how the likelihood of an event changes when you learn that another event has occurred. In actuarial contexts, this shows up constantly: what's the probability of a large claim given that a claim has been filed? What's the probability of death within a year given that the insured is age 65?

Definition and formula

The conditional probability of $A$ given $B$ is:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{where } P(B) > 0$$

You're restricting your attention to the outcomes where $B$ occurred, then asking what fraction of those also belong to $A$. The requirement $P(B) > 0$ is critical; conditioning on an impossible event is undefined.
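A small sanity check of the definition, using illustrative die events:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {5, 6}          # roll is greater than 4
B = {2, 4, 6}       # roll is even

P = lambda event: Fraction(len(event), len(omega))

# restrict attention to B, then ask what fraction of it also lies in A
p_A_given_B = P(A & B) / P(B)
print(p_A_given_B)   # 1/3
```

Unconditionally $P(A) = 1/3$ as well here, so this particular pair happens to be independent; change $B$ to $\{4, 5, 6\}$ and the conditional probability shifts to $2/3$.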

Law of total probability

If $\{B_1, B_2, \ldots, B_n\}$ is a partition of $\Omega$ (the $B_i$ are mutually exclusive and their union is $\Omega$), then for any event $A$:

$$P(A) = \sum_{i=1}^{n} P(A|B_i) \cdot P(B_i)$$

This lets you compute $P(A)$ by breaking the problem into cases.

Example: An insurer has three risk classes: low (60% of policyholders, 2% claim rate), medium (30%, 5% claim rate), and high (10%, 15% claim rate). The overall claim probability is:

$$P(\text{claim}) = (0.02)(0.60) + (0.05)(0.30) + (0.15)(0.10) = 0.012 + 0.015 + 0.015 = 0.042$$

So 4.2% of all policyholders file a claim.
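The same computation in Python, using the numbers from the example:

```python
# risk class -> (share of policyholders, claim rate), from the example above
classes = {
    "low":    (0.60, 0.02),
    "medium": (0.30, 0.05),
    "high":   (0.10, 0.15),
}

# law of total probability: P(claim) = sum of P(claim | class) * P(class)
p_claim = sum(share * rate for share, rate in classes.values())
print(round(p_claim, 3))   # 0.042
```

Note the shares form a partition: they are mutually exclusive and sum to 1, which is what licenses the formula.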

Bayes' theorem

Bayes' theorem reverses the direction of conditioning. Given the same partition $\{B_1, \ldots, B_n\}$:

$$P(B_i|A) = \frac{P(A|B_i) \cdot P(B_i)}{\sum_{j=1}^{n} P(A|B_j) \cdot P(B_j)}$$

The denominator is just $P(A)$ from the law of total probability.

Example (continuing from above): If a policyholder filed a claim, what's the probability they're high-risk?

$$P(\text{high}|\text{claim}) = \frac{(0.15)(0.10)}{0.042} = \frac{0.015}{0.042} \approx 0.357$$

Even though only 10% of policyholders are high-risk, they make up about 35.7% of those who file claims. This kind of posterior updating is central to credibility theory and Bayesian methods in actuarial work.
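Continuing with the same numbers, the posterior for every risk class follows directly from Bayes' theorem:

```python
classes = {
    "low":    (0.60, 0.02),
    "medium": (0.30, 0.05),
    "high":   (0.10, 0.15),
}

# denominator: P(claim) via the law of total probability
p_claim = sum(share * rate for share, rate in classes.values())

# Bayes' theorem: P(class | claim) = P(claim | class) * P(class) / P(claim)
posterior = {name: share * rate / p_claim
             for name, (share, rate) in classes.items()}
print(round(posterior["high"], 3))   # 0.357
```

The posteriors sum to 1 by construction, since the denominator is exactly the total of the numerators.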

Independence of events

Two events are independent if knowing one occurred tells you nothing about whether the other occurred. Independence simplifies calculations dramatically, but you should always verify it rather than assume it.

Definition of independence

Events $A$ and $B$ are independent if and only if:

$$P(A \cap B) = P(A) \cdot P(B)$$

An equivalent condition: $P(A|B) = P(A)$ (when $P(B) > 0$). The occurrence of $B$ doesn't change the probability of $A$.

Be careful: independence is not the same as being mutually exclusive. If $A$ and $B$ are mutually exclusive and both have positive probability, they cannot be independent (knowing $A$ occurred tells you $B$ didn't).

Multiplication rule for independent events

For independent events $A_1, A_2, \ldots, A_n$:

$$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1) \cdot P(A_2) \cdot \ldots \cdot P(A_n)$$

Example: If three independent policyholders each have a 0.03 probability of filing a claim, the probability all three file is $0.03^3 = 0.000027$.
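The example as a two-liner, using exact fractions to sidestep floating-point noise:

```python
from fractions import Fraction

p = Fraction(3, 100)       # 0.03 claim probability per policyholder
p_all_three = p ** 3       # multiplication rule for independent events
print(p_all_three)         # 27/1000000
```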

Pairwise vs mutual independence

This distinction matters and is a common exam topic.

  • Pairwise independence: Every pair of events satisfies $P(A_i \cap A_j) = P(A_i) \cdot P(A_j)$
  • Mutual independence: Every subset of events satisfies the multiplication rule. For three events, this means all of the following must hold:
    • $P(A \cap B) = P(A) \cdot P(B)$
    • $P(A \cap C) = P(A) \cdot P(C)$
    • $P(B \cap C) = P(B) \cdot P(C)$
    • $P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C)$

Pairwise independence does not guarantee mutual independence. There are well-known counterexamples where three events are pairwise independent but the triple intersection condition fails. For actuarial modeling, mutual independence is typically the assumption you need.
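One standard counterexample uses two fair coin flips; enumerating the sample space makes the failure explicit:

```python
from fractions import Fraction
from itertools import product

omega = set(product("HT", repeat=2))      # two fair coin flips, 4 outcomes
P = lambda event: Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}     # first flip is heads
B = {w for w in omega if w[1] == "H"}     # second flip is heads
C = {w for w in omega if w[0] == w[1]}    # the two flips match

# every pair multiplies correctly: pairwise independent
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)

# but the triple condition fails: P(A ∩ B ∩ C) = 1/4, while the product is 1/8
assert P(A & B & C) != P(A) * P(B) * P(C)
```

Intuitively, any two of the three events reveal nothing about each other, yet knowing two of them pins down the third completely.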