Fundamentals of Probability Theory
Basic Concepts and Definitions
Probability measures the likelihood that an event will occur, expressed as a number between 0 and 1.
- 0 indicates impossibility (e.g., rolling a 7 on a standard six-sided die)
- 1 indicates certainty (e.g., drawing a card that is either red or black from a standard deck)
The probability of an event A is denoted P(A). In the classical interpretation, you calculate it by dividing the number of favorable outcomes by the total number of equally likely outcomes:

P(A) = (number of favorable outcomes) / (total number of equally likely outcomes)
For example, rolling a fair six-sided die, the probability of getting a 3 is P(3) = 1/6.
The complement of an event A, written A′ (or sometimes ¬A), is the event that A does not occur. Its probability is:

P(A′) = 1 − P(A)
If the probability of a coin landing heads is 0.5, the probability of not getting heads is 1 − 0.5 = 0.5.
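The classical rule and the complement rule can be sketched in a few lines of Python; the helper name classical_probability is illustrative, not standard:

```python
from fractions import Fraction

# Classical probability: favorable outcomes over equally likely outcomes.
def classical_probability(favorable, total):
    return Fraction(favorable, total)

p_three = classical_probability(1, 6)    # rolling a 3 on a fair die
assert p_three == Fraction(1, 6)
assert classical_probability(0, 6) == 0  # rolling a 7: impossible

# Complement rule: P(not A) = 1 - P(A).
p_not_three = 1 - p_three
assert p_not_three == Fraction(5, 6)
```

Using Fraction keeps the results exact instead of introducing floating-point rounding.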
Rules for Combining Probabilities
Addition rule (disjunction). The probability of either A or B occurring is:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
You subtract P(A ∩ B) to avoid double-counting outcomes where both events happen. For example, the probability of drawing a heart or a king from a standard 52-card deck:

P(heart ∪ king) = 13/52 + 4/52 − 1/52 = 16/52 ≈ 0.308
The king of hearts gets counted in both "hearts" and "kings," so you subtract it once.
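The inclusion-exclusion count can be verified by enumerating the deck directly (the rank/suit labels below are just one way to encode the cards):

```python
from fractions import Fraction
from itertools import product

# Standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

hearts = {c for c in deck if c[1] == "hearts"}
kings = {c for c in deck if c[0] == "K"}

# Direct count of the union vs. the inclusion-exclusion formula.
p_union = Fraction(len(hearts | kings), len(deck))
by_formula = (Fraction(len(hearts), 52) + Fraction(len(kings), 52)
              - Fraction(len(hearts & kings), 52))
assert p_union == by_formula == Fraction(16, 52)
```

The set intersection `hearts & kings` contains exactly one card, the king of hearts, which is why the formula subtracts 1/52.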
Multiplication rule (conjunction). The probability of both A and B occurring is:

P(A ∩ B) = P(A) × P(B | A)
Here P(B | A) is the conditional probability of B given that A has occurred. For example, the probability of drawing two hearts in a row from a standard deck (without replacement):

P(two hearts) = 13/52 × 12/51 = 1/17 ≈ 0.059
After drawing one heart, only 12 hearts remain among 51 cards.
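The without-replacement calculation is a direct application of the multiplication rule:

```python
from fractions import Fraction

# Drawing two hearts without replacement: multiplication rule.
p_first = Fraction(13, 52)               # 13 hearts among 52 cards
p_second_given_first = Fraction(12, 51)  # 12 hearts left among 51 cards
p_both_hearts = p_first * p_second_given_first
assert p_both_hearts == Fraction(1, 17)  # about 0.059
```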
Independence. Two events A and B are independent if the occurrence of one does not affect the probability of the other. When events are independent, the multiplication rule simplifies to:

P(A ∩ B) = P(A) × P(B)
Flipping a coin and rolling a die are independent events, so the probability of heads and a 6 is 1/2 × 1/6 = 1/12.
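A quick simulation makes the independence claim concrete; the trial count and seed below are arbitrary choices:

```python
import random
from fractions import Fraction

# Exact result: independent probabilities multiply.
p_heads_and_six = Fraction(1, 2) * Fraction(1, 6)
assert p_heads_and_six == Fraction(1, 12)

# Monte Carlo sanity check: one coin flip and one die roll per trial.
random.seed(0)
trials = 200_000
hits = sum(1 for _ in range(trials)
           if random.random() < 0.5 and random.randint(1, 6) == 6)
print(hits / trials)  # close to 1/12, i.e. about 0.083
```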
Bayes' Theorem for Updating Probabilities
Bayes' Theorem Formula and Components
Bayes' theorem lets you reverse the direction of a conditional probability. If you know how likely the evidence is given a hypothesis, you can calculate how likely the hypothesis is given the evidence. The formula:

P(H | E) = P(E | H) × P(H) / P(E)
Each component has a standard name:
- P(H): the prior probability of hypothesis H before seeing the evidence
- P(E | H): the likelihood, i.e., how probable the evidence E is if H is true
- P(E): the marginal probability of the evidence (across all hypotheses)
- P(H | E): the posterior probability of H after taking the evidence into account

Worked Example: Medical Diagnosis
Suppose a disease affects 1 in 1,000 people. A test for the disease has a 99% true positive rate (sensitivity) and a 2% false positive rate. You test positive. What's the probability you actually have the disease?
- Identify the prior. P(disease) = 0.001, so P(no disease) = 0.999.
- Identify the likelihood. P(positive | disease) = 0.99, and the false positive rate gives P(positive | no disease) = 0.02.
- Calculate the marginal probability of a positive result. This accounts for all ways you could test positive (true positives + false positives): P(positive) = 0.99 × 0.001 + 0.02 × 0.999 = 0.00099 + 0.01998 = 0.02097.
- Apply Bayes' theorem: P(disease | positive) = 0.00099 / 0.02097 ≈ 0.047.
The result is roughly 4.7%. Even with a positive test, there's only about a 1 in 21 chance you have the disease. This is counterintuitive, but it makes sense: the disease is so rare that false positives vastly outnumber true positives.
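The three steps above translate directly into code:

```python
# Medical diagnosis example, step by step.
prior = 0.001          # P(disease): 1 in 1,000
sensitivity = 0.99     # P(positive | disease)
false_positive = 0.02  # P(positive | no disease)

# Marginal: all the ways a positive result can occur.
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Posterior via Bayes' theorem.
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))  # 0.047
```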
Updating Probabilities with New Evidence
The real power of Bayesian reasoning is iterative updating. Each posterior can become the prior for the next round of evidence.
- The prior represents your initial belief before new data arrives. In the example above, it was the disease prevalence (0.001).
- The likelihood quantifies how well each hypothesis predicts the observed evidence. A test with higher sensitivity produces a larger likelihood for the disease hypothesis.
- The marginal probability normalizes everything so the posterior is a valid probability. You compute it by summing across all competing hypotheses.
If the person in the example above takes a second independent test and again tests positive, you'd plug the new prior of 0.047 into Bayes' theorem with the same likelihood and false positive rate. The posterior would jump to roughly 0.71, because now the prior is no longer tiny.
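Iterative updating is easiest to see when the update is wrapped in a function (the name bayes_update is illustrative):

```python
def bayes_update(prior, sensitivity, false_positive):
    """Posterior probability of the hypothesis after one positive test."""
    p_positive = sensitivity * prior + false_positive * (1 - prior)
    return sensitivity * prior / p_positive

# First positive test: the prior is the 1-in-1,000 base rate.
post1 = bayes_update(0.001, 0.99, 0.02)
assert abs(post1 - 0.047) < 0.001

# Second independent positive test: yesterday's posterior is today's prior.
post2 = bayes_update(post1, 0.99, 0.02)
assert abs(post2 - 0.71) < 0.01
```

Chaining the posterior back in as the prior is exactly the "each posterior becomes the next prior" pattern described above.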
Conditional Probabilities and Inductive Reasoning
Understanding Conditional Probabilities
Conditional probability is the probability of event A occurring given that event B has already occurred:

P(A | B) = P(A ∩ B) / P(B)
For example, the probability of drawing a red card given that a face card has been drawn from a standard deck: there are 12 face cards total, 6 of which are red, so P(red | face) = 6/12 = 1/2.
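Counting within the reduced sample space can be checked by enumeration:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

# Conditioning on "face card" shrinks the sample space to 12 cards.
face_cards = [c for c in deck if c[0] in {"J", "Q", "K"}]
red_face = [c for c in face_cards if c[1] in {"hearts", "diamonds"}]

# P(red | face) = |red and face| / |face|
p_red_given_face = Fraction(len(red_face), len(face_cards))
assert p_red_given_face == Fraction(1, 2)
```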
Conditional probabilities matter for inductive reasoning because they capture how evidence shifts the plausibility of a hypothesis. Observing B changes the relevant sample space from "all outcomes" to "outcomes where B is true," which can dramatically change the probability of A.
This is exactly what Bayes' theorem formalizes: you start with P(H), observe evidence E, and compute P(H | E). The conditional probability structure is what makes the update work.

Conditional Independence
Two events A and B are conditionally independent given a third event C if:

P(A ∩ B | C) = P(A | C) × P(B | C)
This means that once you know C, learning B tells you nothing new about A (and vice versa). Events can be conditionally independent given C even if they are not independent overall.
For example, consider two different diagnostic tests for the same disease. The test results might be correlated in the general population (both are more likely to be positive for sick people). But given that you know whether the patient has the disease, the two test results might be independent of each other, since each test responds to the disease through a different mechanism. This conditional independence assumption is what justifies treating successive test results as independent updates in Bayesian reasoning.
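A small exact calculation illustrates the distinction; the prevalence and test accuracies reuse the earlier example's numbers:

```python
# Two tests for the same disease, conditionally independent given disease status.
prevalence = 0.001       # P(disease), assumed base rate
sens, fp = 0.99, 0.02    # P(positive | disease), P(positive | no disease)

def p_both_positive(has_disease):
    # Conditional independence: P(T1+ and T2+ | status) = P(T+ | status)^2
    p = sens if has_disease else fp
    return p * p

# Unconditionally, the two results are correlated (sick people drive both up):
p_single = sens * prevalence + fp * (1 - prevalence)
p_joint = (p_both_positive(True) * prevalence
           + p_both_positive(False) * (1 - prevalence))
assert p_joint > p_single ** 2   # P(T1+ and T2+) > P(T1+) * P(T2+)
```

Given the disease status the joint probability factors exactly, but averaged over the population it does not: that gap is the overall correlation.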
Fallacies and Biases in Probabilistic Reasoning
Common Fallacies in Probability
Base rate neglect occurs when someone ignores the prior probability of an event and focuses only on the specific evidence. The medical test example above illustrates this perfectly: people intuitively feel that a 99%-accurate positive test means they almost certainly have the disease, but they're neglecting the base rate (only 1 in 1,000 people are affected). Bayes' theorem is the corrective here.
The conjunction fallacy is the mistaken belief that two events occurring together is more probable than either event alone. This violates a basic axiom: P(A ∩ B) ≤ P(A). The classic example comes from Tversky and Kahneman's "Linda problem": participants judged it more likely that Linda is both a bank teller and a feminist activist than that she is a bank teller, because the conjunction fit a narrative better. Narratives aren't probabilities.
The gambler's fallacy is the belief that past outcomes influence future independent events. After flipping ten heads in a row with a fair coin, many people feel tails is "due." But each flip is independent: P(heads) = 0.5 regardless of prior results. The coin has no memory.
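A simulation makes the "no memory" point empirically; the flip count and seed are arbitrary:

```python
import random

# Gambler's fallacy check: after 10 heads in a row, is tails "due"?
random.seed(1)
streak = 0
after_streak = []
for _ in range(2_000_000):
    heads = random.random() < 0.5
    if streak >= 10:             # the previous 10+ flips were all heads
        after_streak.append(heads)
    streak = streak + 1 if heads else 0

# The conditional frequency of heads is still about 0.5.
freq = sum(after_streak) / len(after_streak)
assert abs(freq - 0.5) < 0.05    # no drift toward tails
```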
Cognitive Biases Affecting Probability Judgments
Confirmation bias is the tendency to seek, interpret, and remember evidence that supports your existing beliefs while discounting evidence that contradicts them. In Bayesian terms, this is like selectively choosing which evidence to update on, or inflating the likelihood for your favored hypothesis. The antidote is to actively consider: how probable is this evidence if my hypothesis is wrong?
The availability heuristic leads people to estimate probabilities based on how easily examples come to mind. Plane crashes receive heavy media coverage, so people overestimate their frequency relative to, say, car accidents. The ease of recall is not a reliable guide to actual frequency.
Both fallacies and biases point to the same lesson: human intuition about probability is unreliable. The formal machinery of probability theory and Bayes' theorem exists precisely to discipline that intuition. Recognizing where your reasoning departs from the math is a core skill in inductive logic.