Definition of conditional probability
Conditional probability measures the likelihood of an event occurring given that another event has already happened. Instead of asking "what's the chance of A?", you're asking "what's the chance of A now that I know B happened?" This distinction matters because new information often changes the odds.
Notation for conditional probability
The notation P(A | B) reads as "the probability of A given B." That vertical bar means "given that" or "conditional on."
The formula:

P(A | B) = P(A ∩ B) / P(B)

where P(A ∩ B) is the joint probability of both A and B occurring, and P(B) is the probability of the event you already know happened. Note that P(B) must be greater than zero (you can't condition on an impossible event).
Quick example: Suppose 30% of students play sports, 20% play sports and are on the honor roll, and you want to know the probability a student is on the honor roll given they play sports.

P(honor roll | sports) = P(honor roll ∩ sports) / P(sports) = 0.20 / 0.30 ≈ 0.667

So about 66.7% of student-athletes are on the honor roll.
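The student example works out to a one-line calculation:

```python
# Conditional probability from the student example:
# P(honor roll | sports) = P(honor roll and sports) / P(sports)
p_sports = 0.30            # P(plays sports)
p_honor_and_sports = 0.20  # P(on honor roll and plays sports)

p_honor_given_sports = p_honor_and_sports / p_sports
print(round(p_honor_given_sports, 3))  # 0.667
```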
Difference from joint probability
These two get confused a lot, so keep them straight:
- Joint probability P(A ∩ B): the chance that both A and B happen together
- Conditional probability P(A | B): the chance that A happens assuming B already did
They're related by this rearrangement of the conditional probability formula:

P(A ∩ B) = P(A | B) × P(B)
Joint probability looks at the full sample space. Conditional probability shrinks the sample space down to only the cases where B occurred, then asks how often A shows up within that smaller group.
Fundamental concepts
Dependence vs. independence
Two events are independent if knowing one happened tells you nothing new about the other. Mathematically, for independent events:

P(A | B) = P(A), P(B | A) = P(B), and P(A ∩ B) = P(A) × P(B)
If these equalities don't hold, the events are dependent, meaning one event's occurrence changes the probability of the other.
For example, drawing two cards from a deck without replacement creates dependent events: the first draw changes what's left in the deck. Drawing with replacement keeps them independent.
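A quick check of the card example, using exact fractions rather than simulation:

```python
from fractions import Fraction

# Without replacement: probability the second card is an ace,
# given the first card drawn was an ace.
p_second_ace_given_first_ace = Fraction(3, 51)  # 3 aces left among 51 cards
p_any_ace = Fraction(4, 52)                     # unconditional ace probability

print(p_second_ace_given_first_ace)  # 1/17
print(p_any_ace)                     # 1/13

# The two differ, so the draws are dependent. With replacement the
# deck is restored, so the conditional probability equals the
# marginal one and the draws are independent.
```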
Multiplication rule of probability
The multiplication rule lets you find the probability of multiple events all happening:

P(A ∩ B) = P(A | B) × P(B)

This generalizes to chains of events:

P(A₁ ∩ A₂ ∩ … ∩ Aₙ) = P(A₁) × P(A₂ | A₁) × P(A₃ | A₁ ∩ A₂) × … × P(Aₙ | A₁ ∩ … ∩ Aₙ₋₁)

When events are independent, it simplifies because the conditional probabilities just become regular probabilities:

P(A ∩ B) = P(A) × P(B)
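As an illustration of the chain rule (my own example, not from the text above): the chance of drawing three hearts in a row without replacement:

```python
from fractions import Fraction

# P(H1 and H2 and H3) = P(H1) * P(H2 | H1) * P(H3 | H1 and H2)
p = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)
print(p)         # 11/850
print(float(p))  # ≈ 0.0129
```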
Bayes' theorem
Bayes' theorem lets you "flip" a conditional probability. If you know P(B | A) but need P(A | B), Bayes' theorem gets you there.
Formula and components

P(A | B) = P(B | A) × P(A) / P(B)

Each piece has a name:

- P(A) = the prior: your initial belief about A before seeing any new evidence
- P(B | A) = the likelihood: how probable the evidence B is if A were true
- P(B) = the marginal likelihood: the overall probability of observing B (this acts as a normalizing constant)
- P(A | B) = the posterior: your updated belief about A after accounting for B
Applications of Bayes' theorem
Medical diagnosis is the classic example. Suppose a disease affects 1% of the population, and a test is 95% accurate (both for true positives and true negatives). If you test positive, what's the actual probability you have the disease?
- Prior: P(disease) = 0.01
- Likelihood: P(positive | disease) = 0.95
- Marginal likelihood: P(positive) = (0.95 × 0.01) + (0.05 × 0.99) = 0.0095 + 0.0495 = 0.059
- Posterior: P(disease | positive) = 0.0095 / 0.059 ≈ 0.161
Only about 16.1%. That surprises most people, and it's exactly why understanding conditional probability matters.
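The calculation can be reproduced directly in code:

```python
# Medical-test example: 1% prevalence, 95% sensitivity and specificity.
p_disease = 0.01
p_pos_given_disease = 0.95  # sensitivity (true positive rate)
p_pos_given_healthy = 0.05  # 1 - specificity (false positive rate)

# Marginal likelihood via the law of total probability.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior via Bayes' theorem.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```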
Other applications include spam filtering (is this email spam given the words it contains?), forensic science (probability of guilt given DNA evidence), and machine learning classifiers.
Law of total probability
The law of total probability lets you calculate P(A) when you don't know it directly but do know how A behaves under different scenarios.
Formula and explanation
If B₁, B₂, …, Bₙ are mutually exclusive events that cover all possibilities (they're exhaustive), then:

P(A) = P(A | B₁) × P(B₁) + P(A | B₂) × P(B₂) + … + P(A | Bₙ) × P(Bₙ)
You're breaking A into pieces based on which scenario occurs, calculating the probability of A in each scenario, and adding them up weighted by how likely each scenario is.
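A small sketch with made-up numbers: suppose 60% of parts come from factory 1 (2% defective) and 40% from factory 2 (5% defective). The overall defect rate is the weighted sum:

```python
# Law of total probability with hypothetical factory data.
# Each scenario: (P(scenario), P(defective | scenario)).
scenarios = [
    (0.60, 0.02),  # factory 1
    (0.40, 0.05),  # factory 2
]

p_defective = sum(p_b * p_a_given_b for p_b, p_a_given_b in scenarios)
print(round(p_defective, 3))  # 0.032
```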
This is exactly what we used in the medical diagnosis example above to find P(positive).
Connection to decision trees
Decision trees (or probability trees) are a visual way to apply the law of total probability:
- Each branch from a node represents one of the mutually exclusive scenarios
- You multiply probabilities along a path to get joint probabilities
- You add across paths that lead to the same final outcome to get the marginal probability
These trees are especially helpful when a problem has multiple stages and you need to track how probabilities combine.
Conditional probability in the real world
Medical diagnosis examples
Medical testing is where conditional probability shows up most vividly. Doctors routinely deal with questions like:
- Given a positive screening result, what's the actual chance the patient has the disease? (This depends heavily on how common the disease is.)
- How does the probability of a diagnosis change as new test results and symptoms come in?
- Is a treatment effective for patients with specific characteristics?
The key insight: a "95% accurate" test does not mean a positive result gives you a 95% chance of being sick. The base rate of the disease matters enormously.
Legal applications
In courtrooms, conditional probability helps evaluate evidence:
- How strong is DNA evidence? (What's the probability of a match given innocence vs. guilt?)
- How should multiple independent pieces of evidence be combined?
- How reliable is eyewitness testimony given known error rates?
Getting these calculations wrong has real consequences, which is why the fallacies in the next section are so important.
Common misconceptions
Base rate fallacy
The base rate fallacy happens when you ignore or underweight the prior probability (base rate) of an event. In the medical example above, people hear "95% accurate test" and jump to thinking a positive result means 95% chance of disease. They forget that the disease only affects 1% of people, which dramatically lowers the posterior probability.
To avoid this: always ask "how common is this event in the first place?" before interpreting conditional evidence.
Prosecutor's fallacy
This fallacy confuses two very different probabilities:
- P(evidence | innocent): the chance of seeing this evidence if the person is innocent
- P(innocent | evidence): the chance the person is innocent given the evidence
A prosecutor might say "there's only a 1-in-a-million chance an innocent person would match this DNA," implying the defendant is almost certainly guilty. But that ignores the base rate: in a city of millions, multiple innocent people could match. Bayes' theorem is the proper way to handle this reasoning.
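A rough numeric illustration (made-up numbers): in a city of 5 million with one guilty person, a 1-in-a-million match rate for innocents still leaves several expected innocent matches:

```python
population = 5_000_000
p_match_given_innocent = 1e-6

# Expected number of innocent people who match by chance.
expected_innocent_matches = (population - 1) * p_match_given_innocent
print(round(expected_innocent_matches, 1))  # 5.0

# If the guilty person matches with certainty, a randomly chosen
# matching individual is guilty with probability only about:
p_guilty_given_match = 1 / (1 + expected_innocent_matches)
print(round(p_guilty_given_match, 2))  # 0.17
```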
Calculating conditional probabilities
Using Venn diagrams
Venn diagrams work well for simple problems with two or three events. Overlapping circles represent events, and the overlap region represents P(A ∩ B).
To find P(A | B) from a Venn diagram:
- Identify the circle for event B (this is your new, restricted sample space)
- Look at the overlap region where A and B intersect
- Divide the overlap area by the total area of B
This visual approach helps build intuition for why conditioning shrinks the sample space.
Two-way tables for calculations
Two-way tables (also called contingency tables) organize data by two categorical variables. They're one of the most practical tools for conditional probability.
| | Disease | No Disease | Total |
|---|---|---|---|
| Test Positive | 95 | 495 | 590 |
| Test Negative | 5 | 9,405 | 9,410 |
| Total | 100 | 9,900 | 10,000 |
To find P(disease | positive): look at the "Test Positive" row only, then divide the Disease cell by the row total: 95 / 590 ≈ 0.161. This matches the Bayes' theorem calculation from earlier.
Marginal probabilities come from the "Total" row and column. Conditional probabilities come from restricting your attention to a single row or column.
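The same table arithmetic, sketched in code:

```python
# Counts from the two-way table (rows: test result, columns: disease status).
table = {
    ("positive", "disease"): 95,
    ("positive", "no_disease"): 495,
    ("negative", "disease"): 5,
    ("negative", "no_disease"): 9405,
}

total = sum(table.values())  # 10,000
row_positive = table[("positive", "disease")] + table[("positive", "no_disease")]

# Conditional: restrict to the "Test Positive" row, then divide.
p_disease_given_positive = table[("positive", "disease")] / row_positive
print(round(p_disease_given_positive, 3))  # 0.161

# Marginal: use a column total over the grand total.
p_disease = (table[("positive", "disease")] + table[("negative", "disease")]) / total
print(p_disease)  # 0.01
```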
Conditional vs. marginal probability
Differences and similarities
- Marginal probability P(A): the overall probability of A, considering all possibilities
- Conditional probability P(A | B): the probability of A within the restricted world where B has occurred
You can always recover marginal probabilities from conditional ones using the law of total probability. And if A and B are independent, the conditional probability equals the marginal probability.
When to use each
Use marginal probabilities when you have no additional information, or when events are independent and extra information doesn't help.
Use conditional probabilities when you have relevant information that changes the likelihood of the event you care about. In practice, this is most situations: you almost always know something that narrows things down.
Conditional independence
Definition and properties
Events A and B are conditionally independent given C if:

P(A ∩ B | C) = P(A | C) × P(B | C)

This means once you know C, learning B gives you no extra information about A. Equivalently:

P(A | B ∩ C) = P(A | C)
An important subtlety: conditional independence given C does not mean A and B are independent overall (marginally). And the reverse is also true: marginally independent events can become dependent once you condition on something.
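The subtlety can be checked by enumeration. A sketch (my own toy setup): C picks one of two biased coins, and A and B are two flips of the chosen coin. Given C the flips are independent, but marginally they are not:

```python
from itertools import product

# C chooses a coin: coin 0 lands heads with prob 0.9, coin 1 with prob 0.1.
p_c = {0: 0.5, 1: 0.5}
p_heads = {0: 0.9, 1: 0.1}

# Joint distribution over (A, B, C), where A and B are flips of coin C.
joint = {}
for c, a, b in product([0, 1], [True, False], [True, False]):
    pa = p_heads[c] if a else 1 - p_heads[c]
    pb = p_heads[c] if b else 1 - p_heads[c]
    joint[(a, b, c)] = p_c[c] * pa * pb

def p(pred):
    return sum(v for k, v in joint.items() if pred(*k))

# Conditional independence given C = 0 holds:
p_ab_given_c0 = p(lambda a, b, c: a and b and c == 0) / p_c[0]
p_a_given_c0 = p(lambda a, b, c: a and c == 0) / p_c[0]
p_b_given_c0 = p(lambda a, b, c: b and c == 0) / p_c[0]
assert abs(p_ab_given_c0 - p_a_given_c0 * p_b_given_c0) < 1e-12

# But marginal independence fails: P(A and B) != P(A) * P(B).
print(round(p(lambda a, b, c: a and b), 2))                   # 0.41
print(round(p(lambda a, b, c: a) * p(lambda a, b, c: b), 2))  # 0.25
```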
Examples in statistics
- Naive Bayes classifiers assume that all features are conditionally independent given the class label. This simplification often works surprisingly well in practice.
- Markov chains assume the future state depends only on the current state, not on the history of past states. This is a form of conditional independence.
- In medical studies, two symptoms might be correlated overall but conditionally independent once you account for the underlying disease.
Probability trees
Construction and interpretation
Probability trees represent multi-step random processes visually.
To build one:
- Start with a root node representing the initial situation
- Draw branches for each possible outcome of the first event, labeling each with its probability
- From each of those endpoints, draw branches for the next event's outcomes with their conditional probabilities
- Continue until all stages are represented
- Check that branches from any single node sum to 1
The leaf nodes (endpoints) represent complete sequences of outcomes.
Solving multi-step problems
Once the tree is built:
- Find a joint probability: multiply all the probabilities along a single path from root to leaf
- Find a marginal probability: identify all paths that lead to the outcome you care about, calculate each path's joint probability, and add them together
- Find a conditional probability: use the joint and marginal results in the formula P(A | B) = P(A ∩ B) / P(B)
Trees keep you organized when problems have multiple dependent stages, and they make it harder to accidentally skip a step.
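The medical-test tree from earlier, walked as paths (a sketch):

```python
# Each path from root to leaf: stage 1 is disease status,
# stage 2 is the test result given that status.
paths = {
    ("disease", "positive"): [0.01, 0.95],
    ("disease", "negative"): [0.01, 0.05],
    ("healthy", "positive"): [0.99, 0.05],
    ("healthy", "negative"): [0.99, 0.95],
}

# Joint probability of a path: multiply along its branches.
joint = {leaf: probs[0] * probs[1] for leaf, probs in paths.items()}

# Sanity check: all leaves together cover the sample space.
assert abs(sum(joint.values()) - 1.0) < 1e-12

# Marginal P(positive): add the paths ending in "positive".
p_positive = sum(p for (status, result), p in joint.items() if result == "positive")
print(round(p_positive, 4))  # 0.059

# Conditional P(disease | positive) = joint / marginal.
print(round(joint[("disease", "positive")] / p_positive, 3))  # 0.161
```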
Conditional probability in machine learning
Naive Bayes classifier
The Naive Bayes classifier predicts which class a data point belongs to using Bayes' theorem. For features X₁, X₂, …, Xₙ and class C:

P(C | X₁, …, Xₙ) ∝ P(C) × P(X₁ | C) × P(X₂ | C) × … × P(Xₙ | C)
The "naive" part is the assumption that features are conditionally independent given the class. This is rarely true in reality, but the classifier still performs well for tasks like spam filtering, sentiment analysis, and document classification.
Hidden Markov models
Hidden Markov models (HMMs) deal with sequences where the underlying states aren't directly observable. They use conditional probabilities in two ways:
- Transition probabilities: the chance of moving from one hidden state to another
- Emission probabilities: the chance of observing a particular output given the current hidden state
HMMs are used in speech recognition, gene sequence analysis, and natural language processing. Algorithms like the Viterbi algorithm use these conditional probabilities to find the most likely sequence of hidden states behind an observed sequence of data.
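A compact Viterbi sketch over a toy HMM (hypothetical weather/activity numbers, chosen only for illustration):

```python
# Toy HMM: hidden weather states, observed activities.
states = ["rainy", "sunny"]
start = {"rainy": 0.6, "sunny": 0.4}
trans = {                                  # transition probabilities
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.4, "sunny": 0.6},
}
emit = {                                   # emission probabilities
    "rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

def viterbi(observations):
    # best[s] = (probability, path) of the best path ending in state s.
    best = {s: (start[s] * emit[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = {
            s: max(
                (best[prev][0] * trans[prev][s] * emit[s][obs], best[prev][1] + [s])
                for prev in states
            )
            for s in states
        }
    return max(best.values())

prob, path = viterbi(["walk", "shop", "clean"])
print(path)  # ['sunny', 'rainy', 'rainy']
```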