Contingency Tables
Contingency tables (also called two-way tables) organize counts for two categorical variables into rows and columns. They're the go-to tool for calculating joint, marginal, and conditional probabilities, and for determining whether two categorical variables are independent or associated.

Probabilities from Contingency Tables
A contingency table has one variable along the rows and another along the columns. Each cell holds the count (frequency) of observations that fall into that particular combination. The row and column totals are called marginal totals, and the bottom-right corner holds the grand total of all observations.
Here's a quick example table for gender and favorite color:
| | Blue | Red | Green | Row Total |
|---|---|---|---|---|
| Male | 40 | 25 | 15 | 80 |
| Female | 30 | 35 | 55 | 120 |
| Column Total | 70 | 60 | 70 | 200 |
To calculate the probability of any specific combination (a joint probability):
- Identify the cell frequency for the combination you care about.
- Divide that cell frequency by the grand total.
- Express this with the formula: P(A and B) = cell frequency / grand total.
Using the table above, the probability of being male and preferring blue is P(Male and Blue) = 40/200 = 0.20.
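The three steps above can be sketched in Python; the dictionary below simply restates the counts from the example table:

```python
# Counts from the example table: rows are genders, columns are favorite colors
table = {
    "Male":   {"Blue": 40, "Red": 25, "Green": 15},
    "Female": {"Blue": 30, "Red": 35, "Green": 55},
}

# Grand total: the sum of every cell in the table
grand_total = sum(sum(row.values()) for row in table.values())  # 200

# Joint probability: cell frequency divided by the grand total
p_male_blue = table["Male"]["Blue"] / grand_total
print(p_male_blue)  # 0.2
```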
Relationships in Two-Way Tables
Independence
Two variables are independent if knowing the level of one variable tells you nothing new about the other. Mathematically, independence means: P(A and B) = P(A) × P(B).
From the example table: P(Male) = 80/200 = 0.40 and P(Blue) = 70/200 = 0.35. If gender and color preference were independent, you'd expect P(Male and Blue) = 0.40 × 0.35 = 0.14. The actual joint probability is 40/200 = 0.20, which is noticeably different from 0.14. That gap suggests the variables are not independent.
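The expected-versus-actual comparison can be sketched the same way, with the marginal probabilities taken from the row and column totals:

```python
# Counts from the example table
table = {
    "Male":   {"Blue": 40, "Red": 25, "Green": 15},
    "Female": {"Blue": 30, "Red": 35, "Green": 55},
}
grand_total = sum(sum(row.values()) for row in table.values())  # 200

# Marginal probabilities from the row and column totals
p_male = sum(table["Male"].values()) / grand_total                 # 0.40
p_blue = sum(row["Blue"] for row in table.values()) / grand_total  # 0.35

# Under independence, the joint probability would equal the product
expected_joint = p_male * p_blue                    # 0.14
actual_joint = table["Male"]["Blue"] / grand_total  # 0.20
print(expected_joint, actual_joint)
```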

Association
Two variables are associated if the probability of one variable changes depending on the level of the other. In the example, males appear to prefer blue at a higher rate than females do, which signals an association between gender and color preference.
To assess association, compare conditional probabilities across groups (more on this below). The bigger the difference between those conditional probabilities, the stronger the association.
Conditional and Marginal Probabilities
Marginal Probabilities
A marginal probability is the probability of a single variable's outcome, ignoring the other variable entirely. You find it by summing across an entire row or column and dividing by the grand total.
From the table: P(Male) = 80/200 = 0.40 and P(Blue) = 70/200 = 0.35.
These are called "marginal" because they come from the margins (totals) of the table.

Conditional Probabilities
A conditional probability is the probability of one outcome given that you already know the other variable's value. The formula is: P(A | B) = P(A and B) / P(B).
In practice, you can skip converting to probabilities first and just work directly with counts: P(Blue | Male) = 40/80 = 0.50 and P(Blue | Female) = 30/120 = 0.25.
Notice the difference: 50% of males prefer blue, but only 25% of females do. That's a concrete way to see the association.
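Working directly with counts, the two conditional probabilities look like this in Python (same example table):

```python
# Counts from the example table
table = {
    "Male":   {"Blue": 40, "Red": 25, "Green": 15},
    "Female": {"Blue": 30, "Red": 35, "Green": 55},
}

# Conditional probability from counts: cell frequency / row total
p_blue_given_male = table["Male"]["Blue"] / sum(table["Male"].values())        # 0.50
p_blue_given_female = table["Female"]["Blue"] / sum(table["Female"].values())  # 0.25
print(p_blue_given_male, p_blue_given_female)
```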
Connecting Independence to Conditional Probabilities
This is the key link between these concepts:
- If two variables are independent, the conditional probability equals the marginal probability. That is, P(A | B) = P(A).
- If two variables are associated, the conditional probability differs from the marginal probability. Here, P(Blue | Male) = 0.50 while P(Blue) = 0.35, confirming the association.
Quick independence check: Compare conditional probabilities across groups. If P(A | B) takes the same value at every level of B, the variables are independent. If those values differ, the variables are associated.
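The quick check can be run for every color at once; a short loop like this (a sketch, reusing the example counts) would print matching values for each color if the variables were independent:

```python
# Counts from the example table
table = {
    "Male":   {"Blue": 40, "Red": 25, "Green": 15},
    "Female": {"Blue": 30, "Red": 35, "Green": 55},
}

# Compare P(color | gender) across genders for each color;
# equal values within a row would indicate independence
for color in ["Blue", "Red", "Green"]:
    conds = {gender: row[color] / sum(row.values()) for gender, row in table.items()}
    print(color, conds)
```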
Contingency Analysis and Statistical Inference
The patterns you observe in a contingency table come from sample data, so an apparent association might just be due to random chance. Statistical tests (like the chi-square test, which you'll encounter soon) formalize this by comparing observed frequencies to the expected frequencies you'd see if the variables were truly independent. If the observed counts differ enough from the expected counts, you can conclude the association is statistically significant and not just noise in the data.
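As a preview of how that comparison works, here is a minimal sketch that computes the expected counts under independence and the chi-square statistic by hand for the example table (assessing significance would additionally require the chi-square distribution, e.g. via a statistics library):

```python
# Observed counts from the example table (rows: gender, columns: color)
observed = [[40, 25, 15],
            [30, 35, 55]]

row_totals = [sum(row) for row in observed]        # [80, 120]
col_totals = [sum(col) for col in zip(*observed)]  # [70, 60, 70]
grand_total = sum(row_totals)                      # 200

# Expected count under independence: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
print(round(chi2, 2))  # 18.7
```

A large statistic like this one, relative to the 2 degrees of freedom here, is exactly the kind of evidence the chi-square test turns into a p-value.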