Contingency tables organize data about two categorical variables into rows and columns, making it easy to spot patterns and calculate probabilities. Using these tables, you can determine joint, marginal, and conditional probabilities, and figure out whether two variables are actually related or independent.
Contingency Tables
Construction of contingency tables
A contingency table is a grid that summarizes how often each combination of two categorical variables occurs. One variable defines the rows, the other defines the columns, and each cell holds the count for that specific combination.
Here's a quick example. Suppose you survey 200 people about gender and favorite color:
| | Blue | Red | Green | Row Total |
|---|---|---|---|---|
| Female | 40 | 25 | 35 | 100 |
| Male | 30 | 30 | 40 | 100 |
| Column Total | 70 | 55 | 75 | 200 |
A few things to notice:
- Each cell holds the frequency (count) for one specific combination, like "40 females prefer blue"
- Marginal totals are the row and column sums along the edges of the table. They tell you the total for a single variable regardless of the other (e.g., 70 total people prefer blue, 100 total are female)
- The grand total (bottom-right corner) is the total number of observations
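As a minimal sketch, the table above can be stored as a nested dictionary and its marginal totals recomputed from the cell counts. The variable names (`table`, `row_totals`, `col_totals`, `grand_total`) are illustrative choices, not part of any standard library.

```python
# Cell counts from the survey table: rows are gender, columns are favorite color.
table = {
    "Female": {"Blue": 40, "Red": 25, "Green": 35},
    "Male": {"Blue": 30, "Red": 30, "Green": 40},
}

# Row totals: add up the cells across each row.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals: add up each color over all rows.
colors = list(next(iter(table.values())))
col_totals = {color: sum(table[row][color] for row in table) for color in colors}

# Grand total: every observation in the table.
grand_total = sum(row_totals.values())

print(row_totals)   # {'Female': 100, 'Male': 100}
print(col_totals)   # {'Blue': 70, 'Red': 55, 'Green': 75}
print(grand_total)  # 200
```

Recomputing the totals this way is also a quick check that the printed margins agree with the interior cells.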

Probability calculations from contingency tables
You can pull three types of probability straight from a contingency table. All of them use the same basic idea: divide a count by the appropriate total.
Marginal probability is the probability of a single event, ignoring the other variable. You use the marginal totals for this.
Using the table above: $P(\text{Blue}) = \frac{70}{200} = 0.35$. That's the probability someone prefers blue, regardless of gender.
Joint probability is the probability of two events happening together. You use the count in a specific cell.
For example: $P(\text{Female and Blue}) = \frac{40}{200} = 0.20$. That's the probability a randomly selected person is both female and prefers blue.
The difference matters. Marginal probability looks at one variable alone (using the totals on the margins). Joint probability looks at a specific combination of both variables (using an interior cell). Students often mix these up, so pay attention to whether the question asks about one category or two.
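The marginal and joint probabilities described above can be sketched in a few lines. This is an illustrative example using the survey counts; the variable names are assumptions made for the sketch.

```python
# Cell counts from the survey table: rows are gender, columns are favorite color.
table = {
    "Female": {"Blue": 40, "Red": 25, "Green": 35},
    "Male": {"Blue": 30, "Red": 30, "Green": 40},
}
grand_total = sum(sum(cells.values()) for cells in table.values())  # 200

# Marginal probability: a column total divided by the grand total.
blue_total = sum(cells["Blue"] for cells in table.values())  # 70
p_blue = blue_total / grand_total

# Joint probability: a single interior cell divided by the grand total.
p_female_and_blue = table["Female"]["Blue"] / grand_total

print(p_blue)             # 0.35
print(p_female_and_blue)  # 0.2
```

Both probabilities share the same denominator (the grand total); only the numerator changes, from a margin to an interior cell.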

Conditional probabilities in contingency tables
Conditional probability is the probability of one event occurring given that another event has already occurred. It's written as $P(A \mid B)$, read "probability of $A$ given $B$."
The formula is:
$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$
To calculate it from a contingency table:
- Find the row or column that matches the "given" condition (the event after the "|")
- Use only the numbers within that row or column as your working universe
- Divide the relevant cell count by that row or column's marginal total
For example, what's the probability someone prefers red, given they are female?
$$P(\text{Red} \mid \text{Female}) = \frac{25}{100} = 0.25$$
Notice you're dividing by 100 (total females), not 200 (total people). The "given" condition shrinks your sample to just that subgroup.
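The steps above can be sketched as two small helper functions, one for conditioning on a row and one for conditioning on a column. Both function names are hypothetical and exist only for this illustration.

```python
# Cell counts from the survey table: rows are gender, columns are favorite color.
table = {
    "Female": {"Blue": 40, "Red": 25, "Green": 35},
    "Male": {"Blue": 30, "Red": 30, "Green": 40},
}

def conditional_given_row(table, column, given_row):
    """P(column | given_row): the cell count divided by the given row's total."""
    row = table[given_row]
    return row[column] / sum(row.values())

def conditional_given_column(table, row, given_column):
    """P(row | given_column): the cell count divided by the given column's total."""
    column_total = sum(cells[given_column] for cells in table.values())
    return table[row][given_column] / column_total

# Condition on the Female row: 25 / 100.
p_red_given_female = conditional_given_row(table, "Red", "Female")

# Condition on the Red column instead: 25 / 55.
p_female_given_red = conditional_given_column(table, "Female", "Red")

print(p_red_given_female)            # 0.25
print(round(p_female_given_red, 4))  # 0.4545
```

The same cell (25) gives two different answers because the "given" condition changes the denominator: 100 females in one case, 55 people who prefer red in the other.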
Comparing conditional and marginal probabilities reveals whether the variables are related. Here, $P(\text{Red} \mid \text{Female}) = 0.25$ while $P(\text{Red}) = \frac{55}{200} = 0.275$. These aren't equal, which hints that gender and color preference might be associated.
Association and Independence
Two categorical variables are independent if knowing the value of one tells you nothing about the other. In a contingency table, independence means the conditional probability equals the marginal probability for every category:
$$P(A \mid B) = P(A)$$
If $P(A \mid B) = P(A)$ for all categories, the variables are independent. If they're not equal, there's an association between the variables.
From the example above, $P(\text{Red} \mid \text{Female}) = 0.25$ but $P(\text{Red}) = 0.275$. Since these differ, there's some association between gender and color preference in this sample.
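One way to apply this check to the whole table is to compare every conditional probability with the matching marginal probability. The sketch below does that for the survey counts; the tolerance-based comparison with `math.isclose` is a choice made for this example, since exact equality of floating-point values is fragile.

```python
import math

# Cell counts from the survey table: rows are gender, columns are favorite color.
table = {
    "Female": {"Blue": 40, "Red": 25, "Green": 35},
    "Male": {"Blue": 30, "Red": 30, "Green": 40},
}
colors = list(next(iter(table.values())))
grand_total = sum(sum(cells.values()) for cells in table.values())

# Marginal probability of each color, ignoring gender.
marginal = {c: sum(table[row][c] for row in table) / grand_total for c in colors}

# Conditional probability of each color within each gender.
conditional = {
    row: {c: cells[c] / sum(cells.values()) for c in colors}
    for row, cells in table.items()
}

# Independent (in this sample) only if every conditional matches its marginal.
independent = all(
    math.isclose(conditional[row][c], marginal[c])
    for row in table
    for c in colors
)

for row in table:
    for c in colors:
        print(f"P({c} | {row}) = {conditional[row][c]:.3f}   P({c}) = {marginal[c]:.3f}")
print("Independent in this sample:", independent)  # False
```

This only describes the sample at hand. Small gaps between conditional and marginal probabilities can arise by chance, which is why a formal test is needed before drawing conclusions.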
To formally test whether an observed association is statistically significant (or just due to random chance), you'd use a chi-square test. That test relies on degrees of freedom, calculated as:
$$df = (r - 1)(c - 1)$$
where $r$ is the number of rows and $c$ is the number of columns, not counting the totals.
For a 2×3 table like the one above: $df = (2 - 1)(3 - 1) = 2$. You'll use this value when looking up critical values or p-values for the chi-square test.
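The degrees-of-freedom calculation is easy to sketch in code. The example below also computes a test statistic, which goes one step beyond the text: it assumes the standard Pearson chi-square formula, with each expected count taken as (row total × column total) / grand total. Treat that part as an illustration under those assumptions rather than a complete test procedure.

```python
# Cell counts from the survey table: rows are gender, columns are favorite color.
table = {
    "Female": {"Blue": 40, "Red": 25, "Green": 35},
    "Male": {"Blue": 30, "Red": 30, "Green": 40},
}
colors = list(next(iter(table.values())))
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {c: sum(table[row][c] for row in table) for c in colors}
grand_total = sum(row_totals.values())

# Degrees of freedom: (rows - 1) * (columns - 1), totals excluded.
df = (len(table) - 1) * (len(colors) - 1)

# Assumed Pearson statistic: sum of (observed - expected)^2 / expected,
# where expected is the count implied by independence.
chi_square = 0.0
for row in table:
    for c in colors:
        expected = row_totals[row] * col_totals[c] / grand_total
        chi_square += (table[row][c] - expected) ** 2 / expected

print(df)                    # 2
print(round(chi_square, 3))  # 2.216
```

The statistic would then be compared against a chi-square critical value for `df` degrees of freedom, taken from a table or a statistics library.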