Fiveable

📊Honors Statistics Unit 3 Review

3.4 Contingency Tables

Written by the Fiveable Content Team • Last updated August 2025

Contingency Tables

Contingency tables (also called two-way tables) organize counts for two categorical variables into rows and columns. They're the go-to tool for calculating joint, marginal, and conditional probabilities, and for determining whether two categorical variables are independent or associated.

Probabilities from Contingency Tables

A contingency table has one variable along the rows and another along the columns. Each cell holds the count (frequency) of observations that fall into that particular combination. The row and column totals are called marginal totals, and the bottom-right corner holds the grand total of all observations.

Here's a quick example table for gender and favorite color:

| | Blue | Red | Green | Row Total |
|---|---|---|---|---|
| Male | 40 | 25 | 15 | 80 |
| Female | 30 | 35 | 55 | 120 |
| Column Total | 70 | 60 | 70 | 200 |
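
As a small sketch of how the margins relate to the cells, the marginal totals and the grand total can be recomputed from the six cell counts alone. The nested-dictionary layout and the variable names here are just one convenient choice for illustration, not something the table itself prescribes.

```python
# Cell counts from the example table: rows are gender, columns are color.
counts = {
    "Male": {"Blue": 40, "Red": 25, "Green": 15},
    "Female": {"Blue": 30, "Red": 35, "Green": 55},
}

# Row totals: add across each row.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

# Column totals: add down each column.
colors = list(counts["Male"])
col_totals = {c: sum(cells[c] for cells in counts.values()) for c in colors}

# Grand total: every observation counted exactly once.
grand_total = sum(row_totals.values())

print(row_totals)   # {'Male': 80, 'Female': 120}
print(col_totals)   # {'Blue': 70, 'Red': 60, 'Green': 70}
print(grand_total)  # 200
```

The row totals and the column totals each add up to the same grand total, which is a quick sanity check on any two-way table.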

To calculate the probability of any specific combination (a joint probability):

  1. Identify the cell frequency for the combination you care about.
  2. Divide that cell frequency by the grand total.
  3. Express this with the formula: $P(A \cap B) = \frac{\text{frequency}(A \cap B)}{\text{grand total}}$

Using the table above, the probability of being male and preferring blue is $P(\text{Male} \cap \text{Blue}) = \frac{40}{200} = 0.20$.
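
The same steps can be applied to every cell at once. The sketch below assumes the counts are stored in a nested dictionary (an illustrative layout) and divides each cell by the grand total to get the full set of joint probabilities.

```python
# Cell counts from the example table.
counts = {
    "Male": {"Blue": 40, "Red": 25, "Green": 15},
    "Female": {"Blue": 30, "Red": 35, "Green": 55},
}

# Grand total of all observations.
grand_total = sum(n for cells in counts.values() for n in cells.values())

# Joint probability for each (gender, color) combination:
# cell frequency divided by the grand total.
joint = {
    (gender, color): n / grand_total
    for gender, cells in counts.items()
    for color, n in cells.items()
}

print(round(joint[("Male", "Blue")], 2))  # 0.2
print(round(sum(joint.values()), 2))      # 1.0
```

Because every observation lands in exactly one cell, the joint probabilities sum to 1.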

Relationships in Two-Way Tables

Independence

Two variables are independent if knowing the level of one variable tells you nothing new about the other. Mathematically, independence means:

$$P(A \cap B) = P(A) \times P(B)$$

From the example table: $P(\text{Male}) = \frac{80}{200} = 0.40$ and $P(\text{Blue}) = \frac{70}{200} = 0.35$. If gender and color preference were independent, you'd expect $P(\text{Male} \cap \text{Blue}) = 0.40 \times 0.35 = 0.14$. The actual joint probability is 0.20, which is noticeably different from 0.14. That gap suggests the variables are not independent.
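
The comparison above is simple enough to check numerically. This is a minimal sketch using the totals from the example table; the variable names are illustrative.

```python
grand_total = 200

# Marginal probabilities from the row and column totals.
p_male = 80 / grand_total   # 0.40
p_blue = 70 / grand_total   # 0.35

# What the joint probability would be if the variables were independent.
p_joint_if_independent = p_male * p_blue

# What the table actually shows.
p_joint_observed = 40 / grand_total

gap = p_joint_observed - p_joint_if_independent

print(round(p_joint_if_independent, 2))  # 0.14
print(round(p_joint_observed, 2))        # 0.2
print(round(gap, 2))                     # 0.06
```

A gap of zero in every cell is what independence would look like in the table; here the Male and Blue cell is 0.06 above the independence value.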

Association

Two variables are associated if the probability of one variable changes depending on the level of the other. In the example, males appear to prefer blue at a higher rate than females do, which signals an association between gender and color preference.

To assess association, compare conditional probabilities across groups (more on this below). The bigger the difference between those conditional probabilities, the stronger the association.

Conditional and Marginal Probabilities

Marginal Probabilities

A marginal probability is the probability of a single variable's outcome, ignoring the other variable entirely. You find it by summing across an entire row or column and dividing by the grand total.

$$P(A) = \frac{\text{row total for } A}{\text{grand total}}$$

$$P(B) = \frac{\text{column total for } B}{\text{grand total}}$$

From the table: $P(\text{Female}) = \frac{120}{200} = 0.60$ and $P(\text{Green}) = \frac{70}{200} = 0.35$.

These are called "marginal" because they come from the margins (totals) of the table.
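
As a rough illustration, the marginal probabilities for both variables can be computed from the cell counts by summing first and dividing second. The dictionary layout and names are assumptions made for this sketch.

```python
# Cell counts from the example table.
counts = {
    "Male": {"Blue": 40, "Red": 25, "Green": 15},
    "Female": {"Blue": 30, "Red": 35, "Green": 55},
}

grand_total = sum(n for cells in counts.values() for n in cells.values())

# Marginal probability of each gender: row total / grand total.
p_gender = {g: sum(cells.values()) / grand_total for g, cells in counts.items()}

# Marginal probability of each color: column total / grand total.
colors = list(counts["Male"])
p_color = {
    c: sum(cells[c] for cells in counts.values()) / grand_total for c in colors
}

print(round(p_gender["Female"], 2))  # 0.6
print(round(p_color["Green"], 2))    # 0.35
```

Each set of marginal probabilities sums to 1, since every observation belongs to exactly one row and exactly one column.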

Conditional Probabilities

A conditional probability is the probability of one outcome given that you already know the other variable's value. The formula is:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

In practice, you can skip converting to probabilities first and just work directly with counts:

$$P(\text{Blue}|\text{Male}) = \frac{40}{80} = 0.50$$

$$P(\text{Blue}|\text{Female}) = \frac{30}{120} = 0.25$$

Notice the difference: 50% of males prefer blue, but only 25% of females do. That's a concrete way to see the association.
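
The count-based shortcut can be sketched as a pair of small helper functions. Both function names are hypothetical, chosen for this example; the second one conditions on a column instead of a row to show that the denominator changes with the condition.

```python
# Cell counts from the example table.
counts = {
    "Male": {"Blue": 40, "Red": 25, "Green": 15},
    "Female": {"Blue": 30, "Red": 35, "Green": 55},
}

def color_given_gender(counts, color, gender):
    """P(color | gender): cell count divided by that gender's row total."""
    return counts[gender][color] / sum(counts[gender].values())

def gender_given_color(counts, gender, color):
    """P(gender | color): cell count divided by that color's column total."""
    column_total = sum(cells[color] for cells in counts.values())
    return counts[gender][color] / column_total

print(color_given_gender(counts, "Blue", "Male"))    # 0.5
print(color_given_gender(counts, "Blue", "Female"))  # 0.25
print(round(gender_given_color(counts, "Male", "Blue"), 3))  # 0.571
```

Note that $P(\text{Blue}|\text{Male}) = 40/80$ and $P(\text{Male}|\text{Blue}) = 40/70$ use the same cell but different totals, so the order of the condition matters.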

Connecting Independence to Conditional Probabilities

This is the key link between these concepts:

  • If two variables are independent, the conditional probability equals the marginal probability. That is, $P(\text{Blue}|\text{Male}) = P(\text{Blue})$.
  • If two variables are associated, the conditional probability differs from the marginal probability. Here, $P(\text{Blue}|\text{Male}) = 0.50$ while $P(\text{Blue}) = 0.35$, confirming the association.

Quick independence check: Compare conditional probabilities across groups. If $P(B|A_1) = P(B|A_2)$ for all levels, the variables are independent. If those values differ, the variables are associated.
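
One way to sketch this quick check in code is to compute each group's conditional distribution and compare them. The function name `looks_independent` and the tolerance parameter are assumptions for this illustration, and the second table is made-up data constructed so that both rows have identical color proportions.

```python
def looks_independent(counts, tol=1e-9):
    """Return True if every row has the same conditional distribution
    across the columns (within a small numerical tolerance)."""
    rows = list(counts)
    cols = list(counts[rows[0]])
    profiles = []
    for r in rows:
        row_total = sum(counts[r].values())
        # Conditional probabilities P(column | row) for this row.
        profiles.append([counts[r][c] / row_total for c in cols])
    first = profiles[0]
    return all(
        abs(p - q) <= tol
        for profile in profiles[1:]
        for p, q in zip(profile, first)
    )

# The example table from this section.
example = {
    "Male": {"Blue": 40, "Red": 25, "Green": 15},
    "Female": {"Blue": 30, "Red": 35, "Green": 55},
}

# Hypothetical counts in which both rows are 35% Blue, 30% Red, 35% Green.
hypothetical = {
    "Male": {"Blue": 28, "Red": 24, "Green": 28},
    "Female": {"Blue": 42, "Red": 36, "Green": 42},
}

print(looks_independent(example))       # False
print(looks_independent(hypothetical))  # True
```

Real sample data almost never matches exactly, so this strict comparison is only a descriptive check on the table itself.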

Contingency Analysis and Statistical Inference

The patterns you observe in a contingency table come from sample data, so an apparent association might just be due to random chance. Statistical tests (like the chi-square test, which you'll encounter soon) formalize this by comparing observed frequencies to the expected frequencies you'd see if the variables were truly independent. If the observed counts differ enough from the expected counts, the association is called statistically significant, meaning it is unlikely to be explained by random chance alone.
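
As a preview, the sketch below computes the expected counts under independence using the standard rule (row total × column total ÷ grand total) and then the usual chi-square statistic, the sum of (observed − expected)² ÷ expected over all cells. It stops at the statistic; deciding whether the value is large enough to be significant is part of the test itself and is not attempted here.

```python
# Observed cell counts from the example table.
observed = {
    "Male": {"Blue": 40, "Red": 25, "Green": 15},
    "Female": {"Blue": 30, "Red": 35, "Green": 55},
}

colors = list(observed["Male"])
row_totals = {g: sum(cells.values()) for g, cells in observed.items()}
col_totals = {c: sum(cells[c] for cells in observed.values()) for c in colors}
grand_total = sum(row_totals.values())

# Expected count for each cell if gender and color were independent.
expected = {
    g: {c: row_totals[g] * col_totals[c] / grand_total for c in colors}
    for g in observed
}

# Chi-square statistic: total squared gap, scaled by the expected count.
chi_square = sum(
    (observed[g][c] - expected[g][c]) ** 2 / expected[g][c]
    for g in observed
    for c in colors
)

print(expected["Male"])      # {'Blue': 28.0, 'Red': 24.0, 'Green': 28.0}
print(expected["Female"])    # {'Blue': 42.0, 'Red': 36.0, 'Green': 42.0}
print(round(chi_square, 2))  # 18.7
```

The expected count of 28 for the Male and Blue cell corresponds to the 0.14 independence probability times 200 observations, compared with the 40 actually observed.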