Contingency tables organize data about two categorical variables into rows and columns, making it easy to spot patterns and calculate probabilities. Using these tables, you can determine joint, marginal, and conditional probabilities, and figure out whether two variables are actually related or independent.

Contingency Tables

Construction of contingency tables

A contingency table is a grid that summarizes how often each combination of two categorical variables occurs. One variable defines the rows, the other defines the columns, and each cell holds the count for that specific combination.

Here's a quick example. Suppose you survey 200 people about gender and favorite color:

	Blue	Red	Green	Row Total
Female	40	25	35	100
Male	30	30	40	100
Column Total	70	55	75	200

A few things to notice:

Each cell holds the frequency (count) for one specific combination, like "40 females prefer blue"
Marginal totals are the row and column sums along the edges of the table. They tell you the total for a single variable regardless of the other (e.g., 70 total people prefer blue, 100 total are female)
The grand total (bottom-right corner) is the total number of observations
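
The marginal totals and grand total can be recomputed from the cell counts alone. Here is a minimal Python sketch, assuming the table is stored as a nested dictionary; the names `counts`, `row_totals`, `col_totals`, and `grand_total` are illustrative choices, not part of the original example.

```python
# Cell counts from the survey table above: rows are gender, columns are color.
counts = {
    "Female": {"Blue": 40, "Red": 25, "Green": 35},
    "Male": {"Blue": 30, "Red": 30, "Green": 40},
}
colors = ["Blue", "Red", "Green"]

# Row totals: add up the cells across each row.
row_totals = {gender: sum(row.values()) for gender, row in counts.items()}

# Column totals: add up the cells down each column.
col_totals = {c: sum(row[c] for row in counts.values()) for c in colors}

# Grand total: the total number of observations.
grand_total = sum(row_totals.values())

print(row_totals)   # {'Female': 100, 'Male': 100}
print(col_totals)   # {'Blue': 70, 'Red': 55, 'Green': 75}
print(grand_total)  # 200
```

A quick sanity check on any contingency table: the row totals and the column totals should both add up to the same grand total.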

Probability calculations from contingency tables

You can pull three types of probability straight from a contingency table. All of them use the same basic idea: divide a count by the appropriate total.

Marginal probability is the probability of a single event, ignoring the other variable. You use the marginal totals for this.

$P(A) = \frac{n(A)}{n}$

Using the table above: $P(\text{Blue}) = \frac{70}{200} = 0.35$. That's the probability someone prefers blue, regardless of gender.

Joint probability is the probability of two events happening together. You use the count in a specific cell.

$P(A \cap B) = \frac{n(A \cap B)}{n}$

For example: $P(\text{Female and Blue}) = \frac{40}{200} = 0.20$. That's the probability a randomly selected person is both female and prefers blue.

The difference matters. Marginal probability looks at one variable alone (using the totals on the margins). Joint probability looks at a specific combination of both variables (using an interior cell). Students often mix these up, so pay attention to whether the question asks about one category or two.
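
To make the distinction concrete, here is a small Python sketch that computes one marginal and one joint probability from the same table. The helper names `marginal_color` and `joint` are hypothetical, chosen only for this illustration.

```python
# Cell counts from the survey table: rows are gender, columns are color.
counts = {
    "Female": {"Blue": 40, "Red": 25, "Green": 35},
    "Male": {"Blue": 30, "Red": 30, "Green": 40},
}

# Grand total n: every observation in the table.
n = sum(sum(row.values()) for row in counts.values())


def marginal_color(color):
    """P(color): the column total divided by the grand total."""
    return sum(row[color] for row in counts.values()) / n


def joint(gender, color):
    """P(gender and color): one interior cell divided by the grand total."""
    return counts[gender][color] / n


print(marginal_color("Blue"))   # 0.35
print(joint("Female", "Blue"))  # 0.2
```

Both functions divide by the same grand total; the only difference is whether the numerator is a margin or a single cell.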

Conditional probabilities in contingency tables

Conditional probability is the probability of one event given that another event is known to have occurred. It's written as $P(A|B)$, read "probability of $A$ given $B$."

The formula is:

$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(B)}$

To calculate it from a contingency table:

Find the row or column that matches the "given" condition (the event after the "|")
Use only the numbers within that row or column as your working universe
Divide the relevant cell count by that row or column's marginal total

For example, what's the probability someone prefers red, given they are female?

$P(\text{Red} | \text{Female}) = \frac{n(\text{Red and Female})}{n(\text{Female})} = \frac{25}{100} = 0.25$

Notice you're dividing by 100 (total females), not 200 (total people). The "given" condition shrinks your sample to just that subgroup.

Comparing conditional and marginal probabilities reveals whether the variables are related. Here, $P(\text{Red} | \text{Female}) = 0.25$ while $P(\text{Red}) = \frac{55}{200} = 0.275$. These aren't equal, which hints that gender and color preference might be associated.
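
The three steps above can be sketched in Python as follows. The function names are hypothetical, and the second call, $P(\text{Female} | \text{Red})$, is an extra illustration computed from the same table to show that swapping the event and the condition changes the denominator.

```python
# Cell counts from the survey table: rows are gender, columns are color.
counts = {
    "Female": {"Blue": 40, "Red": 25, "Green": 35},
    "Male": {"Blue": 30, "Red": 30, "Green": 40},
}


def p_color_given_gender(color, gender):
    """P(color | gender): the cell divided by that gender's row total."""
    row = counts[gender]
    return row[color] / sum(row.values())


def p_gender_given_color(gender, color):
    """P(gender | color): the cell divided by that color's column total."""
    col_total = sum(row[color] for row in counts.values())
    return counts[gender][color] / col_total


print(p_color_given_gender("Red", "Female"))            # 0.25
print(round(p_gender_given_color("Female", "Red"), 4))  # 0.4545
```

The same cell (25) appears in both numerators, but the first call divides by the 100 females and the second by the 55 people who prefer red, so $P(A|B)$ and $P(B|A)$ are generally different numbers.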

Association and Independence

Two categorical variables are independent if knowing the value of one tells you nothing about the other. In a contingency table, independence means the conditional probability equals the marginal probability for every category:

If $P(A|B) = P(A)$ for every combination of categories, the variables are independent. If the two differ for any combination, there's an association between the variables.

From the example above, $P(\text{Red} | \text{Female}) = 0.25$ but $P(\text{Red}) = 0.275$. Since these differ, there's some association between gender and color preference in this sample.
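
Checking every category by hand gets tedious, so here is a rough Python sketch that compares $P(\text{color} | \text{gender})$ with $P(\text{color})$ for each cell. The function name `looks_independent` and the tolerance `tol` are assumptions made for this illustration; the check only describes the sample and is not a significance test.

```python
# Cell counts from the survey table: rows are gender, columns are color.
counts = {
    "Female": {"Blue": 40, "Red": 25, "Green": 35},
    "Male": {"Blue": 30, "Red": 30, "Green": 40},
}


def looks_independent(table, tol=1e-9):
    """Return True only if P(column | row) equals P(column) in every cell."""
    n = sum(sum(row.values()) for row in table.values())
    independent = True
    for row_name, row in table.items():
        row_total = sum(row.values())
        for col_name, cell in row.items():
            conditional = cell / row_total
            marginal = sum(r[col_name] for r in table.values()) / n
            print(f"P({col_name} | {row_name}) = {conditional:.3f}, "
                  f"P({col_name}) = {marginal:.3f}")
            if abs(conditional - marginal) > tol:
                independent = False
    return independent


print(looks_independent(counts))  # False
```

In real samples the two probabilities almost never match exactly, even when the variables are unrelated in the population, so a result of `False` here only says the sample shows some difference.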

To formally test whether an observed association is statistically significant (or just due to random chance), you'd use a chi-square test. That test relies on degrees of freedom, calculated as:

$df = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$

For a 2×3 table like the one above: $df = (2-1)(3-1) = 2$. You'll use this value when looking up critical values or p-values for the chi-square test.
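
As a rough illustration of where the degrees of freedom get used, the sketch below computes the chi-square statistic for the example table. It assumes the standard Pearson formula, which this section does not spell out: sum $(O - E)^2 / E$ over all cells, where $O$ is the observed count and $E = \frac{\text{row total} \times \text{column total}}{n}$ is the count expected under independence. The p-value line relies on a shortcut that holds only when $df = 2$; for other table sizes you would use a chi-square table or a statistics library.

```python
import math

# Cell counts from the survey table: rows are gender, columns are color.
counts = {
    "Female": {"Blue": 40, "Red": 25, "Green": 35},
    "Male": {"Blue": 30, "Red": 30, "Green": 40},
}

row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {c: sum(row[c] for row in counts.values())
              for c in counts["Female"]}
n = sum(row_totals.values())

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = 0.0
for gender, row in counts.items():
    for color, observed in row.items():
        expected = row_totals[gender] * col_totals[color] / n
        chi_square += (observed - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Shortcut valid ONLY for df == 2: the upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.3f}, df = {df}")  # chi-square = 2.216, df = 2
print(f"p-value = {p_value:.3f}")
```

The p-value comes out at roughly 0.33, well above the common 0.05 cutoff, so the differences seen in this sample could plausibly be due to random chance.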