Contingency Analysis

In Intro to Statistics, contingency analysis is a way to study whether two categorical variables are associated. You usually organize the data in a contingency table and test the relationship with a chi-square test of independence.

Last updated July 2026

What is Contingency Analysis?

In Intro to Statistics, contingency analysis is the process of checking whether two categorical variables are related. Instead of measuring averages or slopes, you compare counts across categories. For example, you might check whether exercise level and favorite drink seem connected, or whether major and housing choice line up in a survey.

The basic setup is a contingency table, sometimes called a two-way table. The categories of one variable label the rows and the categories of the other label the columns, and each cell shows how many observations fall into that combination. That table is the starting point for the whole analysis because it lets you see the observed frequencies, not just the labels.
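
As a minimal sketch of that setup, the Python snippet below tallies raw observations into a two-way table. The survey responses, the category labels, and the helper name contingency_table are all made up for illustration.

```python
from collections import Counter

# Hypothetical survey responses: one (exercise level, favorite drink) pair per person.
responses = [
    ("low", "soda"), ("low", "soda"), ("low", "water"),
    ("high", "water"), ("high", "water"), ("high", "soda"),
    ("low", "soda"), ("high", "water"),
]

def contingency_table(pairs):
    """Return (row_labels, col_labels, table), where table[i][j] is a cell count."""
    counts = Counter(pairs)                      # count each category combination
    row_labels = sorted({r for r, _ in pairs})   # categories of the first variable
    col_labels = sorted({c for _, c in pairs})   # categories of the second variable
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table

rows, cols, table = contingency_table(responses)
print(rows, cols)
for label, row in zip(rows, table):
    print(label, row)
```

Each inner list is one row of the table, so table[0][1] is the number of people in the first exercise category who chose the second drink.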

From there, you ask whether the pattern in the table looks like a real association or just random variation. If the variables are independent, the distribution of one variable should look about the same across the categories of the other variable. If the table shows big differences from what independence would predict, that is evidence that the variables are associated.
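
One informal way to see this, before any formal test, is to turn each row of counts into proportions and compare the rows. The sketch below assumes a hypothetical 2x2 table (rows are exercise level, columns are favorite drink) and assumes every row has at least one observation.

```python
def row_proportions(table):
    """Convert each row of counts into proportions that sum to 1.

    Assumes every row total is positive.
    """
    result = []
    for row in table:
        total = sum(row)
        result.append([count / total for count in row])
    return result

# Hypothetical counts: rows = exercise level (low, high), columns = drink (soda, water).
observed = [[30, 10],
            [20, 40]]

props = row_proportions(observed)
for row in props:
    print([round(p, 2) for p in row])
```

If the variables were independent, the two rows of proportions would look roughly alike. Here the first row leans heavily toward soda and the second toward water, which is the kind of gap the chi-square test later evaluates.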

In Intro to Statistics, this idea is usually formalized with the chi-square test of independence. The test compares observed counts to expected counts, which are the counts you would predict if there were no relationship. A small p-value means the observed pattern is hard to explain by chance alone, so you have evidence of dependence.
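
The comparison of observed and expected counts can be sketched in a few lines of Python. Under independence, the expected count for a cell is (row total x column total) / grand total, and the chi-square statistic adds up (observed - expected)^2 / expected over all cells. The table values and function names below are illustrative assumptions, not data from a real study.

```python
def expected_counts(observed):
    """Expected count per cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2x2 table of observed counts.
observed = [[30, 10],
            [20, 40]]

expected = expected_counts(observed)
chi2 = chi_square_statistic(observed)
print(expected)          # [[20.0, 20.0], [30.0, 30.0]]
print(round(chi2, 3))    # 16.667
```

The larger the statistic, the further the observed table sits from what independence would predict.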

A common mistake is treating contingency analysis like correlation. Correlation is for quantitative variables, while contingency analysis is for categorical variables. Another mistake is thinking a significant result proves causation. A relationship in a contingency table only says the variables are associated, not that one causes the other.

You can also look at the strength of the association, not just the decision to reject or fail to reject independence. That is where measures like Cramer's V, or the Phi Coefficient for a 2x2 table, may come in, especially when you want to describe how strong the association is after the chi-square test gives you evidence that it exists.
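
As a rough sketch of both measures, the code below uses the standard formulas: Cramer's V is the square root of chi-square divided by n times (the smaller of the row and column counts, minus 1), and the Phi Coefficient for a 2x2 table with cells a, b, c, d is (ad - bc) divided by the square root of the product of the four marginal totals. The table is hypothetical and the function names are chosen for illustration.

```python
import math

def chi_square_statistic(observed):
    """Chi-square statistic from a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under independence
            total += (o - e) ** 2 / e
    return total

def cramers_v(observed):
    """Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in observed)
    k = min(len(observed), len(observed[0]))
    return math.sqrt(chi_square_statistic(observed) / (n * (k - 1)))

def phi_2x2(observed):
    """Phi Coefficient for a 2x2 table only; the sign shows the direction."""
    (a, b), (c, d) = observed
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical 2x2 table of observed counts.
observed = [[30, 10],
            [20, 40]]

v = cramers_v(observed)
phi = phi_2x2(observed)
print(round(v, 3), round(phi, 3))
```

For a 2x2 table the two measures agree in size, since Cramer's V equals the absolute value of phi. Cramer's V also extends to larger tables.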

Why Contingency Analysis matters in Intro to Statistics

Contingency analysis shows up any time Intro to Statistics asks you to make sense of survey data, grouped counts, or a two-way table. It gives you a structured way to move from raw category counts to a conclusion about whether two variables seem connected.

This matters because a lot of real datasets are categorical. Think of survey questions like political party and age group, blood type and disease status, or preferred study method and class section. Without a method like contingency analysis, you are just staring at a table of numbers and guessing at patterns.

The skill also connects directly to hypothesis testing. You do not just say, “These counts look different.” You write hypotheses about independence, compare observed and expected counts, and use the chi-square test statistic and p-value to support a claim. That turns a visual pattern into a statistical conclusion.

It also trains you to be careful with language. A table can show association, but not causation, and a small p-value does not mean the relationship is strong. Contingency analysis pushes you to separate significance from strength, which is a big part of statistical thinking.
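
A small experiment makes the difference between significance and strength concrete. The sketch below takes a hypothetical table and multiplies every count by 10, which keeps the proportions the same but makes the sample ten times larger. The helper chi2_and_v is an illustrative name, not a library function.

```python
import math

def chi2_and_v(observed):
    """Return (chi-square statistic, Cramer's V) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under independence
            chi2 += (o - e) ** 2 / e
    k = min(len(observed), len(observed[0]))
    return chi2, math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical table, plus the same pattern with ten times as many observations.
small = [[30, 10],
         [20, 40]]
large = [[10 * count for count in row] for row in small]

chi2_small, v_small = chi2_and_v(small)
chi2_large, v_large = chi2_and_v(large)
print(round(chi2_small, 2), round(v_small, 3))
print(round(chi2_large, 2), round(v_large, 3))
```

The chi-square statistic grows tenfold with the larger sample, so the p-value shrinks, while Cramer's V stays the same. The evidence gets stronger, but the association itself is no stronger.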

How Contingency Analysis connects across the course

Contingency Table

A contingency table is the display you build as the first step of contingency analysis. It organizes the observed counts for two categorical variables into rows and columns, so you can compare category combinations directly. If the table is hard to read or the counts are misplaced, the rest of the analysis falls apart because the chi-square test depends on those observed frequencies.

Chi-Square Test of Independence

This is the main formal test used in contingency analysis. It checks whether the differences between observed counts and expected counts are bigger than you would expect if the variables were independent. In practice, contingency analysis often means building the table first, then using this test to decide whether the association is statistically significant.

Hypothesis Testing

Contingency analysis fits into hypothesis testing because you still set up a null and alternative hypothesis, choose a significance level, and interpret a p-value. The null usually says the variables are independent. The whole point is to move from a table of counts to a decision about whether the pattern is too unusual to blame on chance.
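
The full sequence (hypotheses, significance level, p-value, decision) can be sketched for the simplest case. The code below assumes a 2x2 table, where the test has 1 degree of freedom and the p-value can be written with the complementary error function as erfc(sqrt(chi2 / 2)). It applies no continuity correction, and the table, the 0.05 level, and the function names are illustrative assumptions. For larger tables you would normally use a statistics library rather than this shortcut.

```python
import math

def chi_square_2x2(observed):
    """Chi-square statistic for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(2)
        for j in range(2)
    )

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def independence_test_2x2(observed, alpha=0.05):
    """H0: the variables are independent. Ha: the variables are associated."""
    chi2 = chi_square_2x2(observed)
    p = p_value_df1(chi2)
    decision = "reject independence" if p < alpha else "fail to reject independence"
    return chi2, p, decision

# Hypothetical 2x2 table of observed counts.
observed = [[30, 10],
            [20, 40]]

chi2, p, decision = independence_test_2x2(observed)
print(round(chi2, 3), p, decision)
```

With these made-up counts the p-value falls well below 0.05, so the decision is to reject independence. A table whose rows have identical proportions would give a statistic of 0 and a p-value of 1.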

Cramer's V

Cramer's V helps describe the strength of the association after a chi-square test gives evidence of a relationship. The chi-square test tells you whether there is evidence of dependence, but not how strong that dependence is. Cramer's V gives a standardized measure that ranges from 0 (no association) to 1 (perfect association), which is useful when you want more than a yes or no answer.

Is Contingency Analysis on the Intro to Statistics exam?

A quiz or problem-set question usually gives you a two-way table and asks whether the variables are independent, associated, or significantly related. You may need to identify the null and alternative hypotheses, compute or interpret the chi-square statistic, and use the p-value to decide whether to reject independence. A lab question might also ask you to explain what the pattern in the table means in plain English.

The biggest move is to separate three ideas: the table of observed counts, the expected counts under independence, and the statistical conclusion. If the observed counts are far from expected counts, that supports an association. If you are asked about strength, you may need a measure like Cramer's V instead of only the chi-square test result.

Contingency Analysis vs Correlation

Contingency analysis and correlation are both about relationships, but they apply to different kinds of variables. Contingency analysis is for categorical variables shown in a two-way table, while correlation is for quantitative variables. If you try to use correlation on categories, or chi-square on numeric measurements without grouping, you are using the wrong tool.

Key things to remember about Contingency Analysis

Contingency analysis checks whether two categorical variables are associated by comparing the pattern of counts in a contingency table.
The chi-square test of independence is the main test used with contingency analysis in Intro to Statistics.
A small p-value means that a table at least as far from the expected counts as the observed one would be unlikely if the variables were independent.
A statistically significant result does not prove causation; it only shows evidence of association.
If you want the strength of the relationship, look beyond the chi-square test and use a measure like Cramer's V or the Phi Coefficient when appropriate.

Frequently asked questions about Contingency Analysis

What is Contingency Analysis in Intro to Statistics?

Contingency analysis is a method for checking whether two categorical variables are related. You usually start with a contingency table, then use the chi-square test of independence to compare the observed counts with what you would expect if the variables were independent.

How do you do contingency analysis on a two-way table?

First, organize the data into a contingency table with counts for each category combination. Then compare the observed counts to the expected counts under independence and use the chi-square test statistic and p-value to judge whether the difference is meaningful.

Is contingency analysis the same as correlation?

No. Contingency analysis is for categorical data, while correlation is for quantitative data. They both describe relationships, but they use different calculations and answer different questions.

What does a significant contingency analysis mean?

A significant result means the data give evidence that the two categorical variables are associated rather than independent. It does not mean one variable causes the other, and it does not automatically tell you how strong the relationship is.