Chi-Square Test of Independence
The chi-square test of independence tests whether two categorical variables are associated or vary independently of each other. It's one of the most common hypothesis tests you'll encounter for categorical data, and it shows up everywhere from medical research to marketing surveys.
Test Statistic Calculation
The chi-square test works by comparing what you actually observed in your data to what you'd expect to see if the two variables had no relationship at all.
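The test statistic formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where the sum runs over every cell of the contingency table and
- $O$ = observed frequency (the actual count in each cell of your contingency table)
- $E$ = expected frequency (the count you'd predict if the variables were independent)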
How to find expected frequencies:
For each cell in the contingency table, use this formula:
$$E = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$$
This gives you the count you'd expect in that cell if the two variables had nothing to do with each other.
Degrees of freedom:
$$df = (r - 1)(c - 1)$$
- $r$ = number of rows (categories of one variable)
- $c$ = number of columns (categories of the other variable)
For example, a 3×2 contingency table has $(3 - 1)(2 - 1) = 2$ degrees of freedom.
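To see the three formulas working together, here is a minimal Python sketch. The function names and the 3×2 table of counts are made up for illustration; only the formulas come from the text above.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical 3x2 table: three groups (rows) by a yes/no response (columns).
observed = [
    [30, 20],
    [25, 25],
    [15, 35],
]

expected = expected_counts(observed)
chi2 = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)

print("expected counts:", expected)
print("chi-square statistic:", round(chi2, 3))
print("degrees of freedom:", df)
```

For these made-up counts, every row total is 50, so each row has the same expected counts (about 23.33 and 26.67). The statistic works out to 9.375 with 2 degrees of freedom.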

Interpretation of Results
The hypotheses for this test are always structured the same way:
- Null hypothesis ($H_0$): The two variables are independent (no association).
- Alternative hypothesis ($H_a$): The two variables are not independent (there is an association).
To make your decision, compare the calculated $\chi^2$ statistic to the critical value from the chi-square distribution table (using your degrees of freedom and significance level, typically $\alpha = 0.05$). You can also compare a p-value to $\alpha$ if your calculator or software gives you one.
- If $\chi^2 >$ critical value (or p-value $< \alpha$): Reject $H_0$. There is sufficient evidence that the two variables are associated.
- If $\chi^2 \le$ critical value (or p-value $\ge \alpha$): Fail to reject $H_0$. There is not enough evidence to conclude the variables are associated.
One thing to watch: rejecting $H_0$ tells you the variables are associated, but it does not tell you the direction or strength of that association. You'd need to look back at the contingency table to describe the pattern.
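The two decision rules can be sketched in Python as follows. This is an illustration under stated assumptions, not a general-purpose routine. The critical values are copied from a standard chi-square table at a 0.05 significance level. The p-value helper uses a closed-form shortcut that is valid only when the degrees of freedom are even. All function and variable names are made up for this example.

```python
import math

# Critical values at alpha = 0.05 from a standard chi-square table (df = 1..6).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def p_value_even_df(chi2, df):
    """Right-tail probability P(X >= chi2) for a chi-square variable.

    Closed form that holds only for even df:
    exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only covers positive even df")
    half = chi2 / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


def decide_by_critical_value(chi2, df):
    """Reject H0 when the statistic exceeds the tabled critical value."""
    return "reject H0" if chi2 > CRITICAL_05[df] else "fail to reject H0"


def decide_by_p_value(chi2, df, alpha=0.05):
    """Reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value_even_df(chi2, df) < alpha else "fail to reject H0"


# Hypothetical result: chi-square statistic of 9.375 with 2 degrees of freedom.
chi2, df = 9.375, 2
print("critical value rule:", decide_by_critical_value(chi2, df))
print("p-value rule:", decide_by_p_value(chi2, df))
print("p-value:", round(p_value_even_df(chi2, df), 4))
```

Both rules reach the same decision because they describe the same cutoff in two ways: the critical value is the point on the chi-square curve whose right-tail area equals the significance level. For odd degrees of freedom you would rely on a table, calculator, or statistics library instead of this shortcut.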

Applying the Test: Step-by-Step
Here's how to carry out a chi-square test of independence from start to finish:
1. Identify your two categorical variables. For example, you might ask whether a person's exercise frequency (none, moderate, daily) is related to their stress level (low, medium, high).
2. Organize your data into a contingency table with observed frequencies for every combination of categories.
3. Calculate expected frequencies for each cell using $E = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$.
4. Compute the $\chi^2$ test statistic by finding the $\frac{(O - E)^2}{E}$ contribution from each cell and summing them all.
5. Find the degrees of freedom: $df = (r - 1)(c - 1)$.
6. Look up the critical value in a chi-square table using your df and significance level (usually 0.05), or use your calculator to find the p-value.
7. Make your conclusion. If the test statistic exceeds the critical value, you have evidence of an association. If not, you lack sufficient evidence to say the variables are related.
Always state your conclusion in context. Don't just say "reject $H_0$." Say something like: "There is sufficient evidence at the 0.05 significance level to conclude that exercise frequency and stress level are associated."
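The steps above can be traced from raw data to a conclusion in context with a short Python sketch. The survey records are entirely made up for the exercise and stress example, and the variable names are illustrative. The critical value 9.488 is the standard table value for 4 degrees of freedom at a 0.05 significance level.

```python
from collections import Counter

# Hypothetical survey records as (exercise frequency, stress level) pairs.
# The counts are invented purely for illustration.
records = (
    [("none", "low")] * 10 + [("none", "medium")] * 15 + [("none", "high")] * 25
    + [("moderate", "low")] * 20 + [("moderate", "medium")] * 20
    + [("moderate", "high")] * 10
    + [("daily", "low")] * 30 + [("daily", "medium")] * 15 + [("daily", "high")] * 5
)

# Step 1: the two categorical variables and their categories.
exercise_levels = ["none", "moderate", "daily"]
stress_levels = ["low", "medium", "high"]

# Step 2: tally the records into a contingency table of observed counts.
tally = Counter(records)
observed = [[tally[(e, s)] for s in stress_levels] for e in exercise_levels]

# Step 3: expected count = row total * column total / grand total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Step 4: add up the (O - E)^2 / E contribution from every cell.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Step 5: degrees of freedom = (rows - 1) * (columns - 1).
df = (len(exercise_levels) - 1) * (len(stress_levels) - 1)

# Step 6: critical value for df = 4 at alpha = 0.05, from a chi-square table.
critical_value = 9.488

# Step 7: state the conclusion in context.
if chi2 > critical_value:
    conclusion = (
        "There is sufficient evidence at the 0.05 significance level to "
        "conclude that exercise frequency and stress level are associated."
    )
else:
    conclusion = (
        "There is not sufficient evidence at the 0.05 significance level to "
        "conclude that exercise frequency and stress level are associated."
    )

print("observed:", observed)
print("chi-square:", round(chi2, 2), "df:", df)
print(conclusion)
```

With these invented counts the statistic is 27.25 on 4 degrees of freedom, well above 9.488, so the sketch reports an association. If SciPy is available, `scipy.stats.chi2_contingency` performs steps 3 through 5 and returns a p-value in a single call.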
Statistical Inference and Assumptions
The chi-square test of independence is a form of statistical inference: you're using sample data to draw a conclusion about the broader population. It follows the standard hypothesis testing framework you've used throughout the course.
A few things worth knowing:
- The chi-square test is nonparametric, meaning it doesn't assume your data follow a normal distribution. That's one reason it works well for categorical data.
- The test does assume that expected frequencies are not too small. A common guideline is that every expected cell count should be at least 5. If some cells fall below that, your results may not be reliable.
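- The chi-square distribution is right-skewed and takes only non-negative values. Because a larger gap between observed and expected counts always makes the statistic larger, only large values count as evidence against $H_0$, so this is always a right-tail test.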
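The expected-count guideline is easy to check before you run the test. Here is a minimal sketch, assuming the "at least 5" rule of thumb described above. The function name and the table of counts are hypothetical.

```python
def small_expected_cells(observed, minimum=5):
    """Return (row index, column index, expected count) for every cell
    whose expected count falls below `minimum`."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / grand_total
            if e < minimum:
                flagged.append((i, j, e))
    return flagged


# Hypothetical table whose last row has very few observations.
observed = [
    [20, 30],
    [25, 25],
    [3, 2],
]

flagged = small_expected_cells(observed)
for i, j, e in flagged:
    print(f"row {i}, column {j}: expected count {e:.2f} is below 5")
```

Here both cells in the sparse last row are flagged, which signals that the chi-square results for this table may not be reliable.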