Test for Homogeneity
The chi-square test for homogeneity determines whether the distribution of a categorical variable is the same across two or more populations. For example, you might ask: Do men and women have the same distribution of political party affiliation? This test compares observed data from multiple groups to what you'd expect if the groups were truly identical in their proportions.
Homogeneity vs. Goodness-of-Fit Tests
These two tests look similar on the surface, but they answer different questions.
A test for homogeneity compares the distribution of a categorical variable across multiple populations. You're sampling from each group separately and asking whether the proportions match up.
- Null hypothesis ($H_0$): The proportions of the categorical variable are the same across all populations.
- Alternative hypothesis ($H_a$): The proportions differ in at least one population.
- Example: Surveying 200 men and 200 women separately, then comparing their distributions of political party preference.
A goodness-of-fit test compares a single sample to a known or expected distribution.
- Null hypothesis ($H_0$): The sample follows the expected distribution.
- Alternative hypothesis ($H_a$): The sample does not follow the expected distribution.
- Example: Testing whether the blood type distribution in a hospital sample matches the known national distribution (44% O, 42% A, 10% B, 4% AB).
Quick rule: If you're comparing groups to each other, use homogeneity. If you're comparing one sample to a known standard, use goodness-of-fit.
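To make the contrast concrete, here is a minimal Python sketch of the goodness-of-fit calculation for the blood type example. The national proportions come from the text above; the sample counts are hypothetical, made up purely for illustration. The critical value 7.815 is the standard chi-square table value for 3 degrees of freedom at a 0.05 significance level.

```python
# Goodness-of-fit sketch: one sample compared to a known distribution.
# The proportions are from the text; the observed counts are HYPOTHETICAL.
known_props = {"O": 0.44, "A": 0.42, "B": 0.10, "AB": 0.04}
observed = {"O": 92, "A": 80, "B": 18, "AB": 10}  # hypothetical sample of 200

n = sum(observed.values())

# Expected count for each category = sample size * known proportion.
expected = {blood_type: n * p for blood_type, p in known_props.items()}

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi2_gof = sum(
    (observed[bt] - expected[bt]) ** 2 / expected[bt] for bt in observed
)

df_gof = len(observed) - 1   # number of categories minus 1
critical_value = 7.815       # chi-square table value, df = 3, alpha = 0.05

print(f"chi-square = {chi2_gof:.3f}, df = {df_gof}")
print("reject H0" if chi2_gof > critical_value else "fail to reject H0")
```

Note that there is only one sample here and the expected counts come from an outside standard. In a homogeneity test, the expected counts are instead computed from the pooled data of the groups being compared.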

Applying the Chi-Square Test Statistic
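The formula for the chi-square statistic in a homogeneity test is:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
- $O_{ij}$ = observed frequency in row $i$, column $j$
- $E_{ij}$ = expected frequency in row $i$, column $j$
- $r$ = number of categories (rows)
- $c$ = number of populations (columns)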
How to calculate expected frequencies:
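For each cell, use this formula:
$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$$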
This gives you the count you'd expect in that cell if the distributions were identical across all groups.
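As an illustration, the following Python sketch computes the expected counts for a small table. The counts, category names, and group names are hypothetical (200 men and 200 women, echoing the survey example earlier), with rows as categories and columns as populations.

```python
# Expected counts under H0 for a HYPOTHETICAL table:
# rows = party preference, columns = populations (men, women).
categories = ["Democrat", "Republican", "Independent"]
populations = ["Men", "Women"]
observed = [
    [60, 100],   # Democrat
    [100, 60],   # Republican
    [40, 40],    # Independent
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = (row total * column total) / grand total
expected = [
    [r_tot * c_tot / grand_total for c_tot in col_totals]
    for r_tot in row_totals
]

for name, row in zip(categories, expected):
    print(name, row)
```

Because both groups have 200 respondents, each expected count is simply half of its row total: 80 for Democrat and Republican, 40 for Independent.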
Steps for conducting the test:
1. Set up your hypotheses ($H_0$: distributions are the same; $H_a$: at least one differs).
2. Organize your observed data into a contingency table with rows for categories and columns for populations.
3. Calculate row totals, column totals, and the grand total.
4. Compute the expected frequency for every cell using the formula above.
5. For each cell, calculate $\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$.
6. Sum all those values to get your $\chi^2$ test statistic.
7. Find the degrees of freedom: $df = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns.
8. Compare your $\chi^2$ value to the critical value at your chosen significance level (or use the p-value).
   - If $\chi^2$ exceeds the critical value (or p-value < $\alpha$), reject $H_0$.
   - Otherwise, fail to reject $H_0$.
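The steps above can be sketched in Python as follows. The counts are hypothetical, and the significance level of 0.05 is an assumed choice. The critical value 5.991 is the standard table value for 2 degrees of freedom at that level. The p-value shortcut in the code is exact only when the degrees of freedom equal 2; other tables need a chi-square table or statistical software.

```python
import math

# HYPOTHETICAL counts: rows = party preference, columns = men, women.
observed = [
    [60, 100],   # Democrat
    [100, 60],   # Republican
    [40, 40],    # Independent
]
alpha = 0.05  # assumed significance level

# Row totals, column totals, and the grand total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell, its (O - E)^2 / E contribution, and the running sum.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o_ij in enumerate(row):
        e_ij = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o_ij - e_ij) ** 2 / e_ij

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Table critical value for df = 2 at alpha = 0.05.
critical_value = 5.991
# For df = 2 ONLY, the chi-square upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

decision = "reject H0" if chi2 > critical_value else "fail to reject H0"
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.6f}: {decision}")
```

With these made-up counts the statistic is 20 on 2 degrees of freedom, well above 5.991, so the sketch rejects the null hypothesis. Both decision rules agree here, since the p-value is far below 0.05.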

Interpreting Homogeneity Test Results
- Rejecting $H_0$ means there's statistically significant evidence that the distribution of the categorical variable differs for at least one of the populations.
- Failing to reject $H_0$ means you don't have enough evidence to say the distributions differ. This does not prove they're the same.
When you do reject $H_0$, look at which cells had the largest differences between observed and expected values. These cells drove the significant result and tell you where the distributions diverge.
Assumptions to check before running the test:
- Each expected frequency should be at least 5. If some cells fall below this threshold, your chi-square approximation may not be reliable.
- Observations should be independent of one another.
- Data should come from random samples drawn separately from each population.
If the expected frequency condition isn't met, consider combining categories or, for 2×2 tables, using Fisher's exact test as an alternative.
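For a sense of what Fisher's exact test does, here is a minimal Python sketch for a 2×2 table. It uses one common definition of the two-sided p-value: with all row and column totals held fixed, add up the probabilities of every table that is no more likely than the observed one. The function name `fisher_exact_two_sided` and the counts are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), comb(n, col1))

    p_observed = prob(a)
    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    # Add up every table that is no more likely than the observed one.
    return sum(
        (prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed),
        Fraction(0),
    )

# HYPOTHETICAL small table; every expected count is 4 * 4 / 8 = 2, below 5.
small_table = [[3, 1], [1, 3]]
p_value = fisher_exact_two_sided(small_table)
print(float(p_value))
```

Exact fractions are used so that the comparison with the observed table's probability is not affected by floating-point rounding. In practice a statistics package would normally handle this calculation.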
Additional Analysis and Measures
If the overall test is significant, you may want to dig deeper:
- Standardized residuals for individual cells help pinpoint which specific categories and populations are driving the difference. A standardized residual with an absolute value greater than about 2 suggests that cell is a notable contributor.
- Effect size measures like Cramér's V quantify the strength of the association (values range from 0 to 1, with higher values indicating a stronger relationship).
- Post-hoc pairwise comparisons can be conducted between specific populations to identify exactly which groups differ, though you'll need to adjust for multiple comparisons.
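The first two follow-up measures can be sketched in Python as below, using hypothetical counts. The residual here is the Pearson form, $(O_{ij} - E_{ij}) / \sqrt{E_{ij}}$, which many introductory courses call the standardized residual; some texts reserve that name for an adjusted version with an extra correction factor, so check which definition your course uses. Cramér's V is computed as $\sqrt{\chi^2 / (n \cdot \min(r - 1, c - 1))}$, where $n$ is the grand total.

```python
import math

# HYPOTHETICAL counts: rows = party preference, columns = men, women.
categories = ["Democrat", "Republican", "Independent"]
observed = [
    [60, 100],
    [100, 60],
    [40, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
residuals = []
for i, row in enumerate(observed):
    residual_row = []
    for j, o_ij in enumerate(row):
        e_ij = row_totals[i] * col_totals[j] / n
        # Pearson residual: signed distance from the expected count.
        r_ij = (o_ij - e_ij) / math.sqrt(e_ij)
        residual_row.append(r_ij)
        chi2 += r_ij ** 2   # squared residuals sum to the chi-square statistic
    residuals.append(residual_row)

# Cramér's V = sqrt(chi2 / (n * min(r - 1, c - 1)))
k = min(len(observed) - 1, len(observed[0]) - 1)
cramers_v = math.sqrt(chi2 / (n * k))

# Mark cells whose residual exceeds 2 in absolute value with "*".
for name, residual_row in zip(categories, residuals):
    flags = ["*" if abs(r) > 2 else " " for r in residual_row]
    print(name, [f"{r:+.2f}{f}" for r, f in zip(residual_row, flags)])
print(f"Cramér's V = {cramers_v:.3f}")
```

In this made-up table the Democrat and Republican cells have residuals of about ±2.24, while the Independent cells sit at 0, so the difference between the groups comes entirely from the first two categories. The sign shows the direction: a negative residual means fewer observations than expected in that cell.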