Intro to Statistics Unit 11 Review

11.4 Test for Homogeneity

Written by the Fiveable Content Team • Last updated August 2025

Test for Homogeneity

The chi-square test for homogeneity determines whether the distribution of a categorical variable is the same across two or more populations. For example, you might ask: Do men and women have the same distribution of political party affiliation? This test compares observed data from multiple groups to what you'd expect if the groups were truly identical in their proportions.

Homogeneity vs. Goodness-of-Fit Tests

These two tests look similar on the surface, but they answer different questions.

A test for homogeneity compares the distribution of a categorical variable across multiple populations. You're sampling from each group separately and asking whether the proportions match up.

  • Null hypothesis ($H_0$): The proportions of the categorical variable are the same across all populations.
  • Alternative hypothesis ($H_a$): The proportions differ in at least one population.
  • Example: Surveying 200 men and 200 women separately, then comparing their distributions of political party preference.

A goodness-of-fit test compares a single sample to a known or expected distribution.

  • Null hypothesis ($H_0$): The sample follows the expected distribution.
  • Alternative hypothesis ($H_a$): The sample does not follow the expected distribution.
  • Example: Testing whether the blood type distribution in a hospital sample matches the known national distribution (44% O, 42% A, 10% B, 4% AB).

Quick rule: If you're comparing groups to each other, use homogeneity. If you're comparing one sample to a known standard, use goodness-of-fit.
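To make the contrast concrete, here is a minimal sketch of the goodness-of-fit calculation for the blood type example. The national percentages come from the example above; the sample counts are made up for illustration. Two details are not stated in the text and are standard results assumed here: a goodness-of-fit test with k categories has k − 1 degrees of freedom, and 7.815 is the usual table critical value for 3 degrees of freedom at a 0.05 significance level.

```python
# National blood type percentages from the example above.
national_pct = {"O": 44, "A": 42, "B": 10, "AB": 4}

# Hypothetical hospital sample of 200 patients (made-up counts).
sample_counts = {"O": 90, "A": 80, "B": 22, "AB": 8}

n = sum(sample_counts.values())

# Expected count per category = sample size * known proportion.
expected = {k: n * pct / 100 for k, pct in national_pct.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum(
    (sample_counts[k] - expected[k]) ** 2 / expected[k] for k in national_pct
)

df = len(national_pct) - 1   # k - 1 for a goodness-of-fit test
CRITICAL_VALUE = 7.815       # table value for df = 3, alpha = 0.05

reject_h0 = chi2 > CRITICAL_VALUE
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

Notice that the expected counts here come from an outside standard. In a homogeneity test there is no outside standard, so the expected counts are built from the pooled data of the groups themselves.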

Applying the Chi-Square Test Statistic

The formula for the chi-square statistic in a homogeneity test is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

  • $O_{ij}$ = observed frequency in row $i$, column $j$
  • $E_{ij}$ = expected frequency in row $i$, column $j$
  • $r$ = number of categories (rows)
  • $c$ = number of populations (columns)

How to calculate expected frequencies:

For each cell, use this formula:

$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$$

This gives you the count you'd expect in that cell if the distributions were identical across all groups.
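As a rough illustration, the expected-frequency formula can be applied to a whole table at once. The counts below are hypothetical (200 men and 200 women split across three made-up party categories), and the helper name `expected_frequencies` is an illustrative choice, not something from the text.

```python
# Hypothetical observed counts: rows are categories, columns are populations.
observed = [
    [70, 100],   # Democrat:    men, women
    [100, 70],   # Republican:  men, women
    [30, 30],    # Independent: men, women
]

def expected_frequencies(table):
    """Return expected counts: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Because both groups have 200 people in this made-up table, each category's expected count is split evenly between the two columns (85 and 85 for the first row, for instance).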

Steps for conducting the test:

  1. Set up your hypotheses ($H_0$: distributions are the same; $H_a$: at least one differs).

  2. Organize your observed data into a contingency table with rows for categories and columns for populations.

  3. Calculate row totals, column totals, and the grand total.

  4. Compute the expected frequency for every cell using the formula above.

  5. For each cell, calculate $\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$.

  6. Sum all those values to get your $\chi^2$ test statistic.

  7. Find the degrees of freedom: $(r - 1)(c - 1)$.

  8. Compare your $\chi^2$ value to the critical value at your chosen significance level (or use the p-value).

    • If $\chi^2$ exceeds the critical value (or p-value < $\alpha$), reject $H_0$.
    • Otherwise, fail to reject $H_0$.
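
The steps above can be sketched in plain Python. The counts are hypothetical (200 men and 200 women across three made-up party categories), and the function name `chi_square_homogeneity` is an illustrative choice. The critical value 5.991 is the standard table value for 2 degrees of freedom at a 0.05 significance level, and the closed-form p-value used here holds only when the degrees of freedom equal 2; for other table sizes you would look up the p-value in a table or use statistical software.

```python
import math

# Hypothetical observed counts: rows are categories, columns are populations.
observed = [
    [70, 100],   # Democrat:    men, women
    [100, 70],   # Republican:  men, women
    [30, 30],    # Independent: men, women
]

def chi_square_homogeneity(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (o - e) ** 2 / e                         # cell contribution

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_homogeneity(observed)

ALPHA = 0.05
CRITICAL_VALUE = 5.991  # chi-square table value for df = 2, alpha = 0.05

# For df = 2 only, the right-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)

reject_h0 = chi2 > CRITICAL_VALUE
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

With these made-up counts the statistic is about 10.59, which exceeds 5.991, so the sketch rejects the null hypothesis. The critical-value comparison and the p-value comparison always lead to the same decision.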

Interpreting Homogeneity Test Results

  • Rejecting $H_0$ means there's statistically significant evidence that the distribution of the categorical variable differs for at least one of the populations.
  • Failing to reject $H_0$ means you don't have enough evidence to say the distributions differ. This does not prove they're the same.

When you do reject $H_0$, look at which cells had the largest differences between observed and expected values. These cells drove the significant result and tell you where the distributions diverge.

Assumptions to check before running the test:

  • Each expected frequency should be at least 5. If some cells fall below this threshold, your chi-square approximation may not be reliable.
  • Observations should be independent of one another.
  • Data should come from random samples drawn separately from each population.

If the expected frequency condition isn't met, consider combining categories or, for 2×2 tables, using Fisher's exact test as an alternative.
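
A quick way to check the expected-frequency condition is to compute every expected count and flag the ones below 5. This is a minimal sketch: the small table is made up, and `low_expected_cells` is a hypothetical helper name.

```python
def low_expected_cells(table, threshold=5):
    """List (row index, column index, expected count) for cells below threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / grand_total
            if e < threshold:
                flagged.append((i, j, e))
    return flagged

# Hypothetical 2x2 table with small samples (15 and 25 people per group).
small_table = [
    [3, 9],
    [12, 16],
]

flagged = low_expected_cells(small_table)
print(flagged)  # any entries here signal that the approximation may be unreliable
```

In this made-up table the first cell has an expected count of 4.5, so the condition fails. Note that the check uses expected counts, not observed counts.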

Additional Analysis and Measures

If the overall test is significant, you may want to dig deeper:

  • Standardized residuals for individual cells help pinpoint which specific categories and populations are driving the difference. A standardized residual with an absolute value greater than about 2 suggests that cell is a notable contributor.
  • Effect size measures like Cramér's V quantify the strength of the association (values range from 0 to 1, with higher values indicating a stronger relationship).
  • Post-hoc pairwise comparisons can be conducted between specific populations to identify exactly which groups differ, though you'll need to adjust for multiple comparisons.
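
The first two follow-up measures can be sketched as follows, using the same hypothetical party-preference counts as an illustration. Two assumptions are worth naming: the residual computed here is the adjusted standardized residual, which divides by an estimated standard error (some textbooks instead use the simpler Pearson residual, (O − E) / √E), and Cramér's V uses the common formula √(χ² / (n · min(r − 1, c − 1))). The function names are illustrative.

```python
import math

# Hypothetical observed counts: rows are categories, columns are populations.
observed = [
    [70, 100],   # Democrat:    men, women
    [100, 70],   # Republican:  men, women
    [30, 30],    # Independent: men, women
]

def _totals(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)

def adjusted_residuals(table):
    """Adjusted standardized residual for every cell of the table."""
    row_totals, col_totals, n = _totals(table)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            # Estimated standard error of (o - e) under the null hypothesis.
            se = math.sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            out_row.append((o - e) / se)
        result.append(out_row)
    return result

def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * min(r - 1, c - 1)))."""
    row_totals, col_totals, n = _totals(table)
    chi2 = sum(
        (o - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, o in enumerate(row)
    )
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi2 / (n * k))

residuals = adjusted_residuals(observed)
v = cramers_v(observed)

for row in residuals:
    print([round(x, 2) for x in row])
print(f"Cramér's V = {v:.3f}")
```

In this made-up table the first two categories have residuals of about 3 in absolute value, while the third category has residuals of 0, so the difference between groups is concentrated in the first two rows. Cramér's V comes out to roughly 0.16.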