Fiveable

📊Honors Statistics Unit 11 Review


11.4 Test for Homogeneity


Written by the Fiveable Content Team • Last updated August 2025

Test for Homogeneity

A test for homogeneity answers a straightforward question: do two or more populations share the same distribution of a categorical variable? For example, you might ask whether the breakdown of political party preference is the same for voters under 30, 30–50, and over 50. This test uses the same chi-square machinery you've already seen, but the setup and interpretation have their own logic worth understanding clearly.


Homogeneity vs. Goodness-of-Fit Tests

These two tests look similar in calculation but answer different questions. Keeping them straight matters for choosing the right approach on a problem.

Test for homogeneity:

  • Compares the distribution of a single categorical variable across two or more independent populations (e.g., men vs. women, or three different schools)
  • The question is: Are the proportions of each category the same across all populations?
  • You collect a separate sample from each population and compare them
  • Example: Surveying 200 men and 200 women to see if the proportion who smoke, vape, or use neither differs between genders

Goodness-of-fit test:

  • Examines a single population against a specific hypothesized distribution
  • The question is: Does this one sample match what we'd expect under a given model?
  • You have one sample and a theoretical distribution to compare it to
  • Example: Rolling a die 300 times and testing whether outcomes follow a uniform distribution (50 per face)

The key distinction: homogeneity compares multiple populations to each other, while goodness-of-fit compares one population to a theoretical model.
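To make the goodness-of-fit side of the comparison concrete, here is a minimal sketch of the die example, computed by hand from the formula. The observed counts are invented for illustration; any real set of 300 rolls would differ.

```python
# Goodness-of-fit sketch: are 300 die rolls consistent with a fair die?
observed = [48, 52, 55, 45, 51, 49]               # hypothetical counts, faces 1-6
expected = [sum(observed) / len(observed)] * 6    # fair die: 50 per face

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Goodness-of-fit df = number of categories - 1
df = len(observed) - 1

print(chi2, df)  # chi2 = 1.2 with df = 5
```

A statistic of 1.2 is far below the df = 5 critical value of about 11.07 at $\alpha = 0.05$, so these made-up counts would look entirely consistent with a fair die.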

[Video: Homogeneity vs goodness-of-fit tests, Goodness-of-Fit (1 of 2) | Concepts in Statistics]

Test Statistic for Homogeneity

The calculation follows the same chi-square formula used elsewhere, but the data lives in a contingency table (also called a two-way table). Here's the process step by step:

  1. Organize the data in a contingency table. Rows represent the different populations (e.g., men and women), and columns represent the categories of the variable (e.g., smoker, non-smoker).

  2. Calculate expected counts for each cell. If the null hypothesis were true (all populations have the same distribution), the expected count for row $i$, column $j$ is:

$$E_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{\text{grand total}}$$

This formula applies the overall column proportions to each row's total, producing exactly the counts you would expect if every population shared the same distribution.

  3. Compute the chi-square test statistic by summing across every cell in the table:

$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ is the observed count and $E_{ij}$ is the expected count for that cell. Each cell contributes to the total; larger deviations from expected counts push the statistic higher.

  4. Find the degrees of freedom:

$$df = (r - 1)(c - 1)$$

where $r$ is the number of rows (populations) and $c$ is the number of columns (categories). For a 2×3 table, that's $(2-1)(3-1) = 2$.

  5. Compare to the critical value from the chi-square distribution at your significance level (typically $\alpha = 0.05$), or use the p-value from your calculator or table.

Condition check: Before running the test, verify that every expected count is at least 5. If some cells fall below 5, the chi-square approximation becomes unreliable.
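The five steps above can be sketched end to end in a few lines. The counts below are hypothetical (a made-up version of the men/women smoking survey from earlier), so the resulting numbers are only illustrative.

```python
# Hypothetical counts for the smoking survey: rows are the populations
# (men, women), columns are the categories (smoke, vape, neither).
observed = [
    [40, 30, 130],   # men,   n = 200
    [25, 35, 140],   # women, n = 200
]

# Step 1: totals from the contingency table
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Step 2: E_ij = (row i total)(column j total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]
assert all(e >= 5 for row in expected for e in row)  # condition check

# Step 3: chi-square statistic, summed over every cell
chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(observed, expected)
           for o, e in zip(orow, erow))

# Step 4: degrees of freedom = (r - 1)(c - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2, 3), df)  # chi2 ≈ 4.217 with df = 2
```

With df = 2, the $\alpha = 0.05$ critical value is about 5.99, so these invented data would fail to reject $H_0$: not enough evidence that the distributions differ.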

[Video: Homogeneity vs goodness-of-fit tests, Goodness-of-Fit (2 of 2) | Concepts in Statistics]

Interpretation of Homogeneity Results

Hypotheses:

  • $H_0$: The populations have the same distribution of the categorical variable
  • $H_a$: At least one population has a different distribution of the categorical variable

Decision rules:

  • If $\chi^2$ exceeds the critical value (or the p-value is less than $\alpha$), reject $H_0$. There is sufficient evidence that the distributions differ across populations. For instance, you might conclude that the proportion of smokers is not the same for men and women.
  • If $\chi^2$ does not exceed the critical value (or the p-value is greater than $\alpha$), fail to reject $H_0$. There is not enough evidence to conclude the distributions differ. For instance, the data may not support a claim that ice cream flavor preferences differ across age groups.

Note that rejecting $H_0$ tells you the distributions aren't all the same, but it doesn't tell you which populations differ or which categories are driving the difference. That's where additional analysis comes in.

Additional Analysis

  • Standardized residuals: These are the most immediately useful follow-up. For each cell, the standardized residual is $\frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}$. Cells with standardized residuals beyond $\pm 2$ are the primary contributors to a significant result, helping you pinpoint where the distributions diverge.
  • Effect size (Cramér's V): Quantifies how strong the association is between the row variable and column variable, on a scale from 0 (no association) to 1 (perfect association). Useful because a large sample can produce a significant $\chi^2$ even when the actual differences in proportions are small.
  • Post-hoc pairwise comparisons: If you have three or more populations, you can run separate homogeneity tests on pairs of populations to identify which specific groups differ. Be aware that running multiple tests inflates your Type I error rate, so a Bonferroni correction (dividing $\alpha$ by the number of comparisons) is typically applied.
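The first two follow-ups can be sketched directly from the cell counts. The 2×3 table below uses invented numbers (a hypothetical men/women by smoke/vape/neither survey), so treat the output as illustrative only.

```python
import math

# Invented counts: rows = men, women; columns = smoke, vape, neither
observed = [[40, 30, 130], [25, 35, 140]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Standardized residual per cell: (O - E) / sqrt(E).
# |residual| > 2 flags a cell as a major contributor to a significant result.
residuals = [[(o - e) / math.sqrt(e) for o, e in zip(orow, erow)]
             for orow, erow in zip(observed, expected)]

# The chi-square statistic is just the sum of squared residuals
chi2 = sum(r ** 2 for row in residuals for r in row)

# Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1))), on a 0-to-1 scale
k = min(len(observed), len(observed[0])) - 1
cramers_v = math.sqrt(chi2 / (n * k))

print(round(cramers_v, 3))  # ≈ 0.103
```

Here no residual exceeds $\pm 2$ and Cramér's V is only about 0.1, which together say these invented differences are small even before you look at a p-value.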