Fiveable

📊AP Statistics Unit 8 Review

8.5 Setting Up a Chi-Square Test for Homogeneity or Independence

Written by the Fiveable Content Team • Last updated June 2026
Verified for the 2027 exam

A chi-square test for independence checks whether two categorical variables are associated within one population, while a chi-square test for homogeneity compares the distribution of one categorical variable across two or more populations or treatments. Setting one up means picking the right test, writing hypotheses in context, and verifying that your data came from the right kind of sampling and that all expected counts are large enough.

Why This Matters for the AP Statistics Exam

Unit 8 is where you handle two-way tables of categorical data, and the setup step is half the battle. Free-response questions in this area often ask you to name the test, state hypotheses, and check conditions before any calculation happens, so getting the setup right keeps the rest of your work on track. Multiple-choice questions also test whether you can tell independence apart from homogeneity based on how the data were collected.

This topic connects directly to the earlier two-way table work on expected counts and leads into actually carrying out the test in the next topic. The same problem-solving structure you used for proportion and mean tests still applies: identify the procedure, write hypotheses, verify conditions, then compute and conclude.

Key Takeaways

  • Use a test for independence when you have one sample from one population and you measure two categorical variables on each individual.
  • Use a test for homogeneity when you compare the distribution of one categorical variable across two or more separate populations or treatment groups.
  • Write hypotheses in context, naming the variables or populations from the question instead of using generic wording.
  • The independence test needs a simple random sample; the homogeneity test needs a stratified random sample or a randomized experiment.
  • Check the 10% condition when sampling without replacement, and confirm that all expected counts are at least 5.
  • Conditions for chi-square tests use expected counts, not observed counts, for the large counts check.

Which Test to Run?

When you have categorical data in a two-way table, your first job is to decide between a test for independence and a test for homogeneity. The difference comes from how the data were collected.

  • A chi-square test for independence fits when you take one sample from a single population and record two categorical variables on each individual. You are asking whether those two variables are associated.
  • A chi-square test for homogeneity fits when you have two or more separate samples (or treatment groups) and you want to compare the distribution of one categorical variable across them.

A quick check: if every person in your data came from one group and you sorted them by two variables, think independence. If you started with separate groups and want to compare them, think homogeneity.
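The quick check can also be written as a small decision rule. This is only an illustrative sketch: the function name `choose_chi_square_test` and the design labels are hypothetical names chosen here, not AP terminology.

```python
def choose_chi_square_test(design: str) -> str:
    """Map how the data were collected to the matching chi-square test.

    The design labels are hypothetical names chosen for this sketch.
    """
    if design == "one_sample_two_variables":
        # One sample from one population, two categorical variables recorded.
        return "independence"
    if design in ("separate_samples", "randomized_experiment"):
        # Separate populations or treatment groups, one categorical variable.
        return "homogeneity"
    raise ValueError(f"Unrecognized design: {design!r}")


print(choose_chi_square_test("one_sample_two_variables"))  # independence
print(choose_chi_square_test("randomized_experiment"))     # homogeneity
```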

Writing Hypotheses

Once you know which test is appropriate, write your hypotheses. For either test, include context by naming the actual variables or populations from the problem instead of using generic words.

Homogeneity Templates

  • H0: There is no difference in distributions of a categorical variable across populations or treatments.
  • Ha: There is a difference in distributions of a categorical variable across populations or treatments.

Independence Templates

  • H0: There is no association between two categorical variables in a given population, or the two categorical variables are independent.
  • Ha: Two categorical variables in a population are associated or dependent.

Example: Independence

Suppose you want to see whether a student's favorite sport is associated with their letter grade in an AP Statistics class. You take a random sample of 100 students from your school's AP Statistics class and record each student's favorite sport (football, basketball, or baseball) along with their letter grade.

Because you have one population and two variables measured on each student, this is a test for independence.

  • H0: There is no association between sports preference and letter grade in AP Statistics for students at XYZ High School.
  • Ha: There is an association between sports preference and letter grade in AP Statistics for students at XYZ High School.
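In data terms, an independence setup is one list of individuals with two labels recorded for each. The sketch below tallies such a list into a two-way table. The records are invented for illustration and are not data from the example.

```python
from collections import Counter

# Hypothetical records: (favorite sport, letter grade) for each sampled student.
records = [
    ("football", "A"), ("basketball", "B"), ("baseball", "A"),
    ("football", "B"), ("basketball", "A"), ("football", "A"),
]

# Each individual contributes to exactly one cell of the two-way table.
cell_counts = Counter(records)

sports = sorted({sport for sport, _ in records})
grades = sorted({grade for _, grade in records})

# Rows are sports, columns are grades; a cell with no students counts as 0.
two_way_table = [[cell_counts[(s, g)] for g in grades] for s in sports]

for sport, row in zip(sports, two_way_table):
    print(sport, dict(zip(grades, row)))
```

Because every student appears once, the cell counts add up to the sample size.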

Example: Homogeneity

Suppose you want to compare how sports preference is distributed among AP Statistics students versus AP Calculus students. You take a random sample of 100 Stats students and 100 Calculus students and record each student's preference among football, baseball, and basketball.

Because you have two separate populations and want to compare their distributions, this is a test for homogeneity.

  • H0: There is no difference in the distribution of sports preference between AP Statistics and AP Calculus students at XYZ High School.
  • Ha: There is a difference in the distribution of sports preference between AP Statistics and AP Calculus students at XYZ High School.
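A homogeneity setup looks different in data terms: each population has its own sample, and the question is whether the category proportions match across them. The counts in this sketch are invented for illustration.

```python
# Hypothetical counts: one separate random sample of 100 from each class.
samples = {
    "AP Statistics": {"football": 40, "baseball": 25, "basketball": 35},
    "AP Calculus": {"football": 30, "baseball": 30, "basketball": 40},
}

# Distribution of sports preference within each sample (proportions sum to 1).
distributions = {}
for population, counts in samples.items():
    n = sum(counts.values())
    distributions[population] = {sport: c / n for sport, c in counts.items()}

for population, dist in distributions.items():
    print(population, dist)
```

Sample proportions will rarely match exactly. The null hypothesis is a claim about the population distributions, which is why a test is needed.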

A test for homogeneity also fits a randomized experiment, since random assignment creates the separate groups you are comparing. For example, you might compare individuals receiving a new drug treatment against individuals receiving a placebo.

Conditions

Chi-square tests for two-way tables need two conditions you have seen before:

  • Random data collection (the type depends on the test)
  • Large counts

When sampling without replacement, also check the 10% condition (n ≤ 10%N).

For the large counts condition, verify that all expected counts are at least 5. Use expected counts here, not the observed counts in the table.
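The expected count for a cell is (row total × column total) / table total, the standard formula from the earlier two-way table work. A minimal sketch of the large counts check, using a made-up table and hypothetical function names, might look like this:

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


def large_counts_ok(observed, minimum=5):
    """True when every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(observed) for e in row)


# Hypothetical observed counts (rows: sports, columns: grades A, B, C).
observed = [
    [20, 15, 5],
    [15, 10, 5],
    [15, 10, 5],
]

for row in expected_counts(observed):
    print(row)
print("Large counts met:", large_counts_ok(observed))
```

In this made-up table every observed count is at least 5, yet two expected counts come out to 4.5, so the condition is not met. Checking observed counts alone would have missed that.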

Test for Independence

For a test for independence, the data should come from a simple random sample. That means every possible sample of the same size had an equal chance of being selected, and the sample was drawn from a single population. A representative random sample lets you generalize your conclusion back to that population.
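As a rough illustration, a simple random sample can be drawn in Python with `random.sample`, which selects without replacement. The roster of ID numbers, the population size of 1,500, and the seed are all assumptions made for this sketch.

```python
import random

# Hypothetical population: ID numbers for 1,500 students.
population = list(range(1, 1501))
sample_size = 100

rng = random.Random(2027)  # fixed seed only so the sketch is reproducible
sample = rng.sample(population, sample_size)  # sampling without replacement

# 10% condition: the sample is at most 10% of the population (n <= 0.10 * N).
ten_percent_ok = 10 * sample_size <= len(population)

print(sample[:5])
print("10% condition met:", ten_percent_ok)
```

The comparison uses `10 * n <= N` rather than `n <= 0.10 * N` to stay in whole-number arithmetic.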

Test for Homogeneity

For a test for homogeneity, the data should come from a stratified random sample or from a randomized experiment.

A stratified random sample splits the population into non-overlapping groups (strata) based on some characteristic, then draws a simple random sample from each stratum. In an experiment, you instead verify that subjects were randomly assigned to treatment groups, which lets you attribute differences between groups to the treatments rather than to preexisting differences.
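Both designs can be sketched with Python's `random` module. The strata, group sizes, ID labels, and seed below are hypothetical and chosen only to show the mechanics.

```python
import random

rng = random.Random(8)  # fixed seed only so the sketch is reproducible

# Stratified random sample: a separate simple random sample from each stratum.
# Hypothetical strata: student IDs grouped by course.
strata = {
    "AP Statistics": [f"S{i}" for i in range(1, 301)],
    "AP Calculus": [f"C{i}" for i in range(1, 251)],
}
stratified_sample = {
    name: rng.sample(members, 20) for name, members in strata.items()
}

# Random assignment: shuffle the subjects, then split them into two groups.
subjects = [f"P{i}" for i in range(1, 41)]  # 40 hypothetical volunteers
shuffled = subjects[:]
rng.shuffle(shuffled)
treatment_group = shuffled[:20]
placebo_group = shuffled[20:]

print({name: len(s) for name, s in stratified_sample.items()})
print(len(treatment_group), len(placebo_group))
```

In the stratified sample, chance decides who is selected from each group. In the experiment, chance decides which treatment each subject receives.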

How to Use This on the AP Statistics Exam

Free Response

  • State which test you are using and why, based on how the data were collected. One sample with two variables points to independence; separate samples or treatment groups point to homogeneity.
  • Write hypotheses in words with full context. Name the variables or populations from the prompt so a reader knows exactly what you are testing.
  • List both conditions and tie them to the test. For independence, mention the simple random sample; for homogeneity, mention the stratified random sample or random assignment.
  • When you check large counts, say that all expected counts are at least 5. State the check explicitly rather than leaving it implied.

Common Trap

  • Do not write hypotheses about a single proportion or mean here. Chi-square hypotheses for two-way tables are about association (independence) or differences in distributions (homogeneity).
  • Do not use observed counts for the large counts check. The condition is about expected counts.

Common Misconceptions

  • Mixing up independence and homogeneity. The test depends on data collection, not on the question wording alone. One sample with two variables means independence; two or more samples or treatment groups means homogeneity.
  • Checking observed counts instead of expected counts. The large counts condition requires all expected counts to be at least 5, not the observed counts in your table.
  • Forgetting context in hypotheses. Generic hypotheses lose meaning. Always name the actual variables or populations from the problem.
  • Saying the variables "are independent" at the setup stage. Independence is what the null hypothesis assumes, not something you have shown. Conclusions come later, when you interpret the results, and even then failing to reject the null hypothesis does not prove the variables are independent.
  • Assuming any random sample works for homogeneity. Homogeneity needs separate samples from each population (often stratified) or a randomized experiment, not a single simple random sample.

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

  • alternative hypothesis: The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
  • association: The relationship between two variables where knowing the value of one variable provides information about the other variable.
  • categorical data: Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
  • categorical variable: A variable that takes on values that are category names or group labels rather than numerical values.
  • chi-square test: A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution.
  • chi-square test for homogeneity: A statistical test used to determine whether the distributions of a categorical variable are the same across different populations or treatments.
  • chi-square test for independence: A statistical test used to determine whether two categorical variables in a population are associated or independent.
  • distribution: The pattern of how data values are spread or arranged across a range.
  • expected count: The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true.
  • homogeneity: In a chi-square test, the condition where the distribution of a categorical variable is the same across different groups or populations.
  • independence: The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
  • null hypothesis: The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
  • proportion: A part or share of a whole, expressed as a fraction, decimal, or percentage.
  • randomized experiment: A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
  • row and column variables: The two categorical variables displayed in a two-way table, with one variable defining the rows and the other defining the columns.
  • sampling without replacement: A sampling method in which an item selected from a population cannot be selected again in subsequent draws.
  • simple random sample: A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen.
  • statistical inference: The process of drawing conclusions about a population based on data collected from a sample.
  • stratified random sample: A sampling method in which a population is divided into separate groups called strata based on shared characteristics, and a simple random sample is selected from each stratum.
  • two-way table: A table that displays the frequency distribution of two categorical variables, organized in rows and columns.

Frequently Asked Questions

What is the difference between a chi-square test for homogeneity and independence?

A chi-square test for homogeneity compares the distribution of one categorical variable across two or more populations or treatments. A chi-square test for independence checks whether two categorical variables are associated within one population.

When do I use a chi-square test for independence?

Use a chi-square test for independence when one random sample is classified by two categorical variables and you want to know whether the variables are associated.

When do I use a chi-square test for homogeneity?

Use a chi-square test for homogeneity when you have separate random samples or treatments and want to compare whether the distribution of a categorical variable is the same across groups.

How do I write hypotheses?

For independence, the null hypothesis says the two categorical variables are independent, and the alternative says they are associated. For homogeneity, the null says the distribution is the same across groups, and the alternative says at least one distribution differs.

What conditions do I check?

Check that the data come from random sampling or random assignment, observations are independent, and expected counts are large enough. AP Statistics commonly uses the condition that all expected counts are at least 5.

Why use expected counts instead of observed counts?

Expected counts show what the table would look like if the null hypothesis were true. The chi-square statistic measures how far the observed counts are from those expected counts.
