Fiveable

📊AP Statistics Unit 8 Review


8.5 Setting Up a Chi-Square Test for Homogeneity or Independence

Written by the Fiveable Content Team • Last updated September 2025
Verified for the 2026 exam

Which Test to Run?

When you realize you are looking at categorical data with more than one variable, the first decision is whether to perform a chi-square test for independence or a chi-square test for homogeneity.

  • χ² test for independence is appropriate when we take one random sample from a single population and record two categorical variables for each individual. All of the data come from the same population!
  • χ² test for homogeneity is appropriate when we take two or more separate random samples (or randomly assign treatments) to determine whether the distribution of a single categorical variable differs across the corresponding populations or treatments.
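The decision rule in these two bullets can be written as a tiny function. This is a hypothetical study aid, not an official rule: the function name and its two inputs (how many separate samples or treatment groups there are, and how many categorical variables are recorded on each individual besides group membership) are our own framing.

```python
def choose_chi_square_test(num_groups_sampled: int, variables_per_individual: int) -> str:
    """Suggest which chi-square test fits a study design (a study aid, not an official rule).

    num_groups_sampled: number of separate random samples or randomly assigned
        treatment groups the data come from.
    variables_per_individual: number of categorical variables recorded on each
        individual, not counting which group the individual belongs to.
    """
    if num_groups_sampled == 1 and variables_per_individual == 2:
        # One sample, two categorical variables: test for an association.
        return "independence"
    if num_groups_sampled >= 2 and variables_per_individual == 1:
        # Several samples or treatment groups, one categorical variable:
        # test whether the distributions are the same.
        return "homogeneity"
    raise ValueError("this design does not match either two-way-table chi-square test")


# The sports/grades example (one sample, two variables) and the
# Stats-vs-Calculus example (two samples, one variable):
print(choose_chi_square_test(1, 2))  # independence
print(choose_chi_square_test(2, 1))  # homogeneity
```

Real problems still need judgment about how the data were collected; the function only encodes the two patterns described above.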

Hypotheses

Once you determine which test is appropriate, the next step is to write your hypotheses. Regardless of the test, include context in your hypotheses, either by using meaningful subscripts or by identifying the parameters of interest.

Templates

The appropriate hypotheses for a chi-square test for homogeneity are: 

  • H0: There is no difference in distributions of a categorical variable across populations or treatments.
  • Ha: There is a difference in distributions of a categorical variable across populations or treatments.

The appropriate hypotheses for a chi-square test for independence are: 

  • H0: There is no association between two categorical variables in a given population; that is, the two categorical variables are independent.
  • Ha: The two categorical variables in the population are associated (dependent).

Example: Independence

When writing hypotheses for a chi-square test for independence, the null hypothesis is that there is no association between the two categorical variables in the population of interest. The alternative hypothesis is that there IS an association between the two categorical variables.

For example, suppose we want to know whether students’ favorite sport is associated with their grade in AP Statistics. We could take a random sample of 100 students from the AP Statistics classes at XYZ High School and record each student’s favorite sport (football, basketball, or baseball) along with their letter grade in the class.

Our hypotheses would be as follows:

  • H0: There is no association between sports preference and letter grade in AP Statistics for students at XYZ High School.
  • Ha: There is an association between sports preference and letter grade in AP Statistics for students at XYZ High School.

Since this problem involves one population (AP Statistics students at XYZ High School), this would require a test for independence.

Example: Homogeneity

When writing hypotheses for a chi-square test for homogeneity, the null hypothesis is that there is no difference in the distribution of the categorical variable between population 1 and population 2. The alternative hypothesis is that the distribution of the categorical variable differs between the two populations of interest.

For example, to see whether the distribution of sports preference differs between AP Statistics students and AP Calculus students, we could take a random sample of 100 Statistics students and a separate random sample of 100 Calculus students and determine whether the distribution of preference for football, baseball, or basketball differs between the two groups.

Our hypotheses would be as follows:

  • H0: There is no difference in the distribution of sports preference between AP Statistics and AP Calculus students at XYZ High School.
  • Ha: There is a difference in the distribution of sports preference between AP Statistics and AP Calculus students at XYZ High School.

Since this problem involves two populations (AP Statistics students at XYZ High School and AP Calculus students at XYZ High School), it requires a test for homogeneity: we want to see whether the two populations are homogeneous in terms of sports preference.

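A test for homogeneity is also used for a randomized experiment, because random assignment creates separate treatment groups that play the role of the populations. For instance, we could compare the distribution of a categorical outcome for individuals receiving a new drug and individuals receiving a placebo.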

Conditions

Chi-square tests require two familiar conditions for inference:

  • Independence
  • Large Counts

When sampling without replacement, check the 10% condition for independence: the sample size must be no more than 10% of the population size (n ≤ 10% of N).
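The 10% check is a single comparison. Here is a minimal sketch (the function name is hypothetical); it uses the equivalent integer form 10·n ≤ N so that boundary cases are not affected by floating-point rounding.

```python
def ten_percent_condition(n: int, population_size: int) -> bool:
    """Check n <= 10% of N; only needed when sampling without replacement."""
    if n <= 0 or population_size <= 0:
        raise ValueError("sample and population sizes must be positive")
    # 10 * n <= N is the same inequality as n <= 0.10 * N, but stays in exact
    # integer arithmetic.
    return 10 * n <= population_size


# Example: 200 students sampled from a school of 3,000 (200 <= 300).
print(ten_percent_condition(200, 3000))  # True
```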

For the large counts condition, verify that all expected counts are greater than 5 (a conservative rule of thumb, as in other chi-square tests).
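Expected counts come from the row and column totals, so the large counts check is easy to automate. A minimal sketch, assuming the two-way table is stored as a list of rows; the function names are hypothetical, and the check uses the conservative "greater than 5" form of the rule.

```python
from typing import List


def expected_counts(observed: List[List[float]]) -> List[List[float]]:
    """Expected count for each cell: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def large_counts_ok(expected: List[List[float]], threshold: float = 5) -> bool:
    """Conservative large counts check: every expected count exceeds the threshold."""
    return all(count > threshold for row in expected for count in row)


# Hypothetical 2x2 table of observed counts (made-up numbers for illustration).
observed = [[10, 20],
            [30, 40]]
expected = expected_counts(observed)
print(expected)                   # [[12.0, 18.0], [28.0, 42.0]]
print(large_counts_ok(expected))  # True
```

Note that the check is applied to the expected counts, not the observed counts.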

Test for Independence

For our test for independence, we need to verify that our data was collected using a simple random sample.

To verify that your data were collected using a simple random sample, check that the following conditions have been met:

  1. Every possible sample of the same size has an equal chance of being selected (so, in particular, every member of the population has the same chance of being included).
  2. Individuals are chosen by a chance process, such as a random number generator, rather than by convenience or self-selection.

If these conditions have been met, your data come from a simple random sample, which means the sample should be representative of the population and can be used to draw conclusions about the population!
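In software, a simple random sample is usually drawn with a without-replacement sampler. A minimal sketch using Python's standard `random` module, assuming a hypothetical roster of student ID numbers; the seed is only there so the draw can be reproduced when checking work.

```python
import random
from typing import List, Optional, Sequence


def simple_random_sample(population: Sequence, n: int, seed: Optional[int] = None) -> List:
    """Draw a simple random sample of size n, without replacement.

    random.Random.sample picks n distinct members at random, so every possible
    sample of that size is equally likely (the SRS definition).
    """
    rng = random.Random(seed)  # seeded generator for a reproducible draw
    return rng.sample(population, n)


# Hypothetical roster of 1,200 student ID numbers; sample 100 of them.
roster = list(range(1, 1201))
sample = simple_random_sample(roster, 100, seed=2026)
print(len(sample), len(set(sample)))  # 100 100
```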

Test for Homogeneity

For our test for homogeneity, we need to verify that our data were collected using a stratified random sample (a separate random sample from each population) or that treatments were randomly assigned (experimental design).

To verify that your data were collected using a stratified random sample, check that the following conditions have been met:

  1. The population has been divided into non-overlapping groups, or strata, based on some relevant characteristic. In a test for homogeneity, the strata are the populations being compared.
  2. A simple random sample is drawn from each stratum.

If these conditions have been met, your data come from a stratified random sample, so each population being compared was sampled separately and at random. That is what allows the sample distributions to serve as estimates of the population distributions you are comparing.
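A stratified random sample is just a separate simple random sample inside each stratum. A minimal sketch, assuming hypothetical class rosters as the strata and a sample size chosen per stratum (25 here, which also keeps each sample within 10% of its roster).

```python
import random
from typing import Dict, List, Optional, Sequence


def stratified_random_sample(
    strata: Dict[str, Sequence],
    sizes: Dict[str, int],
    seed: Optional[int] = None,
) -> Dict[str, List]:
    """Draw a separate simple random sample from each stratum.

    strata maps a stratum name to its members; sizes maps the same name to how
    many members to sample from that stratum.
    """
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


# Hypothetical rosters: each class is one stratum (one population to compare).
strata = {
    "AP Statistics": [f"stats-{i}" for i in range(1, 301)],
    "AP Calculus": [f"calc-{i}" for i in range(1, 251)],
}
samples = stratified_random_sample(strata, {"AP Statistics": 25, "AP Calculus": 25}, seed=7)
print({name: len(chosen) for name, chosen in samples.items()})
# {'AP Statistics': 25, 'AP Calculus': 25}
```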

Alternatively, if you are conducting an experiment, verify that treatments were randomly assigned by checking that the following conditions have been met:

  1. The subjects in the study are randomly assigned to treatment groups.
  2. The assignment is made by a chance process (for example, a random number generator or drawing names from a hat), not chosen by the experimenter or the subjects.

If these conditions have been met, the treatments were randomly assigned, which means that any statistically significant differences between the treatment groups can be attributed to the treatments rather than to preexisting differences between the groups. Blinding (including a double-blind design) is good experimental practice, but it is not what makes the assignment random.
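Random assignment can be done by shuffling the subjects and dealing them out to the treatments in turn. A minimal sketch, assuming a hypothetical drug-vs-placebo experiment with 40 volunteers; this gives equal group sizes when the number of subjects divides evenly.

```python
import random
from typing import Dict, List, Optional, Sequence


def randomly_assign(
    subjects: Sequence, treatments: Sequence[str], seed: Optional[int] = None
) -> Dict[str, List]:
    """Randomly assign subjects to treatments with (nearly) equal group sizes.

    Shuffling and then dealing subjects out in turn means the chance process,
    not the experimenter, decides who gets which treatment.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups: Dict[str, List] = {t: [] for t in treatments}
    for position, subject in enumerate(shuffled):
        groups[treatments[position % len(treatments)]].append(subject)
    return groups


# Hypothetical experiment with 40 volunteers numbered 1 to 40.
groups = randomly_assign(range(1, 41), ["new drug", "placebo"], seed=11)
print({t: len(members) for t, members in groups.items()})  # {'new drug': 20, 'placebo': 20}
```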

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

Term | Definition
alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
association | The relationship between two variables where knowing the value of one variable provides information about the other variable.
categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
categorical variable | A variable that takes on values that are category names or group labels rather than numerical values.
chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution.
chi-square test for homogeneity | A statistical test used to determine whether the distributions of a categorical variable are the same across different populations or treatments.
chi-square test for independence | A statistical test used to determine whether two categorical variables in a population are associated or independent.
distribution | The pattern of how data values are spread or arranged across a range.
expected count | The theoretical frequency in each cell of a contingency table that would be expected if the null hypothesis of independence or homogeneity were true.
homogeneity | In a chi-square test, the condition where the distribution of a categorical variable is the same across different groups or populations.
independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage.
randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
row and column variables | The two categorical variables displayed in a two-way table, with one variable defining the rows and the other defining the columns.
sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws.
simple random sample | A sample selected from a population such that every possible sample of the same size has an equal chance of being chosen.
statistical inference | The process of drawing conclusions about a population based on data collected from a sample.
stratified random sample | A sampling method in which a population is divided into separate groups called strata based on shared characteristics, and a simple random sample is selected from each stratum.
two-way table | A table that displays the frequency distribution of two categorical variables, organized in rows and columns.

Frequently Asked Questions

What's the difference between chi-square test for homogeneity and independence?

Short answer: they use the same chi-square statistic but ask different questions and come from different study designs.

  • Test for homogeneity: compares distributions of one categorical variable across different populations or treatments. H₀: the distributions are the same across groups; Hₐ: at least one differs. Data usually come from separate samples (stratified sampling or randomized experiment). Use it when you’re comparing proportions across groups (CED VAR-8.I.1, VAR-8.J.1).
  • Test for independence: asks whether two categorical variables are associated in a single population. H₀: the variables are independent (no association); Hₐ: they’re associated. Data come from one simple random sample (CED VAR-8.I.2, VAR-8.J.2).

Common requirements: expected counts > 5 (conservative), n ≤ 10% N when sampling without replacement, and large sample sizes for the chi-square approximation (CED VAR-8.K). Both tests use the same expected-count formula and χ² = Σ((obs−exp)²/exp). For AP study, see the Topic 8.5 guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh). Practice lots of problems on Fiveable (https://library.fiveable.me/practice/ap-statistics) to get comfortable setting up H₀/Hₐ and checking conditions.

How do I know when to use chi-square homogeneity vs independence test?

Use homogeneity when you’re comparing the same categorical variable’s distribution across different groups/populations or treatments. Use independence when you’re asking whether two categorical variables are associated in a single population (i.e., do the row and column variables depend on each other?). Quick checklist:

  • Question type: “Are the distributions the same across groups?” → chi-square test for homogeneity. “Is there an association between two variables?” → chi-square test for independence.
  • Hypotheses (CED): Homogeneity H0: no difference in distributions across populations; Ha: distributions differ. Independence H0: variables are independent; Ha: variables are associated.
  • Data/conditions (CED VAR-8.K): homogeneity data often come from stratified samples or randomized experiments; independence data from a simple random sample. Check the 10% rule when sampling without replacement and make sure all expected counts are > 5. Degrees of freedom = (rows−1)(cols−1).

Review the Topic 8.5 study guide on Fiveable for examples (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh) and try practice problems (https://library.fiveable.me/practice/ap-statistics).

What are the null and alternative hypotheses for chi-square independence test?

Null: H0—the two categorical variables are independent (there’s no association between them in the population). Alternative: Ha—the two categorical variables are associated (dependent) in the population. Say the variables in context (e.g., “Major” and “Prefers online labs”): H0: Major and preference are independent; Ha: Major and preference are associated. On the AP exam you should always state hypotheses in context. Before you proceed, check conditions from the CED (VAR-8.K): data from a simple random sample (or appropriate design), n ≤ 10% of the population when sampling without replacement, and all expected counts > 5 (conservative rule) so the chi-square approximation is valid. For a quick refresher on wording and setup, see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh). For extra practice, use the AP Stats problem bank (https://library.fiveable.me/practice/ap-statistics).

I'm confused about chi-square tests - when do I use each one?

Use a chi-square test any time you’re working with counts in a contingency (two-way) table, but pick the test based on how the data were collected and the question you’re asking:

  • Chi-square test for homogeneity: you’re comparing distributions of a categorical variable across different populations or treatments (e.g., three schools’ distributions of favorite lunch). H₀: the distributions are the same across groups; Hₐ: at least one differs. Data usually come from separate random samples (stratified sampling) or a randomized experiment.
  • Chi-square test for independence: you’ve sampled one population and want to know if two categorical variables are associated (e.g., in one survey, are grade level and club membership related?). H₀: the variables are independent; Hₐ: they’re associated.

Always check conditions: appropriate sampling (simple random sample for independence; stratified sample or experiment for homogeneity), n ≤ 10%N if sampling without replacement, and all expected counts > 5 for the chi-square approximation to be accurate. For setup examples and practice, see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh) and the Unit 8 overview (https://library.fiveable.me/ap-statistics/unit-8). For lots of practice problems, go to the AP Statistics practice bank (https://library.fiveable.me/practice/ap-statistics).

How do I set up hypotheses for chi-square homogeneity test?

For a chi-square test for homogeneity, you state hypotheses about the distributions of a categorical variable across different populations or treatments.

  • Null (H0): The distributions are the same across the groups. (e.g., H0: The proportion in each pizza-topping category is the same for School A, School B, and School C.)
  • Alternative (Ha): At least one group's distribution is different. (e.g., Ha: The distribution of pizza-topping preferences differs for at least one school.)

Write them in words tied to context; don’t write cell-level probabilities unless asked. Before you test, verify the CED conditions: appropriate sampling/design (stratified sample or randomized experiment for homogeneity), the 10% rule when sampling without replacement, and large counts (conservative rule: all expected counts > 5). On the AP exam, you’ll need the correct hypotheses, checked conditions, expected counts (expected = row total × column total / grand total), the chi-square statistic, df = (rows–1)(cols–1), and the p-value. See the Topic 8.5 study guide for examples (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh) and extra practice (https://library.fiveable.me/practice/ap-statistics).

What's the formula for expected counts in chi-square tests?

For a two-way table (chi-square test for independence or homogeneity) the expected count for the cell in row i and column j is expected = (row i total × column j total) / grand total. So if row i has 40, column j has 30, and the grand total is 200, the expected count = (40×30)/200 = 6. This is the value you plug into χ² = Σ((observed − expected)² / expected). For a goodness-of-fit test the expected count for a category is n × p(theoretical). Remember AP conditions: data collection method (random/stratified/experiment), 10% when sampling without replacement, and the conservative rule that all expected counts should be > 5 for the chi-square approximation to be accurate (CED VAR-8.K). For a quick review see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh) and try practice problems (https://library.fiveable.me/practice/ap-statistics).
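Written in symbols, the formulas in this answer and its worked example look like this. The notation is ours: O_ij and E_ij are the observed and expected counts in row i, column j, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}},
\qquad
\chi^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
df = (r - 1)(c - 1)

E_{ij} = \frac{40 \times 30}{200} = \frac{1200}{200} = 6
```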

Can someone explain the conditions I need to check for chi-square tests step by step?

Step-by-step checks for a chi-square test (homogeneity or independence):

  1. Pick the right test and state hypotheses (CED VAR-8.I).
     • Homogeneity: H0 = distributions are the same across populations; Ha = they differ.
     • Independence: H0 = variables are independent; Ha = they are associated.
  2. Check how the data were collected (CED VAR-8.K.1a).
     • Independence test: data from a simple random sample (SRS).
     • Homogeneity test: data from separate random samples, a stratified design, or a randomized experiment.
  3. If sampling without replacement, verify the 10% condition: n ≤ 10% of the population (CED VAR-8.K.1a.iii).
  4. Compute expected counts for every cell: expected = (row total × column total) / table total (see Topic 8.4 resources).
  5. Check the large counts condition (shape): conservative rule, all expected counts > 5. If it is not met, the chi-square approximation may be poor (CED VAR-8.K.1b.i).
  6. If conditions hold, compute χ² = Σ (observed − expected)²/expected, find df = (rows−1)(cols−1), and get the p-value (on the AP exam, from chi-square tables or a computer).

For practice and examples see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh), the Unit 8 overview (https://library.fiveable.me/ap-statistics/unit-8), and practice problems (https://library.fiveable.me/practice/ap-statistics).

Why do all expected counts need to be greater than 5 for chi-square?

The chi-square test treats the test statistic as if it follows a chi-square distribution, but that is only approximately true, and only when cell counts are “large enough.” If expected counts are too small, the sampling distribution of χ² = Σ (O−E)²/E is not well approximated by the continuous chi-square distribution, so p-values (and conclusions) can be way off. That’s why the CED’s “large counts” condition says a conservative check is that all expected counts > 5 (VAR-8.K.1.b.i/ii). It’s a rule of thumb to make the approximation reliable; with bigger samples the approximation improves. If some expected counts are ≤ 5, combine categories if that makes sense, increase the sample size, or use an exact method (e.g., Fisher’s exact test for 2×2 tables). For a quick review, see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh) and try practice problems (https://library.fiveable.me/practice/ap-statistics).

How do I verify the independence condition for chi-square tests?

Check three things before running a chi-square test (CED VAR-8.K):

  1. Data collection: for a test of independence, your data should come from a simple random sample from one population. For homogeneity you’d use stratified samples or randomized experiments instead; don’t mix those up (CED VAR-8.K.1.a.i–ii).
  2. 10% condition: if you sampled without replacement, make sure n ≤ 10% of the population size (CED VAR-8.K.1.a.iii). If this is violated, the observations are not close enough to independent for the chi-square model to be trusted.
  3. Large counts (expected counts): compute expected counts for every cell (row total × column total / grand total). A conservative rule is that all expected counts > 5; if any are ≤ 5, the chi-square approximation may be poor (CED VAR-8.K.1.b.i).

If these hold, you can proceed with H0: the variables are independent vs. Ha: they’re associated (CED VAR-8.I.2). For a quick refresher, see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh) and more unit resources (https://library.fiveable.me/ap-statistics/unit-8). For extra practice try the AP problem bank (https://library.fiveable.me/practice/ap-statistics).

What's the difference between stratified random sample and simple random sample for chi-square?

Short answer: for chi-square tests, a simple random sample (SRS) is what you want when testing independence in one population; a stratified random sample (or a randomized experiment) is how data are usually collected when testing homogeneity across different populations or treatments. Why it matters (CED terms):

  • Chi-square test for independence: H0: the two categorical variables are independent in one population. CED VAR-8.K.1.a.i says data should come from an SRS from that population.
  • Chi-square test for homogeneity: H0: distributions are the same across populations/treatments. CED VAR-8.K.1.a.ii says data should come from a stratified random sample (or a randomized experiment) so you actually sample each population/treatment separately.

Also verify the other AP conditions: n ≤ 10% of N when sampling without replacement and expected counts > 5 (conservative rule) before doing the test. For the topic overview and worked examples, see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh). For more review and practice, check Unit 8 (https://library.fiveable.me/ap-statistics/unit-8) and 1000+ practice problems (https://library.fiveable.me/practice/ap-statistics).

I don't understand when to check the 10% condition - can someone help?

Check the 10% condition any time your sample was taken without replacement from a finite population. The CED says: when sampling without replacement, verify n ≤ 10% of N. That ensures observations are (approximately) independent. How to use it in Topic 8.5:

  • If your data came from a simple random sample (test for independence) or from strata/an experiment (test for homogeneity), ask: was each unit removed from the population without replacement? If yes, compute n/N and check n ≤ 0.10N.
  • If you did a census or used replacement, skip it.
  • This is a separate check from “large counts” (expected counts > 5); you must do both.

Example: sample n = 200 students from N = 3,000. Since 200 ≤ 0.10·3,000 (300), the 10% condition is satisfied. The AP exam expects you to state these condition checks when justifying a chi-square test (see VAR-8.K in the CED). For a quick refresher, check the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh). For more practice, use the AP problems on Fiveable (https://library.fiveable.me/practice/ap-statistics).

How do I solve a chi-square test problem from start to finish?

Work through these steps in order:

  1. Pick the right test: homogeneity if you’re comparing distributions across different populations/treatments, independence if you’re checking for an association between two categorical variables in one population (CED VAR-8.J).
  2. State hypotheses in context (CED VAR-8.I).
     • Homogeneity: H0 = distributions are the same across groups; Ha = they differ.
     • Independence: H0 = variables are independent; Ha = they are associated.
  3. Check conditions (CED VAR-8.K): data collection method (SRS/stratified/randomized), n ≤ 10% N if sampling without replacement, and all expected counts > 5 (conservative).
  4. Make the two-way table and compute the expected count for each cell: Expected = (row total × column total) / grand total.
  5. Compute χ² = Σ (Observed − Expected)² / Expected, with degrees of freedom = (rows − 1)(columns − 1).
  6. Use the χ² distribution to get a p-value (AP provides a chi-square table and formulas on the exam).
  7. Compare p to α and state a conclusion in context.

For step-by-step examples and practice, see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh), the Unit 8 overview (https://library.fiveable.me/ap-statistics/unit-8), and thousands of practice problems (https://library.fiveable.me/practice/ap-statistics).
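The computational part of this process (expected counts, χ², df, p-value) fits in a short, dependency-free sketch. The function names are hypothetical, the table is made up, and the p-value helper uses the standard series for the regularized incomplete gamma function, which is adequate for textbook-sized statistics but is not a replacement for a statistics library. The sketch does not check conditions; do that first. If you compare against SciPy's `scipy.stats.chi2_contingency`, pass `correction=False` for 2×2 tables, because its default applies Yates' continuity correction when df = 1, while the AP formula above does not.

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-square distribution with df degrees of freedom.

    Computes 1 - P(df/2, x/2), using the power series for the lower regularized
    incomplete gamma function P.
    """
    if df < 1:
        raise ValueError("degrees of freedom must be at least 1")
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 1
    while term > total * 1e-15:  # add terms until they stop mattering
        term *= z / (s + n)
        total += term
        n += 1
    lower = total * math.exp(-z + s * math.log(z) - math.lgamma(s))
    return max(0.0, 1.0 - lower)


def chi_square_two_way(observed: List[List[float]]) -> Tuple[float, int, float]:
    """Return (chi-square statistic, degrees of freedom, p-value) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, chi2_sf(stat, df)


observed = [[10, 20],
            [30, 40]]  # hypothetical counts, for illustration only
stat, df, p_value = chi_square_two_way(observed)
print(round(stat, 4), df)  # 0.7937 1
print(p_value)
```

The last step (comparing the p-value to α and writing the conclusion in context) is still yours to do.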

What does it mean when variables are associated vs independent in chi-square?

If two categorical variables are independent, knowing the category of one gives you no information about the category of the other in the population. In CED language: H₀ (independence) says “there is no association between the two categorical variables”—the row and column variables are independent. If they’re associated (dependent), knowing one variable changes the probabilities for the other—Hₐ: “the two categorical variables are associated.” On the exam you’ll set hypotheses like that, compute expected counts under independence (expected = row total × column total / grand total), then use χ² = Σ (observed − expected)²/expected. Check conditions first: appropriate sampling (simple random for independence; stratified or randomized for homogeneity), n ≤ 10% N when sampling without replacement, and all expected counts > 5 for accuracy (CED VAR-8.K). For more setup help see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh) and extra practice (https://library.fiveable.me/practice/ap-statistics).
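One way to see what independence means is to compare conditional distributions: under independence, every row has the same distribution across the columns. A minimal sketch with a hypothetical helper and made-up tables. Keep in mind that in real samples the rows almost never match exactly even when the variables are independent in the population; the chi-square test is what decides whether the differences are bigger than chance variation.

```python
from typing import List


def conditional_distributions(observed: List[List[float]]) -> List[List[float]]:
    """For each row category, the proportion of that row falling in each column."""
    distributions = []
    for row in observed:
        row_total = sum(row)
        distributions.append([count / row_total for count in row])
    return distributions


# Hypothetical tables (made-up counts). In the first, both rows split 40% / 60%,
# exactly the pattern independence predicts. In the second, the rows differ.
print(conditional_distributions([[20, 30], [40, 60]]))  # [[0.4, 0.6], [0.4, 0.6]]
print(conditional_distributions([[10, 20], [30, 40]]))
```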

How do I know if my data came from different populations or one population for chi-square?

Ask yourself: did you sample separate populations (to compare their category distributions) or one population with two categorical variables (to check for an association)? If you sampled different groups (e.g., students from three schools), use a chi-square test for homogeneity. If you sampled one population and measured two categorical variables on each unit (e.g., gender and favorite sport), use a chi-square test for independence. Set hypotheses using CED language:

  • Homogeneity H0: distributions are the same across populations; Ha: at least one differs.
  • Independence H0: variables are independent (no association); Ha: variables are associated.

Also verify conditions: data collected appropriately (stratified samples or randomized experiment for homogeneity; simple random sample for independence), n ≤ 10% of the population if sampling without replacement, and all expected counts > 5 (conservative rule). If conditions hold, compute expected counts and χ² = Σ(obs−exp)²/exp, get the p-value, and compare it to α. For an AP-aligned walkthrough, see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh). For more practice, try the AP problems (https://library.fiveable.me/practice/ap-statistics).

Why does my teacher keep saying "large counts" for chi-square tests?

Your teacher says "large counts" because the chi-square test relies on an approximation: the sampling distribution of the chi-square statistic only matches the chi-square model well when the cell counts (really, the expected counts) aren’t too small. If expected counts are tiny, the statistic χ² = Σ[(obs − exp)²/exp] no longer follows the chi-square model closely, so the p-value it produces is unreliable. The CED’s rule (VAR-8.K.1.b.i) is a conservative check: all expected counts > 5. With larger expected counts the sampling distribution is closer to the theoretical chi-square, so p-values and conclusions are trustworthy. Also remember the sampling conditions (random/stratified sample or randomized experiment, and the 10% rule when sampling without replacement) from VAR-8.K.1.a. For practice on checking expected counts and setting up tests, see the Topic 8.5 study guide (https://library.fiveable.me/ap-statistics/unit-8/setting-up-chi-square-test-for-homogeneity-or-independence/study-guide/tZimSvE0pNy9ak5fjCmh) and the Unit 8 overview (https://library.fiveable.me/ap-statistics/unit-8). You’ll see this expected-count check on AP questions about chi-square tests.