The chi-square test for independence is an AP Stats inference procedure that uses a two-way (contingency) table from a single sample to test whether two categorical variables are associated, by comparing observed counts to the counts expected if the variables were independent.
The chi-square test for independence answers one question. In a single sample where you recorded two categorical variables for each individual (say, age group and favorite sport), is there a real association between those variables, or could the pattern in your two-way table just be chance?
Here's the logic. The null hypothesis says the two variables are independent, meaning knowing one tells you nothing about the other. From that assumption you calculate expected counts for each cell, using (row total × column total) / grand total. Then the chi-square statistic adds up, across every cell, (observed − expected)² / expected. If observed counts sit far from what independence predicts, the statistic gets big, the p-value gets small, and you reject independence in favor of an association. Degrees of freedom are (rows − 1)(columns − 1), and the usual conditions apply, namely random sampling, independence (the 10% condition if sampling without replacement), and all expected counts at least 5.
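The calculation just described can be sketched in a few lines of Python. The table is made up for illustration (the counts, age groups, and sports are assumptions, not exam data), and the helper names are hypothetical. The p-value helper uses a standard series for the chi-square upper tail so the sketch needs only the standard library; on the exam you would get the p-value from your calculator.

```python
import math

# Hypothetical two-way table of observed counts (made-up numbers).
# Rows: age group (under 30, 30 and over); columns: basketball, football, soccer.
observed = [
    [30, 20, 50],
    [20, 30, 50],
]

def expected_counts(table):
    """Expected count per cell: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(obs, exp):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )

def chi_square_p_value(stat, df):
    """Upper-tail area P(chi-square >= stat), computed as 1 minus a series
    expansion of the regularized lower incomplete gamma function."""
    if stat <= 0:
        return 1.0
    a, x = df / 2, stat / 2
    term = 1.0 / a
    total = term
    for n in range(1, 10_000):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

expected = expected_counts(observed)
stat = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1)(columns - 1)
p_value = chi_square_p_value(stat, df)

print("expected counts:", expected)
print("chi-square =", round(stat, 3), "df =", df, "p-value =", round(p_value, 4))
```

For this made-up table the expected counts are 25, 25, and 50 in each row, the statistic works out to 4.0 with df = 2, and the p-value is about 0.135, so at a 0.05 significance level you would fail to reject the null hypothesis.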
This test lives in Unit 8 of AP Statistics, the chi-square unit, alongside the goodness-of-fit test and the test for homogeneity. It's the capstone of your work with categorical data. Way back in Unit 1 you learned to read two-way tables and compare conditional distributions; this test is the formal inference version of that skill. Instead of just eyeballing whether row percentages look different, you now have a p-value to back up your claim. It also reinforces the full hypothesis-testing framework from Units 6 and 7 (hypotheses, conditions, calculations, conclusion in context), just applied to counts instead of proportions or means. Expect to see it in the inference FRQ and in multiple-choice questions about choosing the right test.
Contingency Table (Units 1 & 8)
The contingency table (two-way table) is the raw material for this test. The chi-square test for independence is basically Unit 1's conditional-distribution comparison upgraded into a formal hypothesis test.
Chi-Square Statistic (Unit 8)
The test statistic is the same Σ(observed − expected)²/expected formula used in all three chi-square tests. What changes between tests is how the data were collected and what the hypotheses claim, not the math.
Degrees of Freedom (Unit 8)
For a test of independence, df = (rows − 1)(columns − 1). Getting df wrong gives you the wrong p-value, so this small calculation is a frequent point on rubrics.
Hypothesis Test & Null Hypothesis (Units 6-8)
This test follows the exact four-part structure you learned for proportions and means. The null hypothesis is that the two variables are independent (no association); the alternative is that an association exists.
Chi-square tests show up reliably on the AP Stats exam, both in multiple choice and in the inference FRQ. The 2026 exam's FRQ 5, for example, asked about an association between age group and type of sport played among professional athletes, which is textbook test-of-independence territory. To earn full credit you have to do the whole procedure: state hypotheses in terms of association/independence (not in symbols like μ or p), name the test, check conditions (random sample, 10% condition, all expected counts ≥ 5), compute the chi-square statistic and df, find the p-value, and write a conclusion in context that links the p-value to the significance level. Multiple-choice stems often test whether you can pick the right chi-square test, compute an expected count, or interpret what a large chi-square value means. A classic trap is concluding 'the variables are independent' when you fail to reject; the correct phrasing is that you don't have convincing evidence of an association.
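The conclusion step can be sketched as a small decision rule. This is only an illustration of the wording: the function name, the default significance level of 0.05, and the age-group-and-sport context are assumptions made for the example, not a rubric.

```python
def conclusion(p_value, alpha=0.05):
    """Return a conclusion in context for a hypothetical
    age-group vs. type-of-sport study, linking the p-value to alpha."""
    if p_value < alpha:
        return (
            f"Because the p-value of {p_value:.3f} is less than alpha = {alpha}, "
            "we reject H0. There is convincing evidence of an association "
            "between age group and type of sport."
        )
    # Failing to reject never "proves" independence; say the evidence is lacking.
    return (
        f"Because the p-value of {p_value:.3f} is not less than alpha = {alpha}, "
        "we fail to reject H0. There is not convincing evidence of an association "
        "between age group and type of sport."
    )

print(conclusion(0.135))
print(conclusion(0.010))
```

Notice that neither branch ever claims the variables are independent; the fail-to-reject branch only reports a lack of convincing evidence.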
The test for independence and the test for homogeneity use the identical statistic, df formula, and table layout, so the difference is all in the study design. Independence means ONE sample, two variables recorded per individual (for example, survey 500 people and record each person's age group and sport preference). Homogeneity means SEPARATE samples from two or more populations, comparing the distribution of one variable across them (for example, sample 200 basketball players and 200 football players separately). On the exam, read how the data were collected before you name the test.
The chi-square test for independence checks whether two categorical variables measured on one sample are associated, using a two-way table of counts.
Expected counts come from assuming independence: (row total × column total) divided by the grand total for each cell.
Degrees of freedom equal (number of rows − 1) times (number of columns − 1).
Conditions to check are a random sample, independence (the 10% condition if sampling without replacement), and all expected counts at least 5.
Failing to reject the null does not prove the variables are independent; it only means you lack convincing evidence of an association.
The test for independence uses one sample with two variables recorded per individual, while the test for homogeneity uses separate samples from different populations. The data collection method, not the math, tells them apart.
The chi-square test for independence is a Unit 8 hypothesis test that uses a two-way table from a single sample to decide whether two categorical variables are associated. It compares observed counts to the counts you'd expect if the variables were independent, using the statistic Σ(observed − expected)²/expected.
Independence uses one sample with two categorical variables recorded per individual; homogeneity uses separate samples from two or more populations and compares the distribution of one variable across them. The calculations are identical, so the design of the study is what determines which test you name.
No. Failing to reject the null hypothesis does not mean the variables are independent; it only means you don't have convincing evidence of an association at your significance level. Writing 'the variables are independent' as a conclusion is a common way to lose FRQ points.
To find the degrees of freedom, multiply (rows − 1) by (columns − 1), counting only the category rows and columns in your two-way table, not the totals row or column. A 3×4 table, for example, gives df = 2 × 3 = 6.
You need a random sample, independent observations (the 10% condition if sampling without replacement), and a large counts condition requiring every expected count to be at least 5. Note that the large counts condition uses expected counts, not observed counts.
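A quick way to see why the large counts condition looks at expected counts rather than observed counts is to check both on a small table. The sketch below uses made-up counts and hypothetical helper names, and the population sizes in the 10% check are assumptions chosen only for illustration.

```python
def expected_counts(table):
    """Expected count per cell: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def large_counts_ok(table, minimum=5):
    """True when every EXPECTED count (not observed count) is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)

def ten_percent_ok(sample_size, population_size):
    """True when the sample is at most 10% of the population."""
    return sample_size <= 0.10 * population_size

# Hypothetical table: one observed count (3) is below 5.
observed = [
    [3, 27],
    [17, 53],
]
expected = expected_counts(observed)  # [[6.0, 24.0], [14.0, 56.0]]
sample_size = sum(sum(row) for row in observed)

print("expected counts:", expected)
print("large counts condition met:", large_counts_ok(observed))
print("10% condition met (assumed population of 5000):", ten_percent_ok(sample_size, 5000))
```

Here the observed count of 3 does not break the condition, because the smallest expected count is 6. The random-sample condition cannot be checked by code; it comes from reading how the data were collected.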