Fiveable

📊AP Statistics Unit 8 Review

8.3 Carrying Out a Chi Square Goodness of Fit Test

Written by the Fiveable Content Team • Last updated September 2025
Verified for the 2026 exam

Recall from the previous section that a chi-square goodness of fit test determines whether the observed frequencies across the categories of a single categorical variable differ significantly from the frequencies expected under a theoretical distribution.

The big-picture procedure for carrying out a chi-square goodness of fit test is:

(1) Hypotheses: State the null and alternative hypotheses. The null hypothesis is that the population distribution matches the expected (hypothesized) distribution, while the alternative hypothesis is that at least one category proportion differs from its hypothesized value.

(2) Significance Level: Choose a significance level. This is the probability of rejecting the null hypothesis when it is true. Commonly used values are 0.1, 0.05, and 0.01.

(3) Chi-Square Statistic: Calculate the chi-square statistic using the formula:

χ² = Σ (observed − expected)² / expected

Source: Cochrane

where "observed" is the observed frequency for each category, "expected" is the expected frequency for each category, and the sum runs over all categories.

(4) DF Analysis: Determine the degrees of freedom. The degrees of freedom is equal to the number of categories minus 1.

(5) Critical Value & Tables: Look up the critical value of chi-square in a chi-square table. The critical value is the value that corresponds to the chosen significance level and degrees of freedom.

(6) Comparisons! Compare the chi-square statistic to the critical value. If the chi-square statistic is greater than the critical value, then the null hypothesis is rejected in favor of the alternative hypothesis. If the chi-square statistic is less than or equal to the critical value, then the null hypothesis cannot be rejected.

(7) Conclusion: If the null hypothesis is rejected, there is convincing evidence that the population distribution differs from the expected distribution. If the null hypothesis is not rejected, there is not convincing evidence of a difference.

Doing The Test!

Now that we have checked the necessary conditions and written our hypotheses, it is time to actually carry out the test! Our test will consist of two mathematical elements: the test statistic (χ² statistic) and our p-value.

Test Statistic

The first thing we need to calculate in order to finish our test is our χ² value, which is found using the formula χ² = Σ (observed − expected)² / expected. We take each of our observed counts, subtract the corresponding expected count, square that difference, and then divide by the expected count. After we have done that for every category, we sum the results to get our χ² value for the test.

As with the z-scores and t-scores we used as test statistics earlier, a χ² value close to 0 is consistent with the null hypothesis, because it shows that there is not much difference between the observed and expected counts. Unlike z and t, χ² can never be negative, since each term is a squared difference divided by a positive expected count. As the differences grow, χ² grows, and we have more evidence that our expected counts are not accurate, which leads us to reject the null hypothesis in favor of the alternative hypothesis (which states that at least one of the null proportions is incorrect).

Example

For example, let’s return to our happiness survey with this null hypothesis:

  • 10% said they were unhappy (1),
  • 15% said they were somewhat unhappy (2),
  • 28% said they were sometimes happy and sometimes sad (3),
  • 30% said they were happy (4), and
  • 17% said they were always happy (5)

We take a random sample of 1000 people where 120 respond 1, 180 respond 2, 220 respond 3, 480 respond 4 and 0 respond 5.

We would:

  1. Take our observed counts of 120, 180, 220, 480 and 0,
  2. Subtract the expected counts of 100, 150, 280, 300, 170 respectively,
  3. Square our results;
  4. Divide each of the squared results by their respective expected count;
  5. Sum up all five of the outcomes in step 4.

Or… use your handy, dandy TI-84 (or similar) graphing calculator to do this for you (highly recommended)!

[Figure not reproduced: a general example of calculating chi-square values, in the context of political views in a sample of 300 people.]
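For the happiness survey, the five steps can also be sketched in a few lines of Python. This is only an illustration using the counts and null proportions given in the example; the variable names are made up for this sketch and are not part of any calculator or AP procedure.

```python
# Observed counts from the sample of 1000 and the proportions claimed by H0
observed = [120, 180, 220, 480, 0]
null_proportions = [0.10, 0.15, 0.28, 0.30, 0.17]

n = sum(observed)                              # total sample size (1000)
expected = [n * p for p in null_proportions]   # about 100, 150, 280, 300, 170

# Steps 1-4: (observed - expected)^2 / expected for each category
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Step 5: add up the five contributions
chi_square = sum(contributions)

for category, value in enumerate(contributions, start=1):
    print(f"category {category}: contribution = {value:.3f}")
print(f"chi-square = {chi_square:.3f}")
```

The five contributions are 4, 6, about 12.857, 108, and 170, so χ² ≈ 300.857. Most of that total comes from categories 4 and 5.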

Degrees of Freedom

As with our t-score tests and intervals, we have to find our degrees of freedom in order to complete our test. To find our degrees of freedom, we simply take the number of categories and subtract 1. So with our happiness scale example, which has 5 categories, we would have 5 − 1 = 4 degrees of freedom.

P-Value

Recall that the p-value is the probability of obtaining a chi-square statistic that is at least as extreme as the one observed, given that the null hypothesis is true.

Once you have your χ² value, you calculate your p-value by finding the probability of getting a χ² value at least that large by random chance, assuming the null hypothesis is true. As always, if our p is low, we reject H0.

To determine the p-value, use a chi-square table, a graphing calculator, or a computer program to find the area to the right of the observed chi-square statistic under the chi-square distribution with the appropriate degrees of freedom. The p-value depends only on the observed statistic and the degrees of freedom; the critical value is not needed to compute it.

Once you have the chi-square statistic and p-value, you can make the decision in either of two equivalent ways: compare the p-value to the significance level, or compare the chi-square statistic to the critical value for that significance level and degrees of freedom. If the chi-square statistic is greater than the critical value (equivalently, the p-value is less than the significance level), then the null hypothesis is rejected in favor of the alternative hypothesis. If the chi-square statistic is less than or equal to the critical value, then the null hypothesis cannot be rejected.
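As a rough illustration of the two decision rules, the sketch below computes the upper-tail area for the happiness example (χ² ≈ 300.857, df = 4). It relies on a closed-form expression that is valid only when the degrees of freedom are a positive even number, so it is not a general chi-square routine. The function name is hypothetical, and 9.488 is the usual table value for df = 4 with upper-tail area 0.05.

```python
import math

def chi_square_upper_tail_even_df(chi_square, df):
    """P(X >= chi_square) for a chi-square distribution with EVEN df.

    Uses exp(-x/2) * sum over j < df/2 of (x/2)^j / j!, which holds
    only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = chi_square / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

chi_square = 300.857    # statistic from the happiness example
df = 4                  # 5 categories - 1
alpha = 0.05
critical_value = 9.488  # table value for df = 4, upper-tail area 0.05

p_value = chi_square_upper_tail_even_df(chi_square, df)
print(f"p-value = {p_value:.3g}")
print("reject H0 by the p-value rule:", p_value < alpha)
print("reject H0 by the critical-value rule:", chi_square > critical_value)
```

Both rules lead to the same decision here, as they always do for a given significance level; on the exam, a calculator or the provided table does this work for you.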

Example

After running the test for the happiness example, the calculator output (screenshot not reproduced here) gave χ² ≈ 300.86 with 4 degrees of freedom and a p-value of essentially 0.

Conclusion

Just as we concluded hypothesis tests in previous units, we must compare our p-value to a given α value. If it is less than α, we reject H0 and conclude that we have convincing evidence for Ha. Otherwise, we fail to reject H0 and do not have convincing evidence for Ha. Remember two things:

  1. Never “accept” anything!
  2. Include context!

In the example above, we can see that our p-value is essentially 0. Therefore we would say something like this:

Since our p-value (~0) is less than 0.05, we reject the null hypothesis. We have convincing evidence that at least one of the true proportions on the happiness scale differs from the proportion claimed in the null hypothesis.

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

  • chi-square distribution: A probability distribution used in chi-square tests, characterized by degrees of freedom and used to determine p-values for test statistics.
  • chi-square test: A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution.
  • degrees of freedom: A parameter that determines the shape of the chi-square distribution; for a goodness-of-fit test it equals the number of categories minus 1.
  • expected count: The theoretical frequency in each category (or each cell of a table) that would be expected if the null hypothesis were true.
  • null distribution: The probability distribution of the test statistic under the assumption that the null hypothesis is true.
  • null hypothesis: The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
  • observed count: The actual frequency or number of observations in each category (or each cell of a table) from the collected data.
  • p-value: The probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming the null hypothesis is true.
  • probability model: A mathematical framework that describes the probability distribution of outcomes under specified assumptions.
  • reject the null hypothesis: The decision made when the p-value is less than or equal to the significance level, indicating sufficient evidence against the null hypothesis.
  • significance level: The threshold probability (α) used to determine whether to reject the null hypothesis in a significance test.
  • significance test: A statistical procedure used to determine whether there is sufficient evidence to reject the null hypothesis based on sample data.
  • test statistic: A calculated value used to determine whether to reject the null hypothesis in a hypothesis test, computed from sample data.
  • theoretical distribution: A probability distribution based on a mathematical model, such as the normal distribution, used to approximate the distribution of a test statistic.

Frequently Asked Questions

What's the formula for chi-square goodness of fit test?

The chi-square goodness-of-fit test statistic is χ² = Σ (Observed − Expected)² / Expected, summing over all categories. Degrees of freedom = number of categories − 1. Under H0 (the specified probability model), χ² has a chi-square distribution (or you can use a randomization distribution if you didn’t assume a model). The p-value is the probability, assuming H0 is true, of getting a χ² as large or larger than the observed value—compare that p-value to α to decide whether to reject H0. Check that all expected counts are large enough (the AP expected-count condition) before using the chi-square table or calculator. The χ² formula and table appear on the AP formula sheet; for a focused walk-through and practice problems, see the Fiveable Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and the Unit 8 overview (https://library.fiveable.me/ap-statistics/unit-8). For extra practice, use Fiveable’s practice problems (https://library.fiveable.me/practice/ap-statistics).

How do I calculate expected counts for a chi-square test?

For a goodness-of-fit test, the expected count for each category is what you’d expect under the null probability model: expected = (total sample size) × (null probability for that category).

  • If the null says specific proportions p1, p2, …, pk, then Ei = n × pi for each category i.
  • If the null says “all categories equally likely,” pi = 1/k so Ei = n/k.

You use those Ei in the Pearson chi-square: χ² = Σ (Observed − Expected)² / Expected (CED VAR-8.F.1). Check the expected-count condition before using the chi-square table: typically every Ei should be at least 5 (or combine small categories). That affects whether the chi-square approximation is valid (CED: large-sample approximation). For more examples and steps, see the Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and try practice problems at (https://library.fiveable.me/practice/ap-statistics).
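A small sketch of the equally-likely case and the expected-count check. The die-rolling scenario (60 rolls, 6 faces) is a made-up illustration, not an example from this guide.

```python
# Hypothetical scenario: 60 rolls of a die, H0 says all 6 faces are equally likely
n = 60
k = 6
null_proportions = [1 / k] * k

# Expected count for each category: E_i = n * p_i (here n / k = 10 each)
expected = [n * p for p in null_proportions]

# Expected-count condition: every expected count should be at least 5
condition_met = all(e >= 5 for e in expected)

print("expected counts:", expected)
print("expected-count condition met:", condition_met)
```

If the same die were rolled only 24 times, each expected count would be 4 and the condition would fail, even though the null proportions are unchanged.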

When do I use chi-square goodness of fit vs chi-square independence test?

Use chi-square goodness-of-fit when you’re testing whether one categorical variable follows a specific distribution (one variable, k categories). Null: the population proportions equal the specified values. Test statistic: χ² = Σ (obs − exp)²/exp with df = k − 1. Use the chi-square independence test when you have two categorical variables in a two-way table and you want to know whether they’re associated (or independent). There you compute expected counts from the table margins and use χ² with df = (rows − 1)(cols − 1). Always check conditions: counts are counts (not percentages), and the expected count condition for the large-sample chi-square approximation (AP wants you to verify expected counts are large enough). Find the p-value from the chi-square distribution (or randomization if model isn’t assumed) and compare to α to reject/fail to reject. For more AP-aligned practice and worked examples, see the Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and Unit 8 overview (https://library.fiveable.me/ap-statistics/unit-8).
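To make the contrast concrete, here is a sketch of how expected counts and degrees of freedom are obtained for an independence test. The 2 × 3 table of counts is invented purely for illustration.

```python
# Hypothetical 2 x 3 two-way table of observed counts (rows x columns)
table = [
    [20, 30, 50],
    [30, 20, 50],
]

row_totals = [sum(row) for row in table]        # [100, 100]
col_totals = [sum(col) for col in zip(*table)]  # [50, 50, 100]
grand_total = sum(row_totals)                   # 200

# Expected count for each cell: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Same chi-square formula as goodness of fit, summed over every cell
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(table, expected)
    for obs, exp in zip(obs_row, exp_row)
)

df = (len(table) - 1) * (len(table[0]) - 1)     # (rows - 1)(cols - 1)

print("expected:", expected)
print("chi-square:", chi_square, "df:", df)
```

In this made-up table each expected count is 25, 25, or 50, the statistic works out to 4, and df = (2 − 1)(3 − 1) = 2. Unlike goodness of fit, no outside proportions are supplied; the expected counts come entirely from the table's own margins.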

I'm confused about degrees of freedom - is it always n-1 for goodness of fit?

Short answer: it is the number of categories minus 1 (k − 1), not the sample size minus 1, and there is one important caveat. CED-level rule (VAR-8.F.1): for a goodness-of-fit chi-square test, the default degrees of freedom = number of categories − 1. That’s what you’ll use when the null specifies the category probabilities completely (e.g., H0: categorical distribution is 20%, 30%, 50%). The caveat: if you estimate parameters of the null distribution from the data (for example, you fit a Poisson or normal and you estimate its mean from the sample), you lose extra degrees of freedom. Then df = (number of categories − 1) − (number of parameters estimated). Also remember AP allows using a randomization distribution instead of the chi-square model when appropriate (VAR-8.F.2). For more examples and practice, see the Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and the practice question bank (https://library.fiveable.me/practice/ap-statistics).
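The rule and its caveat can be written as one small helper. The function name and the Poisson scenario with 6 categories are hypothetical, chosen only to show the arithmetic.

```python
def goodness_of_fit_df(num_categories, num_estimated_parameters=0):
    """df = (number of categories - 1) - (parameters estimated from the data)."""
    df = num_categories - 1 - num_estimated_parameters
    if df < 1:
        raise ValueError("not enough categories for a chi-square test")
    return df

# Null proportions fully specified (the usual AP case): 5 categories
print(goodness_of_fit_df(5))     # 4

# Hypothetical: 6 count categories, Poisson mean estimated from the sample
print(goodness_of_fit_df(6, 1))  # 4
```

Note that the sample size never appears in the calculation: a survey of 100 people and a survey of 10,000 people with the same 5 categories both give df = 4.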

Step by step how do you do a chi-square goodness of fit test?

Step-by-step (short) for a chi-square goodness-of-fit test:

  1. State hypotheses. H0: the population follows the specified distribution (give probabilities); Ha: it does not.
  2. Check conditions. Data are counts from a random sample (or randomization) and each expected count = n × model probability is at least 5 (large-sample approx).
  3. Compute expected counts for each category: E_i = n·p_i.
  4. Calculate the test statistic: χ² = Σ (Observed_i − Expected_i)² / Expected_i (CED VAR-8.F.1). Degrees of freedom = k − 1.
  5. Find the p-value from the chi-square distribution (table or software) with df = k−1 (CED VAR-8.G.1). If you used a randomization distribution instead, use that null distribution.
  6. Conclusion: compare p to α. If p ≤ α, reject H0 (evidence the distribution differs); if p > α, fail to reject H0 (not enough evidence). Interpret in context (CED DAT-3.I, DAT-3.J).
  7. (Optional) Check standardized residuals (Obs−Exp)/√Exp to see which categories contribute most.

For practice and AP-style guidance, see the Fiveable Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and Unit 8 overview (https://library.fiveable.me/ap-statistics/unit-8).
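The optional residual check in step 7 can be sketched for the happiness survey from earlier in this guide. The counts come from that example; the variable names are just for this illustration.

```python
import math

observed = [120, 180, 220, 480, 0]
expected = [100, 150, 280, 300, 170]

# Residual for each category: (Obs - Exp) / sqrt(Exp)
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# Squaring the residuals and adding them up recovers the chi-square statistic
chi_square = sum(r ** 2 for r in residuals)

# Category (1-based) whose residual is largest in absolute value
largest = max(range(len(residuals)), key=lambda i: abs(residuals[i])) + 1

for category, r in enumerate(residuals, start=1):
    print(f"category {category}: residual = {r:+.3f}")
print(f"chi-square = {chi_square:.3f}, largest contributor: category {largest}")
```

The sign shows the direction of the miss: category 4 ("happy") has a large positive residual because far more people chose it than expected, while category 5 ("always happy") has the largest negative residual because nobody chose it.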

What does it mean when my chi-square statistic is really big?

A really big chi-square statistic means the observed counts deviate a lot from the expected counts under your null model. By the formula χ² = Σ(Observed − Expected)² / Expected, big squared differences add up, so large χ² values indicate larger discrepancies across categories. Under the null (the specified distribution is true) the null distribution of χ² (with df = number of categories − 1) makes large values unlikely, so a large χ² usually gives a small p-value—meaning you’d reject H0 at typical α levels and conclude the data are inconsistent with the claimed distribution. Remember to check conditions first (expected count condition) and report df and how you found the p-value (table or software). For practice and exam-style guidance on carrying out the test and interpreting p-values, see the Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and the Unit 8 overview (https://library.fiveable.me/ap-statistics/unit-8). For more practice problems, use (https://library.fiveable.me/practice/ap-statistics).

How do I find the p-value using the chi-square table?

First compute χ² = Σ(Observed − Expected)²/Expected and df = number of categories − 1 (CED VAR-8.F). The chi-square table (AP formula sheet) gives χ² critical values for various tail probabilities p with a specified df—remember the test is an upper-tail test, so p = P(Χ² ≥ observed). To get the p-value from the table: find the row for your df, then locate two adjacent critical values that bracket your observed χ². If your χ² lies between the table entries for p = 0.10 and p = 0.05, your p-value is between 0.10 and 0.05. If χ² is larger than the largest table value listed, p is smaller than the smallest tail probability shown (e.g., p < 0.005). Compare that p-value range to α to reject/fail to reject (CED DAT-3.J). For step-by-step examples see the Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and more practice (https://library.fiveable.me/practice/ap-statistics).
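The bracketing procedure can be mimicked in code. The sketch below hard-codes part of the df = 4 row of a standard chi-square table (rounded to two decimals); the helper name and the test value 10.2 are hypothetical.

```python
# Part of a chi-square table row for df = 4: (upper-tail probability, critical value)
DF4_ROW = [(0.10, 7.78), (0.05, 9.49), (0.025, 11.14), (0.01, 13.28), (0.005, 14.86)]

def bracket_p_value(chi_square, table_row):
    """Return (lower, upper) bounds on the p-value from one table row.

    None means the row gives no bound on that side. The row must be
    ordered from smallest to largest critical value.
    """
    lower, upper = None, None
    for tail_prob, critical in table_row:
        if chi_square >= critical:
            upper = tail_prob   # statistic is at or beyond this entry: p <= tail_prob
        else:
            lower = tail_prob   # statistic is below this entry: p > tail_prob
            break
    return lower, upper

print(bracket_p_value(10.2, DF4_ROW))     # (0.025, 0.05)
print(bracket_p_value(300.857, DF4_ROW))  # (None, 0.005)
print(bracket_p_value(5.0, DF4_ROW))      # (0.1, None)
```

A statistic of 10.2 falls between the 0.05 and 0.025 columns, so the p-value is between 0.025 and 0.05. The happiness example's 300.857 is past the last entry, so all the table can say is that p is below 0.005.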

I don't understand what "goodness of fit" actually means - can someone explain?

“Goodness of fit” asks: do the counts you observed in different categories match what you’d expect under a specified model? In an AP chi-square goodness-of-fit test your H0 is “the population follows the given distribution” (a specific set of category probabilities); Ha is “it doesn’t.” You compute χ² = Σ(Observed − Expected)²/Expected (VAR-8.F.1), with df = #categories − 1, and find a p-value from the chi-square distribution (or a randomization distribution) (VAR-8.F.2, VAR-8.G.1). The p-value is the probability, assuming H0 is true, of getting a χ² as large or larger than the observed (DAT-3.I.1). Compare p to α to decide to reject or fail to reject H0 (DAT-3.J). Check expected-count conditions (all expected counts reasonably large) and report context. For a quick review of steps and examples, see the Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG)—and practice more problems on the Unit 8 page (https://library.fiveable.me/ap-statistics/unit-8) or the AP practice bank (https://library.fiveable.me/practice/ap-statistics).

Do I need to check conditions before doing a chi-square test and what are they?

Yes, you should check conditions before doing a chi-square goodness-of-fit test. AP expects you to verify that the test’s assumptions are reasonable so the chi-square distribution (or a randomization distribution) is valid. Key conditions:

  • Randomness/independence: data come from a random sample or randomized experiment and observations are independent (or sample ≤10% of the population if sampling without replacement).
  • Expected count condition: each expected count (under H0) should be at least 5 (AP often phrases this as “expected count condition”). If many expected counts are <5, use a simulation/randomization approach instead.
  • Appropriate model: H0 specifies the population distribution (so you can compute expected counts); degrees of freedom = number of categories − 1 for the test statistic χ² = Σ (observed − expected)²/expected.

If conditions fail, use a randomization distribution or regroup categories. For more AP-aligned detail and examples, see the Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and the Unit 8 overview (https://library.fiveable.me/ap-statistics/unit-8). For extra practice, try problems at (https://library.fiveable.me/practice/ap-statistics).

Why do we square the differences in the chi-square formula?

We square the differences (Observed − Expected) in χ² = Σ(Obs − Exp)²/Exp for two reasons: math and meaning. Math first—squaring makes all differences positive so large deviations don’t cancel out (positive vs. negative). It also emphasizes bigger discrepancies (a difference of 6 counts contributes 36, much larger than two differences of 3). Meaning-wise—dividing by the expected count standardizes each squared difference so categories with bigger expected counts don’t automatically dominate the total. The result is a single measure of how far the observed counts are from the null-model counts across all categories (the Pearson chi-square statistic in the CED VAR-8.F.1). That statistic’s null distribution (the χ² distribution or a randomization distribution) lets you get a p-value (CED VAR-8.G and DAT-3.I) to decide whether the pattern is unlikely under H0. For more worked examples and practice problems on carrying out the goodness-of-fit test, see the Fiveable study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and the Unit 8 overview (https://library.fiveable.me/ap-statistics/unit-8).
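A quick numerical check of the cancellation point, using the happiness survey counts from earlier in this guide:

```python
observed = [120, 180, 220, 480, 0]
expected = [100, 150, 280, 300, 170]

# Raw differences cancel: both lists total 1000, so the differences sum to 0
differences = [o - e for o, e in zip(observed, expected)]
print(differences, sum(differences))  # [20, 30, -60, 180, -170] 0

# Squaring removes the signs, so nothing cancels
squared = [d ** 2 for d in differences]
print(sum(squared))                   # 66200

# Dividing each squared difference by its expected count gives chi-square
chi_square = sum(s / e for s, e in zip(squared, expected))
print(round(chi_square, 3))           # 300.857
```

Whenever the observed and expected counts share the same total, the unsquared differences always add to exactly 0, no matter how badly the model fits, which is why they cannot serve as a test statistic on their own.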

How do I interpret the p-value in a chi-square goodness of fit test?

The p-value in a chi-square goodness-of-fit test is the probability, assuming the null hypothesis and the probability model are true, of getting a chi-square statistic as large (or larger) than the one you observed (CED: DAT-3.I.1). Practically: a small p-value (e.g., 0.03) means the observed counts are unlikely under H0, so you reject H0 at α = 0.05 and conclude the distribution probably differs from the specified model. A large p-value (e.g., 0.40) means the data are consistent with H0, so you fail to reject H0—you don’t “accept” H0, you just don’t have convincing evidence against it (CED: DAT-3.J.1). Remember to find p using a chi-square table or software (VAR-8.G.1), check df = categories − 1, and verify expected-count conditions before trusting the p-value. For a quick review of steps and practice, see Fiveable’s Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and lots of practice problems (https://library.fiveable.me/practice/ap-statistics).

When do I reject the null hypothesis in a chi-square test?

You reject the null in a chi-square goodness-of-fit when the data are too unlikely under H0—that is, when the p-value is less than your significance level α (usually 0.05). Calculate χ² = Σ(Observed − Expected)² / Expected with df = (number of categories − 1). Then either (a) find the p-value from the χ² distribution and reject H0 if p < α, or (b) compare the test statistic to the chi-square critical value: reject if χ² > χ²α,df. Also check conditions first: expected counts should be large enough for the chi-square approximation (each expected count typically ≥ 5) or use a randomization distribution otherwise. The AP exam provides the χ² table and calculator use is allowed—be ready to report χ², df, p-value, and a context sentence. For a quick topic review, see the Fiveable study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and more practice at (https://library.fiveable.me/practice/ap-statistics).

What's the difference between observed and expected frequencies?

Observed counts are the actual numbers you get from your sample—what you actually saw in each category. Expected counts are the counts you would expect in each category if the null hypothesis (the specified distribution) were true. For goodness-of-fit, expected = (total sample size) × (proposed category probability). You use both in the chi-square statistic: χ² = Σ (Observed − Expected)² / Expected. Large differences between observed and expected increase χ² and can lead to a small p-value, meaning the data are unlikely under the null. Don’t forget the expected-count condition: each expected count should be at least about 5 for the chi-square approximation to be reliable (or use a randomization approach). Degrees of freedom = number of categories − 1. For a clear walkthrough and examples, check Fiveable’s Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG).

Can I use my calculator to find chi-square p-values or do I have to use the table?

Yes—you can (and should) use your calculator on the AP exam to get chi-square p-values when it has the right functions. The CED allows p-values from a table or computer-generated output (VAR-8.G.1), and AP instructs you to bring a graphing calculator with statistical capabilities. Most calculators (TI-84, TI-89, Casio, etc.) have a chi-square cdf or a built-in chi-square test that returns the p-value. For example, on a TI-84 you can use chi2cdf(χ², 1E99, df) to get the upper-tail p (or 1-chi2cdf(0, χ², df)). If you don’t have that, the provided chi-square table (in the formula/tables) can give a range for the p-value—that’s acceptable but less precise. Practice using your calculator before the exam so you can compute p quickly and correctly. For a refresher on carrying out the test and examples, see the Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and more practice at (https://library.fiveable.me/practice/ap-statistics).

I keep getting negative expected counts - what am I doing wrong?

You can’t get a negative expected count; if you do, you’ve made an arithmetic or setup mistake. Expected counts are counts predicted by the null model and must be ≥ 0. Check these common errors:

  • Wrong formula: For goodness-of-fit use Expected = n × p_i (or n/k for equal categories). For two-way tables use Expected = (row total × column total) / grand total.
  • Entering negative probabilities or proportions: p_i must be between 0 and 1 and sum to 1. If one p_i is negative (or you used observed − expected by mistake) you’ll get negative values.
  • Calculator/data-entry slip: confirm you input totals, probabilities, or margins correctly (no stray minus signs).
  • Mixing up standardized residuals or (Observed − Expected) with Expected itself: standardized residuals can be negative, expected counts cannot.

Also remember AP CED requirements: use χ² = Σ(Observed − Expected)²/Expected and check the expected-count condition for the chi-square approximation. If you want step-by-step examples, see the Topic 8.3 study guide (https://library.fiveable.me/ap-statistics/unit-8/carrying-out-chi-square-goodness-fit-test/study-guide/XmvvVf9spR7e6xT6TPEG) and practice problems (https://library.fiveable.me/practice/ap-statistics).
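One way to catch these setup mistakes early is to validate the inputs before computing anything. The helper below is a hypothetical sketch (its name and error messages are invented here), not part of any calculator or library.

```python
def validated_expected_counts(n, null_proportions):
    """Return n * p_i for each category after basic sanity checks."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    for i, p in enumerate(null_proportions, start=1):
        if not 0 <= p <= 1:
            raise ValueError(f"proportion for category {i} is {p}; must be between 0 and 1")
    if abs(sum(null_proportions) - 1) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    return [n * p for p in null_proportions]

# Valid input: the happiness survey proportions
print(validated_expected_counts(1000, [0.10, 0.15, 0.28, 0.30, 0.17]))

# Invalid input: a stray minus sign is reported instead of silently
# producing a negative expected count
try:
    validated_expected_counts(100, [0.5, 0.7, -0.2])
except ValueError as error:
    print("input error:", error)
```

With checks like these, a negative proportion or a set of proportions that does not sum to 1 is flagged at the source rather than showing up later as an impossible expected count.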