A chi-square goodness of fit test checks whether the observed counts for one categorical variable with two or more categories match a claimed distribution of proportions. To set it up, you write hypotheses about the category proportions, find expected counts using (sample size)(null proportion), and confirm the random, 10%, and large counts conditions.
Why This Matters for the AP Statistics Exam
Goodness of fit is your first chi-square procedure, and it shows up when a question gives you one categorical variable and a claimed set of proportions. On the AP Statistics exam you may need to recognize that goodness of fit (not a one-proportion z-test) is the right tool, state hypotheses in context, calculate expected counts, and verify conditions. Setting up the test correctly is what makes the later steps (test statistic, p-value, conclusion) work, so clean setup is important for clear exam work on both multiple-choice and free-response questions.

Key Takeaways
- Use goodness of fit when you have one categorical variable with two or more categories and a claimed distribution of proportions.
- Expected count for each category = (sample size)(null proportion), so each category gets its own expected count.
- The null hypothesis lists a proportion for every category; the alternative says at least one proportion differs from what was claimed.
- Conditions to check: random sample (or randomized experiment), 10% condition when sampling without replacement, and all expected counts greater than 5.
- The chi-square statistic measures how far observed counts fall from expected counts relative to the expected counts, and its distribution is right-skewed.
- Degrees of freedom for goodness of fit = number of categories minus 1, and the skew gets less pronounced as degrees of freedom increase.
Expected Counts
The expected count is the number of observations you would expect in a category if the null hypothesis were true. In general, an expected count is a sample size times a probability.
For a goodness of fit test, that probability is the null proportion for the category:
$$\text{expected count} = (\text{sample size})(\text{null proportion})$$
So each category has its own expected count. If your sample has 1000 people and the null proportion for a category is 0.20, the expected count for that category is 1000(0.20) = 200.
Expected counts give you a baseline. The whole test compares the counts you actually observed to these expected counts to decide whether any difference is bigger than what random chance would reasonably produce.
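As a minimal sketch of the calculation above, the short Python snippet below multiplies a sample size by each null proportion. The function name `expected_counts` and the proportions 0.30 and 0.50 are illustrative assumptions; only the sample size of 1000 and the 0.20 category come from the text.

```python
def expected_counts(sample_size, null_proportions):
    """Return one expected count per category: n times the null proportion."""
    return [sample_size * p for p in null_proportions]

# Hypothetical null distribution: only the 0.20 category appears in the text,
# the other two proportions are made up so that the list sums to 1.
null_proportions = [0.20, 0.30, 0.50]

counts = expected_counts(1000, null_proportions)
print(counts)
```

The expected counts always add up to the sample size, which is a quick way to catch an arithmetic slip.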
The Chi-Square Statistic
The chi-square statistic measures the distance between observed and expected counts relative to the expected counts. You find it by summing the squared differences between observed and expected counts, divided by the expected count, across every category:
$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
A larger chi-square value means the observed counts are farther from the expected counts, which makes it less likely the gap is just random. You will actually compute this value and find a p-value in the next topic; here the goal is the setup.
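Although the full calculation belongs to the next topic, a small sketch may help show what "distance relative to the expected counts" means. The function name `chi_square_statistic` and all counts below are hypothetical, chosen only for illustration and not taken from any real sample.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories (illustration only).
observed = [18, 22, 20]
expected = [20, 20, 20]

stat = chi_square_statistic(observed, expected)
print(stat)
```

When the observed counts equal the expected counts in every category, each term is zero, so the statistic is zero; it grows as the observed counts drift away from the expected counts.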
Chi-Square Distributions
The chi-square distribution is a continuous distribution used to model the chi-square statistic. It only takes positive values and is skewed to the right.
The shape depends on the degrees of freedom, which count the independent pieces of information used in the statistic. For a goodness of fit test:
$$\text{degrees of freedom} = \text{number of categories} - 1$$
As the degrees of freedom increase, the right skew becomes less pronounced and the curve looks more symmetric.
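One way to see the shrinking skew numerically is through the skewness of a chi-square distribution, which equals the square root of 8 divided by the degrees of freedom. That formula is a standard result that goes beyond what the AP course requires, and the function name `chi_square_skewness` is an assumption made for this sketch.

```python
import math

def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8 / df)

# Skewness stays positive (right-skewed) but shrinks as df grows.
for df in [1, 2, 4, 10, 50]:
    print(df, round(chi_square_skewness(df), 3))
```

The values stay positive for every choice of degrees of freedom, so the distribution is always right-skewed, just less so as the degrees of freedom increase.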
When to Use Goodness of Fit
A chi-square goodness of fit test checks whether the observed distribution of one categorical variable matches a claimed set of proportions. Use it when you have one categorical variable with two or more categories.
This is what makes it different from the one-proportion procedures from earlier units, which only handled two outcomes (like yes or no). If people rate happiness on a scale of 1 to 5, that is five categories, so a one-proportion z-test will not work. Goodness of fit handles all five categories at once.
Parameters
You are testing several population proportions at the same time. Each category has a true population proportion you are checking against the claim.
For example, suppose you survey people on a happiness scale from 1 to 5 (5 being happiest), and a claim states:
- 10% rated 1 (unhappy)
- 15% rated 2 (somewhat unhappy)
- 28% rated 3 (sometimes happy, sometimes sad)
- 30% rated 4 (happy)
- 17% rated 5 (always happy)
The parameters are the true proportions of 1s, 2s, 3s, 4s, and 5s in the population.
Hypotheses
Null hypothesis: the null lists a specific proportion for every category. For the happiness example:
$$H_0: p_1 = 0.10,\ p_2 = 0.15,\ p_3 = 0.28,\ p_4 = 0.30,\ p_5 = 0.17$$
Define what each proportion means in context, for example $p_1$ = true proportion of people who rate their happiness a 1, and so on. The subscripts and definitions give the hypotheses context, which matters for clear exam work.
Alternative hypothesis: state that at least one of the claimed proportions is wrong.
$$H_a: \text{at least one of the proportions differs from its claimed value}$$
You do not list a separate proportion for each category in the alternative. Because the proportions add to 100%, if one is off, at least one other must be off too, so "at least one differs" covers it.
Conditions
Before you can make an inference, check these conditions:
- Random: the data come from a random sample or a randomized experiment.
- 10% condition: when sampling without replacement, the sample is no more than 10% of the population.
- Large counts: all expected counts are greater than 5.
For the happiness example, multiply the sample size by 0.10, 0.15, 0.28, 0.30, and 0.17 and confirm each result is above 5. Note that the large counts condition uses expected counts, not observed counts.
If the data come from an experiment with random assignment of treatments, that random assignment handles the random condition.
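The large counts check for the happiness example can be sketched as follows. The text does not give a sample size, so the values 200 and 40 below are hypothetical, as is the helper name `large_counts_met`; the proportions are the claimed ones from the example.

```python
# Claimed proportions for happiness ratings 1 through 5 (from the example).
claimed = {1: 0.10, 2: 0.15, 3: 0.28, 4: 0.30, 5: 0.17}

# The null proportions must account for every category, so they sum to 1.
total = sum(claimed.values())

def large_counts_met(sample_size, proportions):
    """True when every expected count (n times null proportion) exceeds 5."""
    return all(sample_size * p > 5 for p in proportions.values())

# Hypothetical sample sizes: the source does not state one.
expected_200 = {rating: 200 * p for rating, p in claimed.items()}
ok_200 = large_counts_met(200, claimed)  # smallest expected count is about 20
ok_40 = large_counts_met(40, claimed)    # rating 1 expects about 4, too small

print(expected_200, ok_200, ok_40)
```

The check runs on expected counts only, so it can be done before looking at any observed data.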
Worked Example
A survey claims that when choosing a favorite between Harry Potter, Lord of the Rings, and Star Wars, the three series are equally popular, with 1/3 of people picking each.
To test this claim, a random sample of 2500 US adults is surveyed about their favorite series. Set up the test by writing the hypotheses and checking the conditions.
Hypotheses and Parameters
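$$H_0: p_{HP} = p_{SW} = p_{LOTR} = \tfrac{1}{3}$$
$$H_a: \text{at least one of these proportions is not } \tfrac{1}{3}$$
Where:
- $p_{HP}$ = true proportion of US adults who prefer Harry Potter
- $p_{SW}$ = true proportion of US adults who prefer Star Wars
- $p_{LOTR}$ = true proportion of US adults who prefer Lord of the Rings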
Conditions
- Random: "A random sample of 2500 US adults" (quote the problem).
- 10% condition: it is reasonable to assume there are more than 25,000 adults in the US, so the sample of 2500 is less than 10% of the population.
- Large counts: $2500 \cdot \tfrac{1}{3} \approx 833.33 > 5$ (same for all three categories).
With the setup complete, the next step is calculating the test statistic and the p-value using the actual observed counts from the sample.
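The numbers in the worked example's setup can be reproduced with a short sketch. The variable names are assumptions made for illustration; the sample size, the three series, and the null proportion of 1/3 come from the problem. `Fraction` is used so that 1/3 is represented exactly.

```python
from fractions import Fraction

sample_size = 2500
null_proportions = {
    "Harry Potter": Fraction(1, 3),
    "Star Wars": Fraction(1, 3),
    "Lord of the Rings": Fraction(1, 3),
}

# Expected count for each series: sample size times null proportion.
expected = {series: sample_size * p for series, p in null_proportions.items()}

# Large counts: every expected count must be greater than 5.
large_counts_ok = all(count > 5 for count in expected.values())

# 10% condition: the population must be at least 10 times the sample size.
min_population_needed = 10 * sample_size

# Degrees of freedom: number of categories minus 1.
degrees_of_freedom = len(null_proportions) - 1

for series, count in expected.items():
    print(f"{series}: {float(count):.2f}")
print(large_counts_ok, min_population_needed, degrees_of_freedom)
```

Each series has an expected count of 2500/3, the population needs at least 25,000 adults for the 10% condition, and three categories give 2 degrees of freedom.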
How to Use This on the AP Statistics Exam
Free Response
- Name the test as a chi-square goodness of fit test before doing anything else.
- Write the null hypothesis with a proportion for each category, and define each proportion in context using the wording from the problem.
- Write the alternative as "at least one proportion differs," not a list of separate proportions.
- Check all three conditions, and quote the problem for the random condition.
- For large counts, show the expected count calculations and state that all expected counts are greater than 5.
MCQ
- Recognize goodness of fit when a question has one categorical variable with multiple categories and a claimed distribution.
- Know that expected count = (sample size)(null proportion).
- Remember degrees of freedom = number of categories minus 1.
- Know the chi-square distribution is right-skewed and only takes positive values.
Common Trap
The large counts condition uses expected counts, not observed counts. Plan to compute and check the expected counts even if the observed counts look big enough.
Common Misconceptions
- Checking observed counts instead of expected counts. The large counts condition requires all expected counts to be greater than 5. Observed counts can be small even when expected counts are fine.
- Listing separate proportions in the alternative. The alternative is just "at least one proportion differs." Writing a full list of new proportions is incorrect.
- Forgetting context in the hypotheses. You need to define what each proportion stands for using the problem's wording, not just write symbols.
- Using a one-proportion z-test for many categories. Once a variable has more than two categories, the one-proportion procedures no longer apply; goodness of fit is the correct tool.
- Mixing up degrees of freedom. For goodness of fit, degrees of freedom is number of categories minus 1, not the sample size minus 1.
- Thinking a small chi-square proves the claim is true. A small statistic and large p-value only mean there is not enough evidence against the claimed distribution, not that the claim is proven.
Vocabulary
The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

Term | Definition |
|---|---|
alternative hypothesis | The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for. |
categorical data | Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications. |
chi-square distributions | Probability distributions used to test the goodness of fit between observed and expected categorical data, characterized by positive values and right skewness. |
chi-square statistic | A test statistic that measures the distance between observed and expected counts relative to the expected counts. |
chi-square test | A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution. |
degrees of freedom | A parameter that determines the shape of a chi-square distribution; for a goodness of fit test it equals the number of categories minus 1, and the right skew becomes less pronounced as it increases. |
distribution of proportions | The way in which proportions are spread across the categories of a categorical variable. |
expected count | The number of observations you would expect in a category if the null hypothesis were true; for a goodness of fit test, the sample size times the null proportion. |
goodness of fit | A statistical test that determines how well observed data match the expected distribution specified by a hypothesis. |
independence | The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments. |
null hypothesis | The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference. |
null proportion | The hypothesized proportion for each category under the null hypothesis in a chi-square goodness of fit test. |
observed count | The actual number of observations in each category (or each cell of a two-way table) in the collected data. |
proportion | A part or share of a whole, expressed as a fraction, decimal, or percentage. |
random sample | A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference. |
randomized experiment | A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships. |
sample size | The number of observations or data points collected in a sample, denoted as n. |
sampling without replacement | A sampling method in which an item selected from a population cannot be selected again in subsequent draws. |
statistical inference | The process of drawing conclusions about a population based on data collected from a sample. |
Frequently Asked Questions
What are the chi-square goodness of fit conditions?
For a chi-square goodness of fit test, check that the data come from a random sample or randomized experiment, the 10% condition is met when sampling without replacement, and all expected counts are greater than 5.
How do you calculate expected counts for a goodness of fit test?
Multiply the sample size by the null proportion for each category. The expected count for a category is n times that category's claimed proportion.
What is the null hypothesis for a chi-square goodness of fit test?
The null hypothesis states the claimed population proportion for every category. Define each proportion in context so the hypothesis matches the problem.
What is the alternative hypothesis for a goodness of fit test?
The alternative hypothesis says that at least one of the population proportions differs from the claimed value. You do not list new values for every category.
What are the degrees of freedom for a chi-square goodness of fit test?
Degrees of freedom equal the number of categories minus 1. For example, a goodness of fit test with five categories has 4 degrees of freedom.
How is goodness of fit different from independence or homogeneity?
Goodness of fit uses one categorical variable and compares observed counts to a claimed distribution. Independence and homogeneity use two categorical variables and are usually organized in a two-way table.