Fiveable

AP Statistics Unit 8 Review

8.2 Setting Up a Chi Square Goodness of Fit Test

Written by the Fiveable Content Team • Last updated June 2026
Verified for the 2027 exam

A chi-square goodness of fit test checks whether the observed counts for one categorical variable with two or more categories match a claimed distribution of proportions. To set it up, you write hypotheses about the category proportions, find expected counts using (sample size)(null proportion), and confirm the random, 10%, and large counts conditions.

Why This Matters for the AP Statistics Exam

Goodness of fit is your first chi-square procedure, and it shows up when a question gives you one categorical variable and a claimed set of proportions. On the AP Statistics exam you may need to recognize that goodness of fit (not a one-proportion z-test) is the right tool, state hypotheses in context, calculate expected counts, and verify conditions. The later steps (test statistic, p-value, conclusion) all depend on a correct setup, so a clean setup matters on both multiple-choice and free-response questions.

Key Takeaways

  • Use goodness of fit when you have one categorical variable with two or more categories and a claimed distribution of proportions.
  • Expected count for each category = (sample size)(null proportion), so each category gets its own expected count.
  • The null hypothesis lists a proportion for every category; the alternative says at least one proportion differs from what was claimed.
  • Conditions to check: random sample (or randomized experiment), 10% condition when sampling without replacement, and all expected counts greater than 5.
  • The chi-square statistic measures how far observed counts fall from expected counts relative to the expected counts, and its distribution is right-skewed.
  • Degrees of freedom for goodness of fit = number of categories minus 1, and the skew gets less pronounced as degrees of freedom increase.

Expected Counts

The expected count is the number of observations you would expect in a category if the null hypothesis were true. In general, an expected count is a sample size times a probability.

For a goodness of fit test, that probability is the null proportion for the category:

$$\text{Expected count} = (\text{sample size})(\text{null proportion})$$

So each category has its own expected count. If your sample has 1000 people and the null proportion for a category is 0.20, the expected count for that category is 1000(0.20) = 200.

Expected counts give you a baseline. The whole test compares the counts you actually observed to these expected counts to decide whether any difference is bigger than what random chance would reasonably produce.
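The calculation above can be sketched in Python. This is only an illustration: the function name `expected_counts` and the category labels "A", "B", and "C" are hypothetical, and only the 0.20 proportion with a sample of 1000 comes from the text.

```python
def expected_counts(sample_size, null_proportions):
    """Return the expected count for each category: (sample size)(null proportion)."""
    total = sum(null_proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    return {category: sample_size * p for category, p in null_proportions.items()}


# Hypothetical null distribution; only the 0.20 category mirrors the text's example.
null_props = {"A": 0.20, "B": 0.30, "C": 0.50}
counts = expected_counts(1000, null_props)

for category, count in counts.items():
    print(category, round(count, 1))
```

Because the null proportions add to 1, the expected counts always add back up to the sample size, which is a quick way to check your arithmetic.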

The Chi-Square Statistic

The chi-square statistic measures the distance between observed and expected counts relative to the expected counts. You find it by summing the squared differences between observed and expected counts, divided by the expected count, across every category:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

A larger chi-square value means the observed counts are farther from the expected counts, and a gap that large would be less likely to occur by random chance alone if the null hypothesis were true. You will actually compute this value and find a p-value in the next topic; here the goal is the setup.
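To make the formula concrete, here is a minimal Python sketch of the sum. The function name `chi_square_statistic` and the observed counts are made up for illustration; they are not data from this guide.

```python
def chi_square_statistic(observed, expected):
    """Sum (observed - expected)^2 / expected across all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories with a sample of 1000.
observed = [210, 290, 500]
expected = [200, 300, 500]

stat = chi_square_statistic(observed, expected)
print(round(stat, 3))
```

Each category contributes its own term, so a category whose observed count matches its expected count exactly adds nothing to the total.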

Chi-Square Distributions

The chi-square distribution is a continuous distribution used to model the chi-square statistic. It only takes positive values and is skewed to the right.

The shape depends on the degrees of freedom, which count the independent pieces of information used in the statistic. For a goodness of fit test:

$$\text{degrees of freedom} = (\text{number of categories}) - 1$$

As the degrees of freedom increase, the right skew becomes less pronounced and the curve looks more symmetric.
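The shrinking skew can be checked numerically. The sketch below assumes the standard result that a chi-square distribution with k degrees of freedom has skewness sqrt(8/k); that formula is not part of this guide and is used here only as an illustration. The function names are hypothetical.

```python
import math


def goodness_of_fit_df(num_categories):
    """Degrees of freedom for goodness of fit: number of categories minus 1."""
    if num_categories < 2:
        raise ValueError("need at least two categories")
    return num_categories - 1


def chi_square_skewness(df):
    """Skewness of a chi-square distribution (standard result, assumed here)."""
    return math.sqrt(8 / df)


# More categories -> more degrees of freedom -> smaller skewness.
for categories in (2, 5, 10, 50):
    df = goodness_of_fit_df(categories)
    print(categories, df, round(chi_square_skewness(df), 2))
```

The skewness stays positive for every value of the degrees of freedom, so the curve is always right-skewed, just less so as the degrees of freedom grow.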

When to Use Goodness of Fit

A chi-square goodness of fit test checks whether the observed distribution of one categorical variable matches a claimed set of proportions. Use it when you have one categorical variable with two or more categories.

This is what makes it different from the one-proportion procedures from earlier units, which only handled two outcomes (like yes or no). If people rate happiness on a scale of 1 to 5, that is five categories, so a one-proportion z-test will not work. Goodness of fit handles all five categories at once.

Parameters

You are testing several population proportions at the same time. Each category has a true population proportion you are checking against the claim.

For example, suppose you survey people on a happiness scale from 1 to 5 (5 being happiest), and a claim states:

  • 10% rated 1 (unhappy)
  • 15% rated 2 (somewhat unhappy)
  • 28% rated 3 (sometimes happy, sometimes sad)
  • 30% rated 4 (happy)
  • 17% rated 5 (always happy)

The parameters are the true proportions of 1s, 2s, 3s, 4s, and 5s in the population.

Hypotheses

Null hypothesis: the null lists a specific proportion for every category. For the happiness example:

$$H_0:\ p_1 = 0.10,\ p_2 = 0.15,\ p_3 = 0.28,\ p_4 = 0.30,\ p_5 = 0.17$$

Define what each proportion means in context, for example $p_1$ = true proportion of people who rate their happiness a 1, and so on. The subscripts and definitions give the hypotheses context, which matters for clear exam work.

Alternative hypothesis: state that at least one of the claimed proportions is wrong.

$$H_a:\ \text{At least one of the happiness proportions differs from the claimed value.}$$

You do not list a separate proportion for each category in the alternative. Because the proportions add to 100%, if one is off, at least one other must be off too, so "at least one differs" covers it.
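One way to see this, using the happiness proportions (the value 0.12 is a made-up illustration, not part of the claim):

$$p_1 + p_2 + p_3 + p_4 + p_5 = 1$$

$$\text{If } p_1 = 0.12, \text{ then } p_2 + p_3 + p_4 + p_5 = 0.88 \neq 0.90 = 0.15 + 0.28 + 0.30 + 0.17$$

The remaining four proportions can no longer all equal their claimed values, so at least one of them must differ as well.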

Conditions

Before you can make an inference, check these conditions:

  • Random: the data come from a random sample or a randomized experiment.
  • 10% condition: when sampling without replacement, the sample is no more than 10% of the population.
  • Large counts: all expected counts are greater than 5.

For the happiness example, multiply the sample size by 0.10, 0.15, 0.28, 0.30, and 0.17 and confirm each result is above 5. Note that the large counts condition uses expected counts, not observed counts.

If the data come from an experiment with random assignment of treatments, that random assignment handles the random condition.
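The two numeric conditions can be sketched in Python for the happiness example. The sample sizes (200 and 40), the population size (5000), and the function names are hypothetical; the random condition is not checked here because it depends on how the data were collected, not on a calculation.

```python
# Null proportions from the happiness claim (ratings 1 through 5).
HAPPINESS_NULL = {1: 0.10, 2: 0.15, 3: 0.28, 4: 0.30, 5: 0.17}


def large_counts_met(sample_size, null_proportions, threshold=5):
    """True when every expected count, n * p0, is greater than the threshold."""
    return all(sample_size * p > threshold for p in null_proportions.values())


def ten_percent_met(sample_size, population_size):
    """True when the sample is no more than 10% of the population."""
    return sample_size * 10 <= population_size


# Hypothetical sample of 200: smallest expected count is 200(0.10) = 20.
print(large_counts_met(200, HAPPINESS_NULL))

# Hypothetical sample of 40: expected count for rating 1 is 40(0.10) = 4.
print(large_counts_met(40, HAPPINESS_NULL))

# Hypothetical population of 5000 people.
print(ten_percent_met(200, 5000))
```

Checking the smallest null proportion first is usually enough: if its expected count is greater than 5, every other category's expected count is too.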

Worked Example

A survey claims that when choosing a favorite between Harry Potter, Lord of the Rings, and Star Wars, the three series are equally popular, with 1/3 of people picking each.

To test this claim, a random sample of 2500 US adults is surveyed about their favorite series. Set up the test by writing the hypotheses and checking the conditions.

Hypotheses and Parameters

$$H_0:\ p_{HP} = \tfrac{1}{3},\ p_{SW} = \tfrac{1}{3},\ p_{LOTR} = \tfrac{1}{3}$$

$$H_a:\ \text{At least one of the proportions of favorite series differs from } \tfrac{1}{3}.$$

Where:

  • $p_{HP}$ = true proportion who prefer Harry Potter
  • $p_{SW}$ = true proportion who prefer Star Wars
  • $p_{LOTR}$ = true proportion who prefer Lord of the Rings

Conditions

  • Random: "A random sample of 2500 US adults" (quote the problem).
  • 10% condition: it is reasonable to assume there are more than 25,000 adults in the US (10 times the sample size), so the sample of 2500 is less than 10% of the population.
  • Large counts: $2500 \cdot \tfrac{1}{3} \approx 833 > 5$ (same for all three categories).

With the setup complete, the next step is calculating the test statistic and the p-value using the actual observed counts from the sample.
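The setup for this example can be sketched in Python. The variable names are hypothetical, and the sketch covers only the numeric parts of the setup (expected counts, large counts, degrees of freedom, and the population size needed for the 10% condition).

```python
from fractions import Fraction

n = 2500  # sample size from the problem

# Null distribution: each series is claimed to be the favorite of 1/3 of people.
null_props = {
    "Harry Potter": Fraction(1, 3),
    "Star Wars": Fraction(1, 3),
    "Lord of the Rings": Fraction(1, 3),
}

# Expected count for each category = (sample size)(null proportion).
expected = {series: n * p for series, p in null_props.items()}

# Large counts: every expected count must be greater than 5.
large_counts_ok = all(count > 5 for count in expected.values())

# Degrees of freedom = number of categories - 1.
df = len(null_props) - 1

# 10% condition: the population must be at least 10 times the sample size.
min_population = 10 * n

for series, count in expected.items():
    print(series, round(float(count), 1))
print(large_counts_ok, df, min_population)
```

Using `Fraction` keeps the 1/3 proportions exact, so the three expected counts add up to exactly 2500.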

How to Use This on the AP Statistics Exam

Free Response

  • Name the test as a chi-square goodness of fit test before doing anything else.
  • Write the null hypothesis with a proportion for each category, and define each proportion in context using the wording from the problem.
  • Write the alternative as "at least one proportion differs," not a list of separate proportions.
  • Check all three conditions, and quote the problem for the random condition.
  • For large counts, show the expected count calculations and state that all expected counts are greater than 5.

MCQ

  • Recognize goodness of fit when a question has one categorical variable with multiple categories and a claimed distribution.
  • Know that expected count = (sample size)(null proportion).
  • Remember degrees of freedom = number of categories minus 1.
  • Know the chi-square distribution is right-skewed and only takes positive values.

Common Trap

The large counts condition uses expected counts, not observed counts. Plan to compute and check the expected counts even if the observed counts look big enough.

Common Misconceptions

  • Checking observed counts instead of expected counts. The large counts condition requires all expected counts to be greater than 5. Observed counts can be small even when expected counts are fine.
  • Listing separate proportions in the alternative. The alternative is just "at least one proportion differs." Writing a full list of new proportions is incorrect.
  • Forgetting context in the hypotheses. You need to define what each proportion stands for using the problem's wording, not just write symbols.
  • Using a one-proportion z-test for many categories. Once a variable has more than two categories, the one-proportion procedures no longer apply; goodness of fit is the correct tool.
  • Mixing up degrees of freedom. For goodness of fit, degrees of freedom is number of categories minus 1, not the sample size minus 1.
  • Thinking a small chi-square proves the claim is true. A small statistic and large p-value only mean there is not enough evidence against the claimed distribution, not that the claim is proven.

Vocabulary

The following words are mentioned explicitly in the College Board Course and Exam Description for this topic.

  • alternative hypothesis: The claim that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
  • categorical data: Data that represents categories or groups rather than numerical measurements, such as colors, types, or classifications.
  • chi-square distributions: Probability distributions used to test the goodness of fit between observed and expected categorical data, characterized by positive values and right skewness.
  • chi-square statistic: A test statistic that measures the distance between observed and expected counts relative to the expected counts.
  • chi-square test: A statistical test used to determine whether observed frequencies of categorical data match expected frequencies based on a hypothesized distribution.
  • degrees of freedom: A parameter of the chi-square distribution that determines its shape; for a goodness of fit test it equals the number of categories minus 1, and the right skew becomes less pronounced as it increases.
  • distribution of proportions: The way in which proportions are spread across the categories of a categorical variable.
  • expected count: The number of observations you would expect in a category if the null hypothesis were true; for goodness of fit, (sample size)(null proportion).
  • goodness of fit: A statistical test that determines how well observed data match the expected distribution specified by a hypothesis.
  • independence: The condition that observations in a sample are not influenced by each other, typically ensured through random sampling or randomized experiments.
  • null hypothesis: The initial claim or assumption being tested in a hypothesis test, typically stating that there is no effect or no difference.
  • null proportion: The hypothesized proportion for each category under the null hypothesis in a chi-square goodness of fit test.
  • observed count: The actual number of observations in each category in the collected data.
  • proportion: A part or share of a whole, expressed as a fraction, decimal, or percentage.
  • random sample: A sample selected from a population in such a way that every member has an equal chance of being chosen, reducing bias and allowing for valid statistical inference.
  • randomized experiment: A study design where subjects are randomly assigned to treatment groups to establish cause-and-effect relationships.
  • sample size: The number of observations or data points collected in a sample, denoted as n.
  • sampling without replacement: A sampling method in which an item selected from a population cannot be selected again in subsequent draws.
  • statistical inference: The process of drawing conclusions about a population based on data collected from a sample.

Frequently Asked Questions

What are the chi-square goodness of fit conditions?

For a chi-square goodness of fit test, check that the data come from a random sample or randomized experiment, the 10% condition is met when sampling without replacement, and all expected counts are greater than 5.

How do you calculate expected counts for a goodness of fit test?

Multiply the sample size by the null proportion for each category. The expected count for a category is n times that category's claimed proportion.

What is the null hypothesis for a chi-square goodness of fit test?

The null hypothesis states the claimed population proportion for every category. Define each proportion in context so the hypothesis matches the problem.

What is the alternative hypothesis for a goodness of fit test?

The alternative hypothesis says that at least one of the population proportions differs from the claimed value. You do not list new values for every category.

What are the degrees of freedom for a chi-square goodness of fit test?

Degrees of freedom equal the number of categories minus 1. For example, a goodness of fit test with five categories has 4 degrees of freedom.

How is goodness of fit different from independence or homogeneity?

Goodness of fit uses one categorical variable and compares observed counts to a claimed distribution. Independence and homogeneity use two categorical variables and are usually organized in a two-way table.
