🎲Intro to Statistics Unit 11 Review

11.2 Goodness-of-Fit Test

Written by the Fiveable Content Team • Last updated August 2025

The goodness-of-fit test checks whether sample data matches a specific probability distribution. You collect observed frequencies from your data, calculate what the frequencies should be under the hypothesized distribution, and then use a chi-square statistic to measure the gap between the two. A large gap means the data probably doesn't follow the distribution you proposed.

Goodness-of-Fit Test

Goodness-of-fit test for distributions

This test answers a straightforward question: does your sample data come from a population with a particular probability distribution (normal, binomial, Poisson, uniform, etc.)?

  • Null hypothesis ($H_0$): The data follows the specified distribution.
  • Alternative hypothesis ($H_a$): The data does not follow the specified distribution.

Steps to perform the test:

  1. State your null and alternative hypotheses.
  2. Calculate the expected frequency for each category based on the hypothesized distribution.
  3. Calculate the chi-square test statistic using observed and expected frequencies.
  4. Determine degrees of freedom and find the critical value from the chi-square distribution table.
  5. Compare the test statistic to the critical value and decide whether to reject or fail to reject $H_0$.
  6. Optionally, calculate the p-value to assess how strong the evidence is against $H_0$.
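
The steps above can be sketched end to end in a few lines of Python. This is a minimal illustration, not a definitive implementation: the die-roll counts are made-up data, and the critical value 11.070 is the standard chi-square table entry for 5 degrees of freedom at a 0.05 significance level.

```python
# Hypothetical data: 120 rolls of a die, testing H0 "the die is fair".
observed = [18, 22, 16, 25, 19, 20]

# Step 2: expected frequency for each face under H0 (equal probability).
n = sum(observed)
k = len(observed)
expected = [n / k] * k  # 20 rolls per face

# Step 3: chi-square test statistic.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Step 4: degrees of freedom; no parameters were estimated from the sample.
df = k - 1

# Critical value for df = 5 at alpha = 0.05, read from a chi-square table.
critical_value = 11.070

# Step 5: compare and decide.
reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
# prints: chi-square = 2.50, df = 5, reject H0: False
```

Here the statistic is far below the critical value, so this made-up sample gives no evidence that the die is unfair.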

Test statistic calculation

The chi-square test statistic measures how far your observed data deviates from what you'd expect under $H_0$:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

  • $O_i$ = observed frequency for category $i$
  • $E_i$ = expected frequency for category $i$
  • $k$ = number of categories

How to find expected frequencies: Multiply the total sample size $n$ by the probability of each category under the hypothesized distribution. For example, if you have 200 observations and the hypothesized distribution says category A should contain 30% of values, then $E_A = 200 \times 0.30 = 60$.
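
As a small sketch of that calculation, the snippet below extends the 200-observation example. Only category A and its 30% share come from the example; categories B and C and their probabilities are assumptions added so the distribution sums to 1.

```python
n = 200  # total sample size

# Hypothesized probabilities; B and C are illustrative assumptions.
hypothesized_probs = {"A": 0.30, "B": 0.50, "C": 0.20}

# The probabilities of a complete distribution must sum to 1.
assert abs(sum(hypothesized_probs.values()) - 1.0) < 1e-9

# Expected frequency = sample size times hypothesized probability.
expected = {category: n * p for category, p in hypothesized_probs.items()}

for category, count in expected.items():
    print(f"E_{category} = {count:.0f}")
```

The expected counts come out to about 60, 100, and 40, and they always add up to the sample size.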

Each expected frequency should be at least 5 for the chi-square approximation to be reliable. If any expected count falls below 5, you may need to combine adjacent categories.
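
One way to combine adjacent categories is sketched below. The greedy left-to-right merge and the function name `merge_small_categories` are illustrative assumptions, not a standard procedure, and the counts are made up. Remember that merging reduces the number of categories, which in turn reduces the degrees of freedom.

```python
def merge_small_categories(observed, expected, min_expected=5.0):
    """Merge adjacent categories, left to right, until each merged group
    has an expected count of at least min_expected. A leftover tail that
    is still too small is folded into the last completed group."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp = 0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0.0
    if acc_exp > 0:  # leftover tail below the threshold
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


# Hypothetical counts: the last category has an expected count below 5.
observed = [12, 30, 35, 15, 6, 2]
expected = [14.0, 28.0, 27.0, 18.0, 9.0, 4.0]

merged_obs, merged_exp = merge_small_categories(observed, expected)
print(merged_obs)  # [12, 30, 35, 15, 8]
print(merged_exp)  # [14.0, 28.0, 27.0, 18.0, 13.0]
```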

Degrees of freedom:

$$df = k - 1 - m$$

  • $k$ = number of categories
  • $m$ = number of parameters you estimated from the sample data

If the hypothesized distribution is fully specified in advance (e.g., "each category has equal probability"), then $m = 0$ and $df = k - 1$. But if you estimated a parameter from the data first (like using the sample mean as the Poisson rate), you lose an additional degree of freedom for each estimated parameter.
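
The two cases can be compared with a short helper. The function name `degrees_of_freedom` and the category counts are assumptions chosen for illustration.

```python
def degrees_of_freedom(num_categories, num_estimated_params=0):
    """Return df = k - 1 - m for a goodness-of-fit test."""
    df = num_categories - 1 - num_estimated_params
    if df < 1:
        raise ValueError("too few categories for the parameters estimated")
    return df


# Fully specified distribution: six equally likely categories, m = 0.
df_uniform = degrees_of_freedom(6)

# Poisson fit with the rate estimated from the sample mean: m = 1.
df_poisson = degrees_of_freedom(6, num_estimated_params=1)

print(df_uniform, df_poisson)  # 5 4
```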


Interpretation of chi-square results

The goodness-of-fit test is always a right-tailed test. That's because the chi-square statistic can only be zero or positive: small values mean the data fits well, and large values mean it doesn't.

  • Find the critical value from the chi-square distribution table using your degrees of freedom and significance level (typically $\alpha = 0.05$).
  • If $\chi^2 > \text{critical value}$: Reject $H_0$. There is sufficient evidence that the data does not follow the specified distribution.
  • If $\chi^2 \leq \text{critical value}$: Fail to reject $H_0$. There is not enough evidence to conclude the data differs from the specified distribution.

You can also use the p-value approach: if the p-value is less than $\alpha$, reject $H_0$. The p-value tells you the probability of getting a test statistic at least as extreme as yours, assuming $H_0$ is true.
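
If software is available instead of a printed table, both approaches can be sketched with SciPy's `scipy.stats.chi2` distribution. The test statistic and degrees of freedom below are made-up inputs, and using SciPy is an assumption of this example rather than something the guide requires.

```python
from scipy.stats import chi2

# Hypothetical inputs: a test statistic and its degrees of freedom.
chi_square = 2.5
df = 5
alpha = 0.05

# Critical value: the point with area alpha in the right tail.
critical_value = chi2.ppf(1 - alpha, df)

# p-value: right-tail area beyond the observed statistic.
p_value = chi2.sf(chi_square, df)

# The two decision rules always agree.
reject_by_critical_value = chi_square > critical_value
reject_by_p_value = p_value < alpha

print(f"critical value = {critical_value:.3f}, p-value = {p_value:.3f}")
print("reject H0" if reject_by_p_value else "fail to reject H0")
```

With these inputs the critical value is about 11.07 and the p-value is well above 0.05, so both rules lead to failing to reject $H_0$.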

Additional Considerations

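  • The goodness-of-fit test is a nonparametric test: it works directly with frequency counts and doesn't require assumptions about the shape of the population distribution beyond the one being tested. It does still assume independent observations and adequately large expected counts.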
  • It's designed for categorical data. If you have continuous data, you'll need to group it into categories (bins) first.
  • The same chi-square framework extends to other tests you'll encounter, like the test of independence and the test of homogeneity, which use contingency tables instead of a single row of categories.
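
Grouping continuous data into categories, as mentioned above, can be sketched with the standard library. The cut points and the values are hypothetical, and the choice of bins is a judgment call that this example does not settle.

```python
from bisect import bisect_right

# Hypothetical continuous measurements (for example, exam scores).
values = [52, 67, 71, 88, 60, 79, 80, 45, 73, 66]

# Interior cut points define four bins:
# below 60, 60 up to 70, 70 up to 80, and 80 or above.
cut_points = [60, 70, 80]

# Count how many values fall in each bin.
observed = [0] * (len(cut_points) + 1)
for v in values:
    # bisect_right places a value equal to a cut point in the upper bin.
    observed[bisect_right(cut_points, v)] += 1

print(observed)  # [2, 3, 3, 2]
```

The resulting counts play the role of the observed frequencies; the expected frequencies come from the probability the hypothesized distribution assigns to each bin.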