📊Honors Statistics Unit 11 Review

11.7 Lab 1: Chi-Square Goodness-of-Fit

Written by the Fiveable Content Team • Last updated August 2025

Chi-Square Goodness-of-Fit Test

The chi-square goodness-of-fit test checks whether observed categorical data matches a specific expected distribution. You'll use it whenever you have counts across categories and want to know: does what we observed differ significantly from what we predicted?


Calculation of Chi-Square Test Statistics

The goodness-of-fit test compares observed frequencies to expected ones across categories, then combines those differences into a single test statistic.

  • Null hypothesis (H_0): The data follows the hypothesized distribution
  • Alternative hypothesis (H_a): The data does not follow the hypothesized distribution

To calculate the test statistic:

  1. Record the observed frequency (O_i) for each category from your data.

  2. Calculate the expected frequency (E_i) for each category using the hypothesized distribution:

    • E_i = n \times p_i, where n is the total sample size and p_i is the hypothesized proportion for category i
  3. Compute the chi-square test statistic:

    • \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, where k is the number of categories

Each term in that sum measures how far one category's observed count is from its expected count, scaled by the expected count. Squaring the difference means both overestimates and underestimates contribute positively to the statistic. A larger \chi^2 value means a worse fit between observed and expected.

Degrees of freedom: df = k - 1, where k is the number of categories. You lose one degree of freedom because the category counts must sum to n.

Condition check: Every expected frequency E_i should be at least 5 for the chi-square approximation to be reliable. If any expected count falls below 5, consider combining categories.
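The calculation above can be sketched in a few lines of Python. The observed counts and hypothesized proportions here are made-up illustrative values, not data from the source:

```python
# Sketch: chi-square goodness-of-fit statistic by hand.
# Observed counts and proportions are illustrative, not real data.
observed = [45, 55, 50, 50]              # O_i for k = 4 categories
proportions = [0.25, 0.25, 0.25, 0.25]   # hypothesized p_i

n = sum(observed)                        # total sample size
expected = [n * p for p in proportions]  # E_i = n * p_i

# Condition check: every expected count should be at least 5
assert all(e >= 5 for e in expected)

# chi^2 = sum of (O_i - E_i)^2 / E_i over all k categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                   # degrees of freedom: k - 1
print(chi_sq, df)  # → 1.0 3
```

Here n = 200, so each expected count is 50; only the first two categories contribute (25/50 each), giving a statistic of 1.0 on 3 degrees of freedom.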


Interpretation of P-Values for Distributions

The p-value is the probability of getting a \chi^2 statistic as large as (or larger than) the one you calculated, assuming H_0 is true. Because the goodness-of-fit test is always right-tailed, you're only looking at the upper end of the chi-square distribution.

  • If p-value < significance level (typically 0.05): Reject H_0. There is sufficient evidence that the data does not follow the hypothesized distribution.
    • Risk: You might be committing a Type I error (rejecting a true null hypothesis).
  • If p-value ≥ significance level: Fail to reject H_0. There is not enough evidence to conclude the data differs from the hypothesized distribution.
    • Risk: You might be committing a Type II error (failing to reject a false null hypothesis).

Always state your conclusion in context. Don't just say "reject H_0." Say something like: "At the 0.05 significance level, there is sufficient evidence that the distribution of candy colors differs from the company's claimed proportions."
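The decision rule can be sketched in Python, assuming SciPy is available; the statistic and degrees of freedom below are illustrative numbers, not from a real dataset:

```python
# Sketch: converting a chi-square statistic into a right-tailed p-value.
# Assumes SciPy; chi_sq and df are illustrative values.
from scipy.stats import chi2

chi_sq = 9.2  # hypothetical test statistic
df = 3        # k - 1 for k = 4 categories

# Survival function = P(X >= chi_sq) under the chi-square(df) distribution,
# which is exactly the right-tailed p-value.
p_value = chi2.sf(chi_sq, df)

alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")
```

Using the survival function (`sf`) rather than `1 - cdf` avoids precision loss when the p-value is very small.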


Application of Chi-Square Tests

Here's the full process for conducting a goodness-of-fit test:

  1. Identify the categorical variable and state the hypothesized distribution (the proportions you expect).

  2. Collect a random sample and record observed frequencies for each category.

  3. Calculate expected frequencies using E_i = n \times p_i.

  4. Verify that all expected frequencies are at least 5.

  5. Compute the \chi^2 test statistic and determine df = k - 1.

  6. Find the p-value from the chi-square distribution table or calculator.

  7. Compare the p-value to your significance level and state your conclusion in context.

Example: M&M Colors. Suppose Mars claims 20% of M&Ms are blue. You buy a bag of 200 and count 52 blue ones. The expected count is 200 \times 0.20 = 40. That single category contributes \frac{(52 - 40)^2}{40} = 3.6 to the \chi^2 statistic. You'd repeat this for every color, sum the contributions, and then find the p-value.

Example: Fair Die. Roll a die 120 times. Under a fair die, you expect 120 / 6 = 20 outcomes per face. You'd compare your observed counts for each face against 20 using the same formula, with df = 5.
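The fair-die example can be run end-to-end with SciPy's `chisquare` function, which defaults to equal expected counts when none are supplied. The observed face counts below are hypothetical:

```python
# Sketch: fair-die goodness-of-fit test via scipy.stats.chisquare.
# The observed counts are hypothetical; they sum to 120 rolls.
from scipy.stats import chisquare

observed = [18, 22, 19, 25, 16, 20]  # hypothetical counts for faces 1-6

# With no f_exp argument, chisquare assumes equal expected counts
# (120 / 6 = 20 per face) and uses df = k - 1 = 5 internally.
stat, p_value = chisquare(observed)

print(stat, p_value)
```

For these counts the statistic is (4 + 4 + 1 + 25 + 16 + 0)/20 = 2.5, well below the df = 5 critical value of about 11.07, so there is no evidence the die is unfair.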

Note: The goodness-of-fit test applies to one categorical variable. Analyzing the relationship between two categorical variables uses a chi-square test of independence, which is a different procedure.

Statistical Power and Effect Size

  • Statistical power is the probability of correctly rejecting H_0 when it's actually false. Higher power means you're less likely to miss a real difference.
  • Power increases with larger sample sizes, larger significance levels, and larger effect sizes.
  • Effect size quantifies how much the observed distribution deviates from the expected one. A statistically significant result with a tiny effect size may not be practically meaningful.

In lab settings, you'll often work with fixed sample sizes, so the main takeaway is this: with a small sample, you might fail to detect a real difference simply because you lack the power to do so.
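One way to see this is a small Monte Carlo sketch, assuming NumPy and SciPy: simulate rolls from a die that truly deviates from fairness and count how often the test rejects at \alpha = 0.05. The `true_p` values here are arbitrary illustrative choices:

```python
# Sketch: estimating power by simulation (assumes NumPy and SciPy).
# Data are drawn from a truly unfair die; we count how often the
# goodness-of-fit test detects the deviation at alpha = 0.05.
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
true_p = [0.25, 0.15, 0.15, 0.15, 0.15, 0.15]  # illustrative unfair die
alpha, n, reps = 0.05, 120, 2000

rejections = 0
for _ in range(reps):
    counts = rng.multinomial(n, true_p)  # one simulated sample of n rolls
    _, p = chisquare(counts)             # null: all six faces equally likely
    if p < alpha:
        rejections += 1

power = rejections / reps  # proportion of samples where H0 was rejected
print(f"estimated power at n={n}: {power:.2f}")
```

Rerunning the simulation with a larger `n` pushes the estimated power toward 1, which is the sample-size effect described above; with small `n`, the same real deviation often goes undetected.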