Goodness-of-Fit Test
The goodness-of-fit test checks whether observed data matches a specific theoretical distribution. You collect frequency data across categories, then use a chi-square statistic to measure how far your observed counts fall from what the theory predicts. If the gap is large enough, you reject the hypothesized distribution.

Goodness-of-Fit Test for Distributions
This test works with any distribution you can specify in advance: uniform, binomial, Poisson, normal, or any other multinomial setup. The key idea is that your hypothesized distribution tells you what the expected frequencies should be, and you compare those to what you actually observed.
- Null hypothesis (H₀): The data follows the specified distribution
- Alternative hypothesis (H₁): The data does not follow the specified distribution
Test procedure:
- State your null and alternative hypotheses, identifying the theoretical distribution.
- Calculate the expected frequency for each category based on that distribution. Each expected frequency equals the total sample size multiplied by the probability the distribution assigns to that category.
- Verify the conditions: every expected frequency should be at least 5. If some cells fall below 5, you may need to combine adjacent categories.
- Compute the chi-square test statistic (formula below).
- Determine the critical value using your significance level (α) and degrees of freedom, or find the p-value directly.
- Compare and make your decision: reject or fail to reject H₀.
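The steps above can be sketched in Python. The data here is hypothetical (120 rolls of a die tested against a uniform distribution), and the cutoff 11.070 is the standard chi-square critical value for α = 0.05 with 5 degrees of freedom:

```python
# Hypothetical example: do 120 die rolls fit a uniform distribution?
observed = [22, 17, 25, 18, 21, 17]   # observed frequency per face
n = sum(observed)                     # total sample size (120)
probs = [1 / 6] * 6                   # probability per category under H0

# Step 2: expected frequency = total sample size * category probability.
expected = [n * p for p in probs]     # 20 per face

# Step 3: condition check -- every expected frequency should be at least 5.
assert all(e >= 5 for e in expected), "combine categories with low expected counts"

# Step 4: chi-square statistic = sum of (O - E)^2 / E over all categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Step 5: critical value for alpha = 0.05, df = k - 1 = 5 (from a chi-square table).
critical = 11.070

# Step 6: decision.
decision = "reject H0" if chi_sq > critical else "fail to reject H0"
print(f"chi-square = {chi_sq:.3f}, decision: {decision}")
```

Here the statistic works out to 2.6, well below the critical value, so the uniform hypothesis survives.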

Test Statistic in the Chi-Square Distribution
The goodness-of-fit test statistic follows a chi-square distribution. The formula is:
χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ, summed over all k categories
- Oᵢ: Observed frequency for category i
- Eᵢ: Expected frequency for category i
- k: Number of categories
Each term in the sum measures how much one category deviates from expectation. Squaring the difference means both overcounts and undercounts contribute positively to the statistic. Dividing by Eᵢ scales each term so that a deviation of 5 matters more when you only expected 10 than when you expected 500.
Degrees of freedom: df = k − 1.
You lose one degree of freedom because the expected frequencies must sum to the total sample size. If you also estimate parameters from the data (for example, estimating p for a binomial or λ for a Poisson), you lose an additional degree of freedom for each estimated parameter. So if you estimate one parameter, df = k − 2.
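The degrees-of-freedom rule can be made concrete with a small helper (the function name and the two scenarios are illustrative, not from the text):

```python
def gof_degrees_of_freedom(k, estimated_params=0):
    """df = (number of categories) - 1 - (parameters estimated from the data)."""
    return k - 1 - estimated_params

# Fully specified distribution with 6 categories (e.g. a fair die): df = 6 - 1 = 5.
print(gof_degrees_of_freedom(6))                      # -> 5

# Poisson fit over 8 count categories where lambda was estimated from the data:
# df = 8 - 1 - 1 = 6.
print(gof_degrees_of_freedom(8, estimated_params=1))  # -> 6
```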

Interpretation of Goodness-of-Fit Results
The goodness-of-fit test is always a right-tailed test. A small χ² value means observed and expected frequencies are close, which supports the null. A large χ² value means the data deviates substantially from the hypothesized distribution.
Two equivalent decision methods:
- Critical value approach: Reject H₀ if χ² > critical value; fail to reject if χ² ≤ critical value.
- P-value approach: Reject H₀ if p-value ≤ α; fail to reject if p-value > α.
When you reject H₀, you have sufficient evidence that the data does not follow the specified distribution. When you fail to reject H₀, you lack sufficient evidence to conclude the data deviates from the distribution. Failing to reject does not prove the distribution is correct; it only means the data is consistent with it.
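As a sketch of the p-value approach, here is a hedged example with made-up data for k = 3 categories. With df = 2 the right-tail chi-square probability has the exact closed form exp(−x/2), which avoids needing a statistics library; for other df you would use a chi-square table or a library routine such as SciPy's:

```python
import math

# Hypothetical data: 60 observations across 3 categories,
# with H0 probabilities 0.5, 0.25, 0.25.
observed = [30, 14, 16]
probs = [0.5, 0.25, 0.25]
n = sum(observed)
expected = [n * p for p in probs]     # [30, 15, 15]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For df = k - 1 = 2, P(chi-square > x) = exp(-x / 2) exactly.
p_value = math.exp(-chi_sq / 2)

alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"chi-square = {chi_sq:.4f}, p-value = {p_value:.4f}, {decision}")
```

The large p-value here (about 0.94) says these counts are entirely consistent with the hypothesized probabilities.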
Additional Considerations
- Sample size matters. Larger samples give the test more power to detect real departures from the hypothesized distribution. With very small samples, the chi-square approximation becomes unreliable, which is why the condition that every expected frequency be at least 5 exists.
- Effect size measures the magnitude of the discrepancy between observed and expected frequencies, independent of sample size. A common measure for goodness-of-fit is Cramér's V or simply reporting the chi-square value relative to the sample size n.
- Frequency tables (not contingency tables, which involve two variables) are used to organize the observed and expected counts for a single categorical variable in a goodness-of-fit test.
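One common way to report the chi-square value relative to sample size is Cohen's w = sqrt(χ² / n); the convention and the numbers below are illustrative additions, not from the text:

```python
import math

def cohens_w(chi_sq, n):
    """Effect size for goodness-of-fit: sqrt(chi-square / n).
    Unlike the raw statistic, this does not grow just because n grows."""
    return math.sqrt(chi_sq / n)

# The same chi-square statistic from two different sample sizes: the larger
# sample yields the smaller effect size even though the raw statistic is equal.
print(round(cohens_w(2.6, 120), 3))
print(round(cohens_w(2.6, 1200), 3))
```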