Choosing the Appropriate Distribution for Hypothesis Testing

Distribution selection for hypothesis tests
Picking the correct distribution is the first real decision you make in a hypothesis test. The choice depends on three things: what parameter you're testing (mean vs. proportion), whether you know the population standard deviation, and your sample size.
For population means:
- Use the t-distribution when the population standard deviation (σ) is unknown and the sample size is small (n < 30). You estimate σ with the sample standard deviation s, and the t-distribution accounts for the extra uncertainty that introduces.
- Use the z-distribution (standard normal) when σ is known, or when n ≥ 30. With large samples, the Central Limit Theorem kicks in and the t-distribution closely approximates the z-distribution anyway.
For population proportions:
- Use the z-distribution, but only when the sample is large enough for the normal approximation to hold. The conditions are np₀ ≥ 10 and n(1 − p₀) ≥ 10, where n is the sample size and p₀ is the hypothesized population proportion. If either product is less than 10, the normal approximation isn't reliable.
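The selection rules above can be sketched as two small helper functions. The function names are illustrative; the thresholds (n ≥ 30, both counts ≥ 10) come directly from the text.

```python
def choose_distribution_for_mean(n, sigma_known):
    """Return 'z' or 't' for a test about a population mean."""
    # z applies when sigma is known, or when the sample is large (n >= 30).
    if sigma_known or n >= 30:
        return "z"
    return "t"

def normal_approx_ok(n, p0):
    """Check the large-sample conditions for a one-proportion z-test."""
    return n * p0 >= 10 and n * (1 - p0) >= 10

print(choose_distribution_for_mean(20, sigma_known=False))  # t
print(normal_approx_ok(100, 0.05))  # False: n*p0 = 5 < 10
```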

Assumptions for statistical tests
Every hypothesis test rests on assumptions. If those assumptions are violated, your results may not be valid.
t-test assumptions:
- The sample is randomly selected
- The population is approximately normally distributed, or the sample is large enough (n ≥ 30) that the Central Limit Theorem applies
- The data are continuous and measured on an interval or ratio scale (e.g., temperature, weight)
- The population standard deviation σ is unknown (you use s instead of σ)
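One crude, informal way to eyeball the normality assumption is to compute the sample skewness: roughly symmetric (normal-like) data should have skewness near zero. This is only a heuristic sketch, not a substitute for a formal test (e.g. Shapiro-Wilk) or a normal probability plot; the data values below are made up.

```python
import statistics

def sample_skewness(data):
    """Adjusted sample skewness; near 0 for roughly symmetric data."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(((x - m) / s) ** 3 for x in data) * n / ((n - 1) * (n - 2))

symmetric = [4, 5, 5, 6, 6, 6, 7, 7, 8]   # hypothetical near-normal sample
skewed = [1, 1, 1, 2, 2, 3, 5, 9, 20]     # hypothetical right-skewed sample

print(round(sample_skewness(symmetric), 2))  # near 0
print(round(sample_skewness(skewed), 2))     # clearly positive
```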
z-test assumptions:
- The sample is randomly selected
- The population standard deviation is known
- The data are continuous and measured on an interval or ratio scale (e.g., IQ scores, annual income)
Proportion test assumptions:
- The sample is randomly selected
- Observations are independent, meaning the outcome of one does not influence another (e.g., flipping a coin multiple times, or sampling less than 10% of the population)
- The data are categorical with exactly two outcomes (pass/fail, defective/non-defective)
- The sample size conditions are met: np₀ ≥ 10 and n(1 − p₀) ≥ 10
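When those conditions hold, the test statistic is z = (p̂ − p₀) / sqrt(p₀(1 − p₀)/n). Here is a hand-rolled sketch of that calculation; the example numbers (52 heads in 100 coin flips, testing p₀ = 0.5) are made up for illustration.

```python
from math import sqrt, erf

def one_proportion_z(successes, n, p0):
    """One-proportion z-test: returns (z, two-sided p-value)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)          # standard error under H0
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal CDF, via erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = one_proportion_z(52, 100, 0.5)
print(round(z, 2), round(p, 2))  # z = 0.4; large p-value, no evidence of bias
```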

Sample size impact on testing
The Central Limit Theorem (CLT) is the reason sample size matters so much. It states that as n increases, the sampling distribution of the sample mean approaches a normal distribution regardless of the shape of the population distribution. In practice, n ≥ 30 is the standard threshold for the CLT to provide a good approximation.
This has a direct consequence: for large samples, you can use the z-distribution even when σ is unknown, because s becomes a reliable estimate of σ and the t-distribution converges toward the z-distribution.
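A quick simulation sketch of the CLT claim: even for a strongly skewed population (exponential with mean 1), means of samples of size n = 30 cluster symmetrically around the population mean with spread close to σ/√n = 1/√30 ≈ 0.18. The seed and sample counts are arbitrary choices.

```python
import random
import statistics

random.seed(42)

# Draw 2000 samples of size n = 30 from Exponential(1) (population mean 1,
# population sd 1) and record each sample mean.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(30))
    for _ in range(2000)
]

print(round(statistics.mean(sample_means), 2))   # close to 1.0
print(round(statistics.stdev(sample_means), 2))  # close to 1/sqrt(30) ~ 0.18
```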
Beyond distribution choice, sample size affects your results in two other ways:
- Statistical power increases with larger samples. Power is the probability of correctly rejecting a false null hypothesis, so a larger n makes it easier to detect a real effect.
- Small samples (like pilot studies or studies of rare events) may not provide enough evidence to draw valid conclusions, and they're more sensitive to violations of normality assumptions.
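The power point above can be sketched with a Monte-Carlo simulation: estimate how often a two-sided z-test (σ known to be 1, α = 0.05) rejects H₀: μ = 0 when the true mean is actually 0.5. The effect size, α, and trial counts are illustrative assumptions.

```python
import random
import statistics
from math import sqrt

random.seed(1)

def estimated_power(n, true_mean=0.5, critical_z=1.96, trials=1000):
    """Fraction of simulated samples in which H0: mu = 0 is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(true_mean, 1.0) for _ in range(n)]
        z = statistics.mean(sample) / (1.0 / sqrt(n))  # z-statistic under H0
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials

print(estimated_power(10))  # modest power with a small sample
print(estimated_power(50))  # much higher power with a larger sample
```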

Components of hypothesis testing
These four terms come up in every hypothesis test, so make sure you can define each one clearly:
- Null hypothesis (H₀): The default assumption about a population parameter. It typically states "no effect" or "no difference." This is what you test against.
- Alternative hypothesis (Hₐ): The claim you're investigating. It contradicts H₀ and can be one-sided (< or >) or two-sided (≠).
- Test statistic: A value calculated from your sample data that measures how far your sample result is from what H₀ predicts. For means, this is a t-score or z-score; for proportions, it's a z-score.
- Sampling distribution: The probability distribution of a statistic (like the sample mean x̄) across all possible samples of a given size. This is what you compare your test statistic against to find a p-value.
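Putting the components together for a small made-up sample: test H₀: μ = 100 against Hₐ: μ ≠ 100 with σ unknown and n = 8 < 30, so the test statistic is t = (x̄ − μ₀)/(s/√n), compared against a t-distribution with n − 1 = 7 degrees of freedom. The data values are hypothetical.

```python
import statistics
from math import sqrt

data = [102, 98, 105, 110, 99, 104, 101, 107]  # hypothetical measurements
mu0 = 100  # value stated by the null hypothesis

n = len(data)
x_bar = statistics.mean(data)       # sample mean
s = statistics.stdev(data)          # sample standard deviation (divides by n - 1)
t = (x_bar - mu0) / (s / sqrt(n))   # test statistic

print(n - 1, round(t, 2))  # 7 degrees of freedom, t ~ 2.26
```

You would then look up this t-value (with 7 degrees of freedom) in a t-table or software to get the p-value.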