Hypothesis Testing
Hypothesis testing gives you a structured way to use sample data to make decisions about a population. Instead of guessing, you set up two competing claims, collect data, and let the evidence point you toward a conclusion. This section focuses on the foundation: how to write null and alternative hypotheses and understand what they mean.

Formulation of Statistical Hypotheses
Every hypothesis test starts with two competing statements about a population parameter.
The null hypothesis (H₀) claims that nothing new is happening. It represents the status quo or the assumption of no effect, no difference, or no change. The null hypothesis always contains an equality component (=, ≤, or ≥).
The alternative hypothesis (Hₐ or H₁) is the claim you're actually trying to find evidence for. It represents a change, difference, or effect, and it always contains a strict inequality (≠, <, or >).
A few concrete examples:
- A company claims its light bulbs last an average of 1,000 hours. You suspect they last less than that.
- H₀: μ = 1,000
- Hₐ: μ < 1,000
- A researcher wants to know if a new drug changes blood pressure compared to a placebo.
- H₀: μ_drug = μ_placebo (no change)
- Hₐ: μ_drug ≠ μ_placebo (some change, in either direction)
- A school board claims that more than 50% of parents support a new policy.
- H₀: p = 0.50
- Hₐ: p > 0.50
Notice that the alternative hypothesis determines the type of test: Hₐ with ≠ gives a two-tailed test, while < or > gives a one-tailed test. The direction of Hₐ comes from the research question, and you must set it before looking at the data.
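The light-bulb example above can be sketched in code. This is a minimal one-sample z-test using only the Python standard library; the sample values (mean 985 hours, σ = 50, n = 40) are made up for illustration, and the `alternative` argument mirrors the one-tailed vs. two-tailed choice described above.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """One-sample z-test of H0: mu = mu0. Returns (z, p).

    alternative: "two-sided" (Ha: mu != mu0), "less" (Ha: mu < mu0),
    or "greater" (Ha: mu > mu0) -- set before looking at the data.
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    if alternative == "less":
        p = normal_cdf(z)           # area in the left tail
    elif alternative == "greater":
        p = 1.0 - normal_cdf(z)     # area in the right tail
    else:
        p = 2.0 * (1.0 - normal_cdf(abs(z)))  # both tails
    return z, p

# Light-bulb claim: H0: mu = 1000 vs Ha: mu < 1000
# (hypothetical sample: mean 985 hours, known sigma = 50, n = 40)
z, p = z_test(sample_mean=985, mu0=1000, sigma=50, n=40, alternative="less")
print(z, p)
```

Note that running the same data with `alternative="two-sided"` would double the p-value, which is exactly why the direction of Hₐ must be fixed before the data is examined.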

Interpretation of Hypothesis Symbols
- H₀ denotes the null hypothesis; Hₐ (or H₁) denotes the alternative hypothesis.
- Greek letters represent population parameters (the true values you're making claims about):
- μ = population mean
- p = population proportion
- σ = population standard deviation
- Subscripts distinguish between groups when comparing two populations (μ₁ vs. μ₂, for example).
- The equality piece always belongs in H₀. If you see =, ≤, or ≥, that's the null. The strict inequality (≠, <, >) always belongs in Hₐ.
One thing that trips people up: hypotheses are always about population parameters, never about sample statistics. You'd write H₀: μ = 1,000, not H₀: x̄ = 1,000. The sample data is what you use to test the hypothesis, not what the hypothesis is about.

Decision-Making in Hypothesis Testing
Once hypotheses are set, here's how the testing process works:
- Collect sample data relevant to the claim being tested.
- Compute a test statistic that measures how far your sample result is from what H₀ predicts. The type of test statistic depends on the situation (a z-statistic for known σ or large samples, a t-statistic for unknown σ with small samples, etc.).
- Compare to a decision rule using one of two equivalent approaches:
Critical value approach: Determine the critical value(s) from the significance level α (commonly 0.05). If the test statistic falls in the rejection region (more extreme than the critical value), reject H₀. Otherwise, fail to reject H₀.
P-value approach: Calculate the p-value, which is the probability of getting a result as extreme as (or more extreme than) your sample data, assuming H₀ is true. If the p-value ≤ α, reject H₀. If the p-value > α, fail to reject H₀.
- State your conclusion in context:
- Rejecting H₀ means the data provides sufficient evidence to support Hₐ.
- Failing to reject H₀ means the data does not provide sufficient evidence to support Hₐ. This is not the same as proving H₀ is true.
That last distinction matters. You never "accept" the null hypothesis. You either reject it or fail to reject it. Failing to reject just means your sample didn't give you enough evidence to conclude otherwise.
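The two decision rules above always agree for the same test and significance level. A short sketch, assuming a two-tailed z-test and using the standard library's `statistics.NormalDist`, makes that concrete (the z values passed in at the end are hypothetical):

```python
from statistics import NormalDist

def decide(z, alpha=0.05):
    """Apply both decision rules to a two-tailed z-test at level alpha."""
    std_normal = NormalDist()
    # Critical value approach: reject H0 if |z| falls in the rejection region
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    reject_by_critical = abs(z) > z_crit
    # P-value approach: reject H0 if p-value <= alpha
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_pvalue = p_value <= alpha
    return reject_by_critical, reject_by_pvalue, p_value

# Hypothetical test statistics: one in the rejection region, one not
print(decide(2.2))   # |2.2| > 1.96, so both rules say reject H0
print(decide(1.0))   # |1.0| < 1.96, so both rules say fail to reject H0
```

Either way the conclusion is the same; the p-value approach simply reports how extreme the result is, while the critical value approach reports only the yes/no decision.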
Statistical Inference and Interpretation
Hypothesis testing is one of the core tools of statistical inference, which is the broader process of using sample data to draw conclusions about populations.
- Statistical significance means the result is unlikely to have occurred by random chance alone (given that H₀ is true). It does not automatically mean the result is practically important.
- Confidence intervals complement hypothesis tests by giving a range of plausible values for the parameter. If a 95% confidence interval for μ doesn't contain the value in H₀, that's consistent with rejecting H₀ at α = 0.05.
- Effect size measures the magnitude of a difference or relationship. A result can be statistically significant but have a tiny effect size, which is why both matter when interpreting results.
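Both ideas above can be sketched together. The function below computes a z-based confidence interval for the mean (a simplification that assumes a large-enough sample) and Cohen's d, a common effect-size measure, against a hypothesized value; the sample data at the end is made up for illustration:

```python
from statistics import NormalDist, mean, stdev

def ci_and_cohens_d(sample, mu0, confidence=0.95):
    """Confidence interval for the mean plus Cohen's d relative to mu0.

    Sketch only: uses the normal approximation, so it assumes the
    sample is large enough for a z-based interval to be reasonable.
    """
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. 1.96 for 95%
    half_width = z * s / n ** 0.5
    ci = (xbar - half_width, xbar + half_width)
    d = (xbar - mu0) / s   # effect size: standardized difference from mu0
    return ci, d

# Hypothetical bulb lifetimes clustered near 985, tested against mu0 = 1000
sample = [980 + i for i in range(11)]
ci, d = ci_and_cohens_d(sample, mu0=1000)
print(ci, d)   # the interval lies entirely below 1000
```

Here the interval excludes 1,000, which is consistent with rejecting H₀: μ = 1,000 at α = 0.05, and d reports how large that difference is in standard-deviation units.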