Hypothesis Testing
Hypothesis testing gives you a structured way to use sample data to evaluate a claim about a population. Instead of guessing, you set up competing statements, collect data, and let the evidence guide your conclusion.
Components of Statistical Hypotheses
Every hypothesis test starts with two competing statements:
The null hypothesis (H₀) is the default position. It assumes nothing new is happening: no difference, no effect, no change. It always contains some form of equality (=, ≤, or ≥). For example, if a school tries a new teaching method, the null hypothesis would be that mean test scores stayed the same.
The alternative hypothesis (H₁ or Hₐ) is what you're actually trying to find evidence for. It represents the research claim and always contains an inequality (≠, >, or <). In the teaching method example, the alternative might be that mean test scores increased.
These two hypotheses are mutually exclusive and exhaustive. Only one can be true, and together they cover every possible outcome. You never prove the alternative directly; instead, you look for enough evidence to reject the null.

Symbols in Hypothesis Statements
Hypothesis statements use specific symbols to express claims precisely.
Parameters (what you're making claims about):
- μ = population mean (e.g., average height of students at a university)
- p = population proportion (e.g., proportion of defective items on a production line)
Equality symbols (used in H₀):
- = means exactly equal to: H₀: μ = μ₀
- ≤ means less than or equal to: H₀: μ ≤ μ₀
- ≥ means greater than or equal to: H₀: μ ≥ μ₀
Inequality symbols (used in H₁):
- ≠ means not equal to: H₁: μ ≠ μ₀ (two-tailed test)
- > means greater than: H₁: μ > μ₀ (right-tailed test)
- < means less than: H₁: μ < μ₀ (left-tailed test)
Subscripts identify which hypothesis a statement belongs to: the 0 in H₀ marks the null; the 1 (or a) in H₁ (or Hₐ) marks the alternative.

Decision-Making in Hypothesis Testing
Once you've stated your hypotheses, the testing process follows these steps:
- Choose the appropriate test statistic and distribution based on your data type and what you know about the population. For example, use a z-test when you have a large sample and know the population standard deviation.
- Calculate the test statistic from your sample data (using the sample mean, sample proportion, or another relevant statistic).
- Determine the critical value(s) using the significance level (α) and the type of test. A one-tailed test at α = 0.05 has a critical value of 1.645; a two-tailed test at the same α uses ±1.96.
- Make a decision by comparing the test statistic to the critical value:
  - If the test statistic falls in the rejection region, reject H₀ (e.g., a test statistic of 2.10 > 1.645, so you reject H₀).
  - If it does not fall in the rejection region, fail to reject H₀.
  - You can also compare the p-value to α: if the p-value is less than α, reject H₀.
- Interpret the results in context. Rejecting H₀ means you found sufficient evidence to support the alternative. Failing to reject H₀ means the data didn't provide enough evidence against it. Note the careful language: you never "accept" H₀; you only fail to reject it.
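The steps above can be sketched in Python using only the standard library (math.erf gives the standard normal CDF). The specific numbers here (null mean, population standard deviation, sample size, sample mean) are hypothetical, chosen so the test statistic matches the 2.10 example:

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Hypothetical right-tailed z-test: H0: mu = 100 vs H1: mu > 100.
mu0 = 100.0      # hypothesized population mean under H0
sigma = 15.0     # known population standard deviation (assumed)
n = 36           # sample size
x_bar = 105.25   # observed sample mean

# Step 2: compute the test statistic.
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Step 3: critical value for a right-tailed test at alpha = 0.05.
alpha = 0.05
critical = 1.645

# Step 4: decide, by rejection region or (equivalently) by p-value.
p_value = 1 - normal_cdf(z)
reject = z > critical

print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

Here z works out to 2.10, which exceeds 1.645, so the test rejects H₀; the p-value comparison (p < α) reaches the same decision, as it always must for a matching critical value.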
Errors and Power in Hypothesis Testing
No test is perfect. There are two ways your conclusion can be wrong:
- Type I error (false positive): You reject H₀ when it's actually true. The probability of this equals α, your significance level. If α = 0.05, there's a 5% chance of a Type I error.
- Type II error (false negative): You fail to reject H₀ when it's actually false. The probability of this is denoted β.
Statistical power is the probability of correctly rejecting a false null hypothesis, calculated as 1 − β. Higher power means you're more likely to detect a real effect. Three main factors increase power:
- Larger sample size
- Larger effect size (a bigger true difference is easier to detect)
- Higher significance level (though this also raises Type I error risk)
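The sample-size effect is easy to see numerically. This sketch computes the power of a right-tailed z-test (known σ assumed, hypothetical numbers) for a few sample sizes; power here is the probability that the test statistic lands past the critical value when the true mean really differs from the null:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_right_tailed(mu0, mu1, sigma, n, z_crit=1.645):
    """P(reject H0 | true mean = mu1) for a right-tailed z-test at alpha = 0.05."""
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # true effect in standard-error units
    return 1 - normal_cdf(z_crit - shift)

# Same true difference (100 vs 105, sigma = 15), growing sample size:
for n in (25, 64, 100):
    print(n, round(power_right_tailed(100, 105, 15, n), 3))
```

With these made-up numbers, power climbs from roughly 0.51 at n = 25 to about 0.95 at n = 100: the same real effect goes from a coin flip to near-certain detection.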
Effect size measures the magnitude of the difference or relationship you're testing. While statistical significance tells you whether an effect likely exists, effect size tells you whether it's large enough to matter in practice.
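One common effect-size measure for a mean comparison is Cohen's d, the difference expressed in standard-deviation units. A minimal sketch, reusing the hypothetical numbers from the earlier z-test example (the 0.2/0.5/0.8 small/medium/large benchmarks are Cohen's conventional labels):

```python
def cohens_d(x_bar, mu0, sigma):
    """Standardized mean difference: how many SDs the sample mean sits from the null mean."""
    return (x_bar - mu0) / sigma

# Unlike a z statistic, d does not grow with sample size,
# so a statistically significant result can still have a small d.
d = cohens_d(105.25, 100, 15)
print(round(d, 2))  # 0.35: between small (0.2) and medium (0.5) by Cohen's benchmarks
```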