
9.2 Outcomes and the Type I and Type II Errors


Written by the Fiveable Content Team • Last updated August 2025

Hypothesis Testing Outcomes and Errors

Every hypothesis test ends with a decision: reject the null hypothesis or fail to reject it. But since you're working with sample data (not the entire population), there's always a chance your decision is wrong. These mistakes fall into two categories, and understanding them is essential for interpreting any hypothesis test result.

Type I vs Type II Errors

There are four possible outcomes when you run a hypothesis test. Two are correct decisions, and two are errors:

                        H₀ is actually true             H₀ is actually false
Reject H₀               Type I Error (false positive)   Correct decision
Fail to reject H₀       Correct decision                Type II Error (false negative)

Type I Error (False Positive) occurs when you reject the null hypothesis even though it's true.

  • Denoted by α (alpha)
  • You conclude an effect exists when it actually doesn't
  • Example: A clinical trial concludes a new drug is effective, but in reality it's no better than a placebo. Patients end up receiving an ineffective treatment.

Type II Error (False Negative) occurs when you fail to reject the null hypothesis even though it's false.

  • Denoted by β (beta)
  • You miss a real effect that's actually there
  • Example: A screening test fails to detect a disease in a patient who actually has it. That patient doesn't receive the treatment they need.

A helpful way to keep them straight: Type I is a false alarm, and Type II is a missed detection.
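Both error rates can be seen directly by simulation. The sketch below uses only Python's standard library and an illustrative one-sample z-test (known σ = 1, α = 0.05, true effect 0.5 — all assumptions for the example, not values from the text). Rejections under a true H₀ are Type I errors; failures to reject under a false H₀ are Type II errors.

```python
import random
from statistics import NormalDist, mean

def z_test_rejects(sample, mu0=0.0, sigma=1.0, alpha=0.05):
    """Two-sided one-sample z-test with known sigma: reject H0 if |z| exceeds the critical value."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit

random.seed(1)
trials, n = 5000, 25

# Under H0 (true mean = 0): every rejection is a Type I error.
type1 = sum(z_test_rejects([random.gauss(0, 1) for _ in range(n)])
            for _ in range(trials)) / trials

# Under H1 (true mean = 0.5): every failure to reject is a Type II error.
type2 = sum(not z_test_rejects([random.gauss(0.5, 1) for _ in range(n)])
            for _ in range(trials)) / trials

print(f"Estimated Type I error rate: {type1:.3f} (should be near alpha = 0.05)")
print(f"Estimated Type II error rate: {type2:.3f}")
```

The estimated Type I rate hovers near the chosen α, while the Type II rate depends on the sample size and the true effect, which is exactly the asymmetry described above.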

[Figure: Type I vs Type II error tradeoff in hypothesis testing (source: Cross Validated)]

Probabilities of Hypothesis Testing Errors

Alpha (α) is the probability of making a Type I error. You choose this value before running your test, and it's called the significance level.

  • Common choices are α = 0.05 (5% risk) or α = 0.01 (1% risk)
  • If α = 0.05, you're accepting a 5% chance of rejecting a true null hypothesis
  • This value determines where you set your critical value in the decision rule. A smaller α means you need stronger evidence to reject H₀

Beta (β) is the probability of making a Type II error. Unlike α, you don't directly choose β. It depends on several factors:

  • Sample size: Larger samples reduce β
  • Effect size: Bigger real differences are easier to detect, so β is smaller
  • Significance level: A stricter (smaller) α makes β larger, because you're demanding stronger evidence

If β = 0.20, there's a 20% chance you'll fail to detect a real effect.

Notice the tradeoff: lowering α (to protect against false positives) increases β (making false negatives more likely), assuming everything else stays the same. You can't minimize both errors simultaneously without increasing your sample size.
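The tradeoff can be computed exactly for a simple case. For a one-sided z-test with known σ (an illustrative setup — the true mean of 0.5, σ = 1, and n = 25 are assumptions for the example), β is the probability that the z statistic falls below the rejection threshold when H₁ is true:

```python
from statistics import NormalDist

def beta_one_sided(alpha, true_mean, mu0=0.0, sigma=1.0, n=25):
    """Type II error probability for a one-sided z-test (H0: mu = mu0 vs H1: mu > mu0)
    when the true mean is true_mean and sigma is known."""
    z_crit = NormalDist().inv_cdf(1 - alpha)        # rejection threshold on the z scale
    shift = (true_mean - mu0) / (sigma / n ** 0.5)  # how far H1 shifts the z statistic
    return NormalDist().cdf(z_crit - shift)         # P(fail to reject | H1 true)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> beta = {beta_one_sided(alpha, true_mean=0.5):.3f}")
```

Running this shows β climbing as α shrinks: demanding stronger evidence against H₀ makes real effects harder to detect.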

[Figure: hypothesis testing and types of errors]

Power of the Test

Power is the probability of correctly rejecting a false null hypothesis. It equals 1 − β.

  • If β = 0.20, then power = 1 − 0.20 = 0.80, or 80%
  • Higher power means you're more likely to detect a real effect when one exists
  • Researchers generally aim for power of at least 0.80 (80%)
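For a one-sided z-test with known σ (an illustrative setup, not from the text), power has a closed form, which makes the effect of sample size easy to see:

```python
from statistics import NormalDist

def power_one_sided(alpha, true_mean, mu0=0.0, sigma=1.0, n=25):
    """Power = 1 - beta for a one-sided z-test (H0: mu = mu0 vs H1: mu > mu0) with known sigma."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    shift = (true_mean - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z_crit - shift)

# Larger samples push power up toward (and past) the common 0.80 target.
for n in (10, 25, 50):
    print(f"n = {n:3d} -> power = {power_one_sided(0.05, 0.5, n=n):.3f}")
```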

Three main factors affect power:

  1. Sample size: Larger samples give you more data, which produces smaller standard errors and makes real effects easier to detect. This is the factor researchers have the most control over.
  2. Effect size: A large real difference between the true value and the null hypothesis value is easier to spot than a tiny one. Think of it this way: a coin that lands heads 90% of the time is much easier to identify as biased than one that lands heads 52% of the time.
  3. Significance level (α): Raising α (say, from 0.01 to 0.05) increases power because you're using a less strict threshold for rejection. But this also increases your Type I error risk.

Researchers often conduct a power analysis before collecting data. This calculation tells you how large your sample needs to be to achieve a desired level of power for a specific effect size and significance level.
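A minimal power-analysis sketch for the one-sided known-σ z-test (the formula n = ((z₁₋α + z_power)·σ / effect)² is standard for this setup; the specific effect size and σ below are illustrative assumptions):

```python
from math import ceil
from statistics import NormalDist

def required_n(alpha, power, effect, sigma=1.0):
    """Smallest n for a one-sided z-test with known sigma to reach the desired power,
    where effect = true_mean - mu0."""
    z_a = NormalDist().inv_cdf(1 - alpha)   # critical z for the significance level
    z_b = NormalDist().inv_cdf(power)       # z corresponding to the target power
    return ceil(((z_a + z_b) * sigma / effect) ** 2)

print(required_n(alpha=0.05, power=0.80, effect=0.5))  # sample size needed in this scenario
```

Shrinking the effect size or tightening α drives the required n up quickly, which is why power analysis is done before data collection.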

Statistical Inference and Decision-Making

Hypothesis testing is one part of the broader process of statistical inference, where you draw conclusions about a population using sample data.

  • The p-value measures the strength of evidence against the null hypothesis. If the p-value is less than or equal to α, you reject H₀. If it's greater, you fail to reject.
  • Confidence intervals complement hypothesis tests by giving you a range of plausible values for the population parameter, not just a reject/fail-to-reject decision.
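The p-value decision rule can be sketched concretely. Below, a two-sided one-sample z-test with known σ = 1 is applied to made-up data (both the test setup and the sample values are illustrative assumptions):

```python
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))  # probability of a result at least this extreme under H0

sample = [0.9, 1.4, 0.2, 1.1, 0.7, 1.3, 0.5, 1.0, 0.8]  # hypothetical data
p = z_test_p_value(sample)
alpha = 0.05
print(f"p = {p:.4f} -> {'reject H0' if p <= alpha else 'fail to reject H0'}")
```

Note that the decision depends on comparing p to the α chosen in advance, not on the p-value alone.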

Together, p-values, confidence intervals, and an awareness of Type I and Type II errors give you a more complete picture than any single measure alone.