🎲Intro to Statistics Unit 9 Review

9.4 Rare Events, the Sample, Decision and Conclusion

Written by the Fiveable Content Team • Last updated August 2025
Hypothesis Testing and P-Values

P-values and significance levels are the tools you use to make a final call in hypothesis testing: should you reject the null hypothesis, or not? This section focuses on how to interpret p-values, compare them to your significance level, and write conclusions that actually answer the research question.

Interpretation of P-Values

A p-value is the probability of observing a sample statistic as extreme as (or more extreme than) the one you got, assuming the null hypothesis is true. It answers the question: "If nothing interesting is really going on, how surprising is my data?"

  • A small p-value (like 0.01) means your observed result would be very unlikely under the null hypothesis. This counts as evidence against the null.
  • A large p-value (like 0.80) means your observed result could easily happen under the null hypothesis. Your data isn't surprising enough to challenge it.

The p-value is calculated from the sampling distribution of your test statistic (such as a z-score or t-score) under the null hypothesis. That test statistic measures how far your sample result falls from what the null hypothesis predicts, in standardized units.

Several factors affect the size of your p-value:

  • Sample size: Larger samples (say, n = 100 vs. n = 20) give more precise estimates, which tend to produce smaller p-values when a real difference exists.
  • Variability: Less spread in your data makes it easier to detect differences.
  • Effect size: The bigger the gap between your observed statistic and the null hypothesis value, the smaller the p-value tends to be.
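The calculation described above can be sketched in a few lines of Python. This is a hypothetical one-sample z-test with made-up numbers (a claimed mean of 5, known standard deviation 2, and a sample of n = 100 with mean 5.5), using only the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Hypothetical example -- all numbers are invented for illustration.
mu_0 = 5.0    # value claimed by the null hypothesis
sigma = 2.0   # known population standard deviation
n = 100       # sample size
x_bar = 5.5   # observed sample mean

# Standard error of the sample mean, then the standardized test statistic:
# how far the sample result falls from the null value, in standard errors.
se = sigma / n ** 0.5
z = (x_bar - mu_0) / se

# Two-sided p-value: probability of a result at least this extreme
# in either direction, assuming the null hypothesis is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, p-value = {p_value:.4f}")  # z = 2.50, p-value = 0.0124
```

Note how the ingredients listed above show up directly: a larger n shrinks `se`, less variability shrinks `sigma`, and a bigger gap between `x_bar` and `mu_0` grows `z` — all of which shrink the p-value.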

P-Value vs. Significance Level

The significance level (α) is a threshold you set before collecting data. It represents the maximum p-value you're willing to accept and still reject the null hypothesis. Common choices are 0.01, 0.05, and 0.10. Your choice depends on how serious it would be to commit a Type I error (rejecting a null hypothesis that's actually true).

The decision rule is straightforward:

  1. If p ≤ α, reject the null hypothesis in favor of the alternative. Your result is statistically significant.
  2. If p > α, fail to reject the null hypothesis. You don't have enough evidence to support the alternative.

For example, if your p-value is 0.02 and α = 0.05, you reject the null because 0.02 ≤ 0.05. But if your p-value is 0.15 with the same α, you fail to reject because 0.15 > 0.05.

A critical distinction: "fail to reject" is not the same as "the null hypothesis is true." It just means your sample didn't provide strong enough evidence against it. The critical value is the corresponding boundary on the test statistic's distribution; if your test statistic falls beyond it, that's equivalent to having p ≤ α.
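The decision rule is mechanical enough to write as a tiny helper function. This sketch uses the same example p-values as above; note the wording "fail to reject" rather than "accept":

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p <= alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.02))  # 0.02 <= 0.05 -> reject the null hypothesis
print(decide(0.15))  # 0.15 >  0.05 -> fail to reject the null hypothesis
```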


Contextual Conclusions for Hypothesis Tests

Your conclusion should answer the original research question in plain language, not just say "reject" or "fail to reject." Always tie it back to the population and variables you're studying.

When you reject the null hypothesis, state that there is sufficient evidence to support the alternative:

  • "Based on the sample data, there is sufficient evidence to conclude that the average weight loss for the new diet plan is greater than 5 pounds."
  • "The survey results indicate a significant preference for Brand A over Brand B among consumers aged 18–34."

When you fail to reject the null hypothesis, state that there is not sufficient evidence to support the alternative:

  • "Based on the sample data, there is insufficient evidence to conclude that the proportion of defective products differs from the claimed value of 0.05."
  • "The study did not find a significant difference in job satisfaction between remote and in-office employees."

Finally, acknowledge limitations. Consider whether your sample size was large enough, whether the sample was representative, and whether confounding variables could explain the results. For instance: "While the results suggest a significant difference in customer satisfaction between the two stores, a larger and more diverse sample would be needed to generalize to the entire chain." Hypothesis tests show statistical significance, but they don't automatically establish causation or guarantee the result applies beyond your sample.

Statistical Inference and Estimation

These related concepts come up alongside hypothesis testing:

  • Confidence intervals give a range of plausible values for a population parameter, complementing the yes/no decision of a hypothesis test.
  • Statistical power is the probability of correctly rejecting a false null hypothesis. Higher power means you're less likely to miss a real effect (a Type II error).
  • Standard error measures the variability of a sample statistic across repeated samples. It's used in calculating both confidence intervals and test statistics.
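To see how standard error feeds into a confidence interval, here's a short sketch with an invented sample of eight measurements. It uses the normal critical value for a 95% interval (for a sample this small, a t critical value would strictly be more appropriate, but the structure is the same):

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample -- values invented for illustration.
sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 5.2]
n = len(sample)
x_bar = mean(sample)

# Standard error: sample standard deviation divided by sqrt(n).
se = stdev(sample) / n ** 0.5

# 95% confidence interval: point estimate plus/minus critical value * SE.
z_star = NormalDist().inv_cdf(0.975)  # about 1.96
lower, upper = x_bar - z_star * se, x_bar + z_star * se
print(f"mean = {x_bar:.3f}, SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval gives the range of plausible population means; if a null-hypothesis value falls outside it, a two-sided test at α = 0.05 would reject that value.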