biostatistics unit 5 study guides

biological hypothesis testing & inference

5.1 Null and alternative hypotheses in biological research
5.2 Type I and Type II errors, and significance levels
5.3 t-tests and their applications in biology
5.4 Confidence intervals and estimation

unit 5 review

Biological hypothesis testing lets scientists judge whether a pattern in their data is larger than chance variation alone would explain. It involves formulating null and alternative hypotheses, selecting an appropriate statistical test, and interpreting the result using p-values and significance levels. Key concepts include p-values, alpha levels, Type I and Type II errors, and statistical power. Tests such as t-tests, ANOVA, and chi-square are chosen according to the data type and research question. Proper interpretation considers both statistical and biological significance.

Key Concepts

Null hypothesis ($H_0$) states that there is no difference between the specified populations; any observed difference is attributed to sampling or experimental error
Alternative hypothesis ($H_A$) states that there is a difference between the specified populations, directly contradicting the null hypothesis
P-value is the probability of obtaining test results at least as extreme as those actually observed, under the assumption that the null hypothesis is correct
Alpha ($\alpha$) level, also known as the significance level, is the probability threshold below which the null hypothesis is rejected (commonly set at 0.05)
Type I error (false positive) occurs when the null hypothesis is rejected although it is actually true
- Its probability is denoted by $\alpha$, the significance level
Type II error (false negative) occurs when the null hypothesis is not rejected although it is actually false
- Its probability is denoted by $\beta$; statistical power equals $1 - \beta$
Statistical power is the probability of correctly rejecting a false null hypothesis; it depends on sample size, effect size, and significance level
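
To make the link between $\alpha$, $\beta$, and power concrete, here is a minimal sketch that computes power analytically. It assumes a two-sided one-sample z-test with known standard deviation, a simplification chosen only so the example needs nothing beyond the Python standard library; the function name `z_test_power` and the chosen effect size are illustrative, not part of the study guide.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (known standard deviation).

    effect_size is the true shift of the mean in standard-deviation units.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # reject H0 when |z| > z_crit
    shift = effect_size * sqrt(n)               # mean of z under the alternative
    # P(|Z| > z_crit) when Z ~ Normal(shift, 1)
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

for n in (10, 30, 100):
    power = z_test_power(effect_size=0.5, n=n)
    print(f"n={n:3d}  power={power:.3f}  beta={1 - power:.3f}")
```

With an effect size of zero the function returns $\alpha$ itself, which is the Type I error rate; increasing the sample size, the effect size, or $\alpha$ raises the power.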

Types of Hypotheses in Biology

One-tailed (directional) hypothesis specifies the direction of the expected difference between populations (e.g., group A has a higher mean than group B)
Two-tailed (non-directional) hypothesis states that there is a difference between populations, but does not specify the direction of the difference
Simple hypothesis specifies a single value for a population parameter (e.g., the mean weight of a certain species is 50 grams)
Composite hypothesis specifies a range of values for a population parameter (e.g., the mean weight of a certain species is greater than 50 grams)
Null hypothesis of no difference states that the populations being compared do not differ in the parameter of interest
- Used as a starting point for statistical tests
Alternative hypothesis of difference states that the populations being compared do differ
- Can be one-tailed or two-tailed
Null hypothesis of no association states that there is no relationship between two variables (e.g., no correlation between body size and lifespan)
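
The practical difference between one-tailed and two-tailed hypotheses shows up in how the p-value is computed from the same test statistic. The sketch below assumes a z statistic with a standard normal null distribution; the helper `p_values_from_z` and the value z = 1.8 are hypothetical choices for illustration.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Return (one-tailed upper, two-tailed) p-values for a z statistic."""
    std_normal = NormalDist()
    one_tailed = 1 - std_normal.cdf(z)              # H_A: parameter is greater
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))   # H_A: parameter differs
    return one_tailed, two_tailed

one_tailed, two_tailed = p_values_from_z(1.8)
print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

For a positive statistic the two-tailed p-value is twice the one-tailed value, so z = 1.8 falls below 0.05 only under the directional hypothesis. This is why the direction must be chosen before the data are seen.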

Steps in Biological Hypothesis Testing

State the null and alternative hypotheses based on the research question and available data
Choose an appropriate statistical test based on the type of data, sample size, and assumptions
- Common tests include t-tests, ANOVA, chi-square, and correlation
Set the significance level ($\alpha$) before conducting the test (usually 0.05)
Collect data through experiments or observations, ensuring proper sampling techniques and experimental design
Calculate the test statistic using the chosen statistical test and the collected data
Determine the p-value associated with the test statistic, which is the probability of obtaining results at least as extreme as those observed if the null hypothesis is true
Compare the p-value to the significance level ($\alpha$)
- If p-value < $\alpha$, reject the null hypothesis in favor of the alternative hypothesis
- If p-value ≥ $\alpha$, fail to reject the null hypothesis (insufficient evidence to support the alternative hypothesis)
Interpret the results in the context of the original research question and consider the biological significance of the findings
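
The steps above can be sketched end to end with SciPy. This is a minimal example under stated assumptions: the measurements are invented for illustration, the test is Welch's two-sample t-test (selected with `equal_var=False`), and the hypothesis is two-tailed.

```python
from scipy import stats

# Hypothetical leaf lengths (cm), for illustration only.
control = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1, 4.0]
treated = [4.6, 4.9, 4.7, 5.0, 4.5, 4.8, 4.7, 4.9]

alpha = 0.05  # significance level, fixed before looking at the data

# H0: the two population means are equal; HA: they differ (two-tailed).
# equal_var=False requests Welch's t-test, which does not assume equal variances.
result = stats.ttest_ind(treated, control, equal_var=False)

if result.pvalue < alpha:
    decision = "reject H0"
else:
    decision = "fail to reject H0"

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.2g}, decision: {decision}")
```

The printed decision covers only the statistical step; whether a difference of this size matters for the plant is a separate biological judgment.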

Statistical Tests for Biological Data

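t-tests compare means between two groups (independent samples), between paired measurements on the same subjects (paired samples), or between one sample and a reference value (one-sample)
- Assumptions: normality, independence, and, for the standard two-sample test, equal variances (Welch's t-test relaxes the equal-variance assumption)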
Analysis of Variance (ANOVA) compares means among three or more groups
- One-way ANOVA for one independent variable, two-way ANOVA for two independent variables
- Assumptions: normality, equal variances, and independence
Chi-square test compares observed and expected frequencies of categorical variables
- Goodness-of-fit test for a single variable, test of independence for two variables
- Assumptions: large sample size, independence, and expected frequencies ≥ 5
Correlation tests measure the strength and direction of the relationship between two continuous variables
- Pearson correlation for linear relationships in normally distributed data, Spearman rank correlation for monotonic relationships or non-normal data
- Assumptions (Pearson): linearity, no outliers, and homoscedasticity
Regression analysis models the relationship between a dependent variable and one or more independent variables
- Linear regression for continuous variables, logistic regression for binary outcomes
- Assumptions: linearity, independence, normality of residuals, and homoscedasticity
Non-parametric tests (e.g., Mann-Whitney U, Kruskal-Wallis, Wilcoxon signed-rank) are used when the assumptions of parametric tests are violated
- Usually less powerful than parametric tests when the parametric assumptions hold, but more robust to violations of those assumptions
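
As one worked instance from the list above, the following sketch runs a chi-square goodness-of-fit test with SciPy. The counts are hypothetical, and the 3:1 expected ratio is an assumed Mendelian null hypothesis chosen for illustration.

```python
from scipy import stats

# Hypothetical phenotype counts from a monohybrid cross (illustration only).
observed = [290, 110]                      # dominant, recessive
total = sum(observed)
expected = [total * 0.75, total * 0.25]    # 3:1 ratio under H0

# Both expected counts are well above 5, so the usual approximation applies.
result = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

Here the statistic is $(290-300)^2/300 + (110-100)^2/100 = 4/3$ with one degree of freedom, which does not reach significance at $\alpha = 0.05$, so the 3:1 ratio is not rejected.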

Interpreting Results and P-values

P-value represents the probability of obtaining the observed results (or more extreme) if the null hypothesis is true
A small p-value (typically < 0.05) is evidence against the null hypothesis; the smaller the p-value, the stronger that evidence
A large p-value (≥ 0.05) means the data are compatible with the null hypothesis, so it cannot be rejected; this does not show that the null hypothesis is true
Statistical significance does not necessarily imply biological or practical significance
- Consider the effect size and the context of the research question
Confidence intervals provide a range of plausible values for a population parameter based on the sample data
- Narrower intervals indicate more precise estimates
Effect size measures the magnitude of the difference or relationship between variables
- Cohen's d for t-tests, eta-squared for ANOVA, odds ratio for logistic regression
Results should be interpreted cautiously, considering limitations of the study design, sample size, and potential confounding variables
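
Effect size can be computed alongside the p-value with a few lines of standard-library Python. The sketch below implements Cohen's d using the pooled standard deviation; the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    # variance() is the sample variance (n - 1 in the denominator).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Hypothetical body masses (g), for illustration only.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 3.0, 5.0]
d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.2f}")
```

Because d is expressed in standard-deviation units, it can be compared across studies that measure different quantities, which a p-value alone does not allow.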

Common Pitfalls and Misconceptions

Multiple testing problem: conducting many statistical tests increases the likelihood of obtaining a significant result by chance (Type I error)
- Use corrections such as Bonferroni or false discovery rate (FDR) to adjust p-values
Confusing statistical significance with practical or biological significance
- A statistically significant result may not be meaningful in the context of the research question
Overinterpreting non-significant results as evidence of no effect (absence of evidence is not evidence of absence)
- Consider the statistical power and the potential for Type II errors
Assuming that a significant correlation implies causation
- Correlation does not prove causation; consider potential confounding variables and the need for experimental manipulation
Failing to check assumptions of statistical tests, leading to invalid or misleading results
- Assess normality, equal variances, independence, and other assumptions before conducting tests
Overfitting models by including too many predictors relative to the sample size
- Use model selection techniques (e.g., AIC, BIC) and cross-validation to avoid overfitting
Relying solely on p-values for decision-making without considering the context and the limitations of the study
- Use a combination of p-values, effect sizes, confidence intervals, and biological knowledge to interpret results
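
The multiple-testing corrections mentioned above can be sketched in plain Python. The two functions below return adjusted p-values for the Bonferroni method and for the Benjamini-Hochberg procedure, one common way of controlling the false discovery rate; the function names and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.001, 0.01, 0.03, 0.04, 0.2]   # hypothetical p-values from five tests
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
for p, b, f in zip(raw, bonf, bh):
    print(f"raw={p:.3f}  bonferroni={b:.3f}  bh={f:.3f}")
```

Bonferroni controls the chance of any false positive and is the more conservative of the two; the Benjamini-Hochberg adjustment is never larger than the Bonferroni one for the same test.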

Real-world Applications in Biology

Comparing the effectiveness of different treatments or interventions in clinical trials (e.g., drug efficacy, surgical techniques)
Assessing the impact of environmental factors on species abundance, diversity, or behavior (e.g., climate change, habitat fragmentation)
Identifying genetic variants associated with diseases or traits using genome-wide association studies (GWAS)
Evaluating the performance of diagnostic tests or biomarkers for detecting diseases or conditions (e.g., sensitivity, specificity)
Investigating the relationship between diet, exercise, or other lifestyle factors and health outcomes (e.g., obesity, cardiovascular disease)
Comparing the growth rates, survival, or reproductive success of different populations or species in ecological studies
Assessing the effectiveness of conservation strategies for protecting endangered species or habitats
Analyzing the expression levels of genes in different tissues, developmental stages, or experimental conditions using RNA-seq or microarray data

Advanced Topics and Future Directions

Bayesian hypothesis testing incorporates prior knowledge and updates the probability of hypotheses based on observed data
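- Yields direct probability statements about hypotheses (e.g., posterior probabilities or Bayes factors), which many find more intuitive than p-values, but requires specifying a prior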
Non-parametric bootstrapping resamples the observed data to estimate the sampling distribution of a statistic and construct confidence intervals
- Useful when the assumptions of parametric tests are violated or the distribution is unknown
Permutation tests generate a null distribution by randomly shuffling the observed data and calculating the test statistic for each permutation
- Provides exact p-values when every permutation is enumerated (a random subset gives a Monte Carlo approximation) and is useful when the assumptions of parametric tests are violated
Mixed-effects models account for both fixed and random effects in the data, allowing for the analysis of hierarchical or clustered data structures
- Useful for repeated measures, longitudinal studies, or multi-level data
Machine learning techniques (e.g., random forests, support vector machines) can be used for classification, regression, or clustering of biological data
- Provides a data-driven approach for identifying patterns and making predictions
Integrating multiple data types (e.g., genomics, transcriptomics, proteomics) to gain a more comprehensive understanding of biological systems
- Requires advanced statistical methods and bioinformatics tools for data integration and interpretation
Developing new statistical methods and software tools to handle the increasing complexity and volume of biological data
- Addressing challenges such as high-dimensionality, sparsity, and non-normality of data
Promoting reproducibility and transparency in biological research by sharing data, code, and detailed methods
- Using platforms such as GitHub, Jupyter notebooks, and open-access journals to facilitate collaboration and replication of results
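
The bootstrap and permutation ideas listed in this section can be sketched with the standard library alone. The example below assumes a percentile bootstrap for the mean and a two-sided Monte Carlo permutation test on the difference in means; function names, data, resample counts, and seeds are all hypothetical choices for illustration.

```python
import random

def sample_mean(values):
    return sum(values) / len(values)

def bootstrap_ci(sample, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sample_mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lower_idx = int((1 - confidence) / 2 * n_resamples)
    upper_idx = int((1 + confidence) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]

def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(sample_mean(a) - sample_mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the group membership
        diff = abs(sample_mean(pooled[:len(a)]) - sample_mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Adding one to both counts keeps the estimate strictly above zero.
    return (count + 1) / (n_permutations + 1)

# Hypothetical measurements, for illustration only.
control = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1, 4.0]
treated = [4.6, 4.9, 4.7, 5.0, 4.5, 4.8, 4.7, 4.9]

ci_low, ci_high = bootstrap_ci(treated)
p_perm = permutation_test(treated, control)
print(f"95% bootstrap CI for treated mean: ({ci_low:.2f}, {ci_high:.2f})")
print(f"permutation p-value: {p_perm:.4f}")
```

Neither function assumes normality. The permutation p-value here is a Monte Carlo estimate; it becomes exact only if every possible relabelling is enumerated.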
