๐Ÿ›Biostatistics

Types of Bias in Research


Why This Matters

Bias introduces systematic error into research, and biostatistics exams will test whether you can identify when, why, and how different biases distort study findings. You're not just being tested on definitions. You need to understand the mechanisms that cause the error and recognize whether a bias affects who gets into a study, how data is collected, or how results are interpreted and shared.

The biases you'll encounter fall into categories based on where in the research process they occur: participant selection, data collection, analysis, and dissemination. Knowing these categories helps you quickly diagnose problems in study design and propose fixes. Don't just memorize names. Know what stage of research each bias threatens and what strategies prevent it.


Biases in Participant Selection

These biases occur before data collection even begins. When the sample doesn't accurately represent the target population, external validity collapses no matter how rigorous the rest of the study.

Selection Bias

Selection bias happens when the people included in a study systematically differ from the population you're trying to learn about. This threatens generalizability because your results may only apply to the specific (non-representative) group you studied.

  • Case-control studies are particularly vulnerable when cases and controls differ in characteristics beyond the exposure of interest
  • Volunteer bias is a common subtype: people who volunteer for studies tend to be healthier, more educated, or more motivated than non-volunteers, which skews results

Sampling Bias

Sampling bias arises from how you recruit participants. If you use non-random selection or convenience sampling (say, recruiting only from one hospital), the people who enter your study won't reflect the broader population.

  • Self-selection is a related problem: when participants choose whether to join, you often end up with healthier or more motivated samples
  • External validity suffers most, meaning your findings may not apply beyond your specific study population

Attrition Bias

Attrition bias occurs when participants drop out of a study for reasons related to the exposure or outcome being studied. The key feature is that the loss is non-random. If sicker patients quit a drug trial because of side effects, the remaining participants look healthier than they should.

  • Differential loss to follow-up between groups creates systematic differences that bias your effect estimates
  • Intention-to-treat (ITT) analysis helps mitigate this by analyzing participants in their originally assigned groups regardless of whether they completed the study
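The mechanism above can be sketched in a minimal simulation with made-up numbers: sicker patients in the drug arm drop out, so a completers-only ("per-protocol") analysis exaggerates the drug's benefit, while ITT (assuming outcomes could still be ascertained for dropouts) recovers the true effect. All parameters here are illustrative, not from any real trial.

```python
import random

random.seed(42)

# Hypothetical trial: the drug lowers a severity score by 5 points on average.
# The sickest drug-arm patients drop out (say, due to side effects), so the
# completers in that arm look healthier than the arm as a whole.
n = 10_000
drug_effect = -5.0

assigned, outcome, completed = [], [], []
for i in range(n):
    arm = "drug" if i % 2 == 0 else "placebo"
    baseline = random.gauss(50, 10)                 # disease severity score
    y = baseline + (drug_effect if arm == "drug" else 0.0)
    dropout = arm == "drug" and baseline > 60       # non-random attrition
    assigned.append(arm)
    outcome.append(y)
    completed.append(not dropout)

def arm_mean(arm, only_completers):
    vals = [y for a, y, c in zip(assigned, outcome, completed)
            if a == arm and (c or not only_completers)]
    return sum(vals) / len(vals)

# Per-protocol: completers only -> the drug arm loses its sickest patients,
# so the apparent benefit is inflated beyond -5.
pp_diff = arm_mean("drug", True) - arm_mean("placebo", True)

# ITT: everyone analyzed in the originally assigned arm -> estimate near -5.
itt_diff = arm_mean("drug", False) - arm_mean("placebo", False)

print(f"per-protocol effect: {pp_diff:.1f}")
print(f"ITT effect:          {itt_diff:.1f}")
```

Note that real ITT analyses still need outcome data on dropouts (via continued follow-up or imputation); the simulation sidesteps that by keeping every simulated outcome.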

Compare: Selection bias vs. Sampling bias. Both affect who's in your study, but selection bias refers to systematic differences in how participants are chosen or enrolled, while sampling bias specifically involves flawed sampling techniques. If an exam question describes a convenience sample, think sampling bias first.


Biases in Data Collection

These biases corrupt the accuracy of measurements after participants are enrolled. Systematic errors in how exposure or outcome data are gathered lead to misclassification, either differential (varying by group) or non-differential (equal across groups).

Information Bias

Information bias is the broad category for systematic errors in measuring exposures or outcomes. The core mechanism is misclassification: putting someone in the wrong exposure or outcome category because of faulty instruments, inconsistent protocols, or inaccurate records.

  • Non-differential misclassification (errors that occur equally in all groups) typically biases results toward the null, making real effects harder to detect
  • Differential misclassification (errors that differ between groups) can bias results in either direction, which makes it more dangerous and harder to predict
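The "toward the null" behavior of non-differential misclassification can be demonstrated with a toy case-control simulation (all rates invented for illustration): flipping exposure status with the same probability in cases and controls drags a true odds ratio of about 4 toward 1.

```python
import random

random.seed(0)

# Hypothetical case-control data with a true odds ratio near 4:
# cases 40% exposed, controls ~14.3% exposed.
n_cases = n_controls = 50_000
cases = [random.random() < 0.40 for _ in range(n_cases)]
controls = [random.random() < 0.143 for _ in range(n_controls)]

def odds_ratio(case_exp, ctrl_exp):
    a = sum(case_exp); b = len(case_exp) - a      # exposed / unexposed cases
    c = sum(ctrl_exp); d = len(ctrl_exp) - c      # exposed / unexposed controls
    return (a * d) / (b * c)

def misclassify(exposures, flip_prob):
    # Flip exposure status with the SAME probability for every subject:
    # non-differential misclassification.
    return [(not e) if random.random() < flip_prob else e for e in exposures]

true_or = odds_ratio(cases, controls)
noisy_or = odds_ratio(misclassify(cases, 0.2), misclassify(controls, 0.2))

print(f"OR with correct classification:      {true_or:.2f}")   # near 4
print(f"OR with 20% non-differential error:  {noisy_or:.2f}")  # pulled toward 1
```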

Recall Bias

Recall bias occurs when participants inaccurately remember past exposures. It's especially problematic in retrospective studies where you're asking people to think back months or years.

  • Differential recall is the real concern: cases tend to search their memories more thoroughly than controls. For example, mothers of children with birth defects are more likely to recall every medication they took during pregnancy than mothers of healthy children.
  • Case-control studies relying on self-reported historical data are most vulnerable. Using objective medical records instead of self-report is the best prevention strategy.

Observer Bias

Observer bias occurs when a researcher's expectations influence how they measure or interpret outcomes. If an investigator knows which patients received the treatment, they may unconsciously rate those patients as improved.

  • Subjective endpoints like pain scales, disease severity ratings, or behavioral assessments are particularly susceptible
  • Blinding observers to group assignments is the primary prevention strategy

Compare: Recall bias vs. Observer bias. Both involve subjective distortion, but recall bias originates with participants misremembering, while observer bias originates with researchers misinterpreting. Blinding fixes observer bias; using objective records (rather than self-report) fixes recall bias.


Biases in Analysis and Interpretation

These biases affect how relationships between variables are understood. Even with perfect selection and measurement, failing to account for extraneous variables or time-related artifacts can produce misleading conclusions.

Confounding Bias

Confounding occurs when a third variable is associated with both the exposure and the outcome, creating a spurious (fake) association between them. The confounding variable provides an alternative explanation for your results.

Classic example: Early studies found an association between coffee drinking and lung cancer. But coffee drinkers were more likely to smoke, and smoking causes lung cancer. Smoking was the confounder, and once you accounted for it, the coffee-cancer link disappeared.

Four key strategies to address confounding:

  1. Randomization (at the design stage) distributes all confounders, even unknown ones, equally across groups
  2. Restriction limits enrollment to one level of the confounder (e.g., only studying non-smokers)
  3. Matching pairs participants on the confounding variable
  4. Statistical adjustment uses techniques like multivariable regression to control for confounders during analysis
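Stratification (a simple cousin of matching and regression adjustment) makes the coffee-smoking example concrete. In this sketch, with invented rates, smoking drives both coffee drinking and lung cancer while coffee itself does nothing: the crude risk ratio looks alarming, but within each smoking stratum it collapses to about 1.

```python
import random

random.seed(1)

# Hypothetical cohort: smoking raises both the chance of drinking coffee and
# the risk of lung cancer; coffee has no causal effect. Rates are illustrative.
n = 200_000
rows = []
for _ in range(n):
    smoker = random.random() < 0.3
    coffee = random.random() < (0.8 if smoker else 0.3)
    cancer = random.random() < (0.10 if smoker else 0.01)
    rows.append((smoker, coffee, cancer))

def risk_ratio(rows):
    exposed = [c for s, k, c in rows if k]        # coffee drinkers
    unexposed = [c for s, k, c in rows if not k]  # non-drinkers
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))

crude = risk_ratio(rows)                                   # confounded: well above 1
in_smokers = risk_ratio([r for r in rows if r[0]])         # ~1 within stratum
in_nonsmokers = risk_ratio([r for r in rows if not r[0]])  # ~1 within stratum

print(f"crude RR (coffee vs cancer): {crude:.2f}")
print(f"RR among smokers only:       {in_smokers:.2f}")
print(f"RR among non-smokers only:   {in_nonsmokers:.2f}")
```

Multivariable regression generalizes the same idea: estimate the coffee-cancer association while holding smoking fixed.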

Lead-Time Bias

Lead-time bias is a screening artifact. It occurs when earlier detection through screening appears to extend survival without actually changing the course of the disease.

Here's why it happens: survival time is measured from the point of diagnosis. If screening detects a cancer at age 55 instead of symptoms appearing at age 60, and the patient dies at age 68 either way, the screened patient looks like they survived 13 years while the unscreened patient survived only 8. The screening didn't add life; it just added time spent knowing about the disease.

Mortality rates (deaths per population over a time period) provide a more accurate measure of whether a screening program actually saves lives, because they don't depend on when diagnosis occurs.
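The lead-time arithmetic from the example above can be written out directly (ages taken from the example):

```python
# Same disease, same death, two diagnosis dates.
diagnosis_by_screening = 55   # cancer detected early by screening
diagnosis_by_symptoms = 60    # same cancer detected when symptoms appear
age_at_death = 68             # unchanged either way

screened_survival = age_at_death - diagnosis_by_screening    # 13 years
unscreened_survival = age_at_death - diagnosis_by_symptoms   # 8 years
lead_time = diagnosis_by_symptoms - diagnosis_by_screening   # 5 years

# The entire 5-year "survival gain" is lead time, not extra life:
assert screened_survival - unscreened_survival == lead_time
```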

Compare: Confounding bias vs. Information bias. Confounding involves a real third variable distorting the exposure-outcome relationship, while information bias involves measurement error in the exposure or outcome itself. Confounding can often be addressed during analysis with statistical adjustment; information bias generally cannot be fixed after data collection.


Biases in Research Dissemination

These biases occur after studies are completed, affecting what evidence reaches the scientific community. When published literature doesn't reflect all conducted research, systematic reviews and meta-analyses inherit distorted effect estimates.

Reporting Bias

Reporting bias occurs within a single study when researchers emphasize certain results while downplaying or omitting others based on statistical significance or the direction of findings.

  • Outcome switching is a specific form: researchers change their primary endpoint after seeing results, then present the "new" primary outcome as if it were planned all along. This inflates apparent treatment effects.
  • Trial registration requirements (e.g., ClinicalTrials.gov) help prevent this by documenting planned outcomes before the study begins, so reviewers can check whether the published results match the original plan.

Publication Bias

Publication bias operates across studies. Journals are more likely to publish studies with significant or favorable findings, while null or negative results often go unpublished.

  • The file-drawer problem describes this: negative studies sit in researchers' file drawers and never enter the evidence base. This means the published literature overestimates true effect sizes.
  • Funnel plots in meta-analyses can detect publication bias. In an unbiased literature, a funnel plot of effect sizes vs. study precision should be roughly symmetrical. Asymmetry suggests that small studies with null results are missing.
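The asymmetry argument can be checked numerically rather than visually, in the spirit of regression-based funnel tests. This sketch (made-up effect size and study sizes) simulates a literature where small studies only get published if they reach significance: in the full set of studies, effect size is unrelated to standard error; in the published subset, the two become correlated and the pooled effect is inflated.

```python
import math
import random

random.seed(7)

# Hypothetical meta-analysis: true effect is 0.2; smaller studies are noisier.
# Studies missing significance (|z| < 1.96) go in the "file drawer".
def simulate(n_studies, publish_all):
    effects, ses = [], []
    while len(effects) < n_studies:
        n = random.randint(20, 400)      # study sample size
        se = 1 / math.sqrt(n)            # standard error shrinks with n
        est = random.gauss(0.2, se)      # observed effect
        if publish_all or abs(est / se) > 1.96:
            effects.append(est)
            ses.append(se)
    return effects, ses

def corr(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Unbiased literature: effect size unrelated to precision (symmetric funnel).
e_all, s_all = simulate(500, publish_all=True)
# Biased literature: small studies survive only with big effects (asymmetry).
e_pub, s_pub = simulate(500, publish_all=False)

print(f"corr(effect, SE), all studies:      {corr(e_all, s_all):+.2f}")
print(f"corr(effect, SE), published only:   {corr(e_pub, s_pub):+.2f}")
print(f"mean published effect: {sum(e_pub) / len(e_pub):.2f}  (true = 0.20)")
```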

Compare: Reporting bias vs. Publication bias. Reporting bias occurs within a study (selective presentation of outcomes) and is author-driven. Publication bias occurs across studies (selective publication of entire studies) and involves editorial and systemic factors. Both distort the evidence base.


Quick Reference Table

Concept                            | Best Examples
-----------------------------------|-----------------------------------------------
Participant selection problems     | Selection bias, Sampling bias, Attrition bias
Measurement/data collection errors | Information bias, Recall bias, Observer bias
Third-variable distortion          | Confounding bias
Time-related artifacts             | Lead-time bias
Dissemination distortion           | Reporting bias, Publication bias
Mitigated by blinding              | Observer bias, Information bias
Mitigated by randomization         | Confounding bias, Selection bias
Threatens external validity        | Sampling bias, Selection bias, Attrition bias

Self-Check Questions

  1. A case-control study finds that mothers of children with autism report higher pesticide exposure than mothers of healthy children. Which two biases could explain this finding, and how would you distinguish between them?

  2. Researchers notice that participants who drop out of a weight-loss trial had higher baseline BMIs than completers. What type of bias does this represent, and how might it affect the study's conclusions?

  3. Compare and contrast confounding bias and information bias: at what stage of research does each occur, and which can be corrected during statistical analysis?

  4. A new cancer screening test shows 5-year survival rates of 85% compared to 60% for unscreened patients. A biostatistician argues this doesn't prove the screening saves lives. What bias is she concerned about, and what alternative measure would provide stronger evidence?

  5. A meta-analysis of antidepressant trials shows that published studies report larger effect sizes than unpublished FDA submissions. Which bias does this demonstrate, and what graphical tool could have detected it?