Types of Bias in Epidemiologic Studies
Bias in epidemiologic studies systematically distorts results, pushing estimates away from the truth. Selection bias, information bias, and confounding are the three main types you need to know, and each one distorts your estimates in a different way. Recognizing them is the first step toward designing better studies and reading published research with a critical eye.
The Three Main Types
Selection bias occurs when the people who end up in your study aren't representative of the population you're trying to learn about. This distorts the association between exposure and outcome. For example, if you recruit volunteers for a diet study, the people who sign up may already be more health-conscious than average, which skews your results.
Information bias happens when exposure, outcome, or other variables are measured or recorded inaccurately. This leads to misclassification of participants. Recall bias is a classic example: a patient diagnosed with cancer may think harder about past chemical exposures than a healthy control would, creating a systematic difference in how information is reported.
Confounding is different from the other two. It occurs when a third variable is associated with both the exposure and the outcome, creating a spurious association (or hiding a real one). A well-known example: studies of alcohol and lung cancer can be confounded by smoking, because people who drink more also tend to smoke more. The apparent link between alcohol and lung cancer may really be driven by smoking.
Confounding is sometimes grouped separately from "bias" because it reflects a real causal structure in the world rather than a flaw in measurement or sampling. Still, most epidemiology courses cover all three together because they all threaten the validity of your findings.
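The alcohol-smoking example above can be sketched as a small simulation. All the rates below are hypothetical and chosen purely for illustration: smoking raises both the chance of heavy drinking and the risk of lung cancer, while drinking has no direct effect at all. The crude comparison still shows drinkers at higher risk, but the association vanishes once you stratify by smoking.

```python
import random

random.seed(1)

# Hypothetical rates for illustration only (not real epidemiologic data).
n = 100_000
people = []
for _ in range(n):
    smoker = random.random() < 0.3
    # Smokers are more likely to drink heavily; drinking has NO direct effect.
    drinker = random.random() < (0.6 if smoker else 0.2)
    # Lung cancer risk depends only on smoking.
    cancer = random.random() < (0.10 if smoker else 0.01)
    people.append((smoker, drinker, cancer))

def risk(group):
    return sum(c for _, _, c in group) / len(group)

drinkers = [p for p in people if p[1]]
nondrinkers = [p for p in people if not p[1]]
print("crude risk ratio:", round(risk(drinkers) / risk(nondrinkers), 2))  # well above 1

# Stratify by smoking: within each stratum the association disappears.
for s in (True, False):
    stratum = [p for p in people if p[0] == s]
    d = [p for p in stratum if p[1]]
    nd = [p for p in stratum if not p[1]]
    print("smoker" if s else "non-smoker", "risk ratio:",
          round(risk(d) / risk(nd), 2))
```

Stratification (or adjustment) only works like this when the confounder is actually measured, which is why unmeasured confounding is such a persistent worry.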
Sources of Selection Bias
- Non-random sampling pulls in participants who are easy to reach rather than truly representative. Using university students for a psychology study, for instance, gives you a narrow slice of the population that may not generalize.
- Loss to follow-up becomes a problem when participants who drop out differ systematically from those who stay. If patients with more severe symptoms are more likely to leave a cohort study, the remaining sample looks healthier than reality.
- Self-selection means the people who choose to participate differ from those who don't. The healthy worker effect is a textbook example: employed people tend to be healthier than the general population, so occupational studies can underestimate disease risk if they only look at current workers.
- Berkson's bias can occur in hospital-based case-control studies, where the controls (hospitalized for other reasons) may have different exposure patterns than the general population.
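The self-selection mechanism above is easy to see in a toy simulation. The numbers are made up: a latent "health-consciousness" score drives the probability of volunteering, so the volunteer sample ends up noticeably healthier than the population it came from.

```python
import random
import statistics

random.seed(2)

# Hypothetical illustration of volunteer (self-selection) bias.
# Latent health-consciousness score, roughly normal around 50.
population = [random.gauss(50, 10) for _ in range(50_000)]

# Health-conscious people are more likely to volunteer for the study.
volunteers = [h for h in population
              if random.random() < min(1.0, max(0.0, (h - 20) / 60))]

print("population mean:", round(statistics.mean(population), 1))  # roughly 50
print("volunteer mean:", round(statistics.mean(volunteers), 1))   # noticeably higher
```

The gap between the two means is the selection bias: any inference drawn from the volunteers alone describes the volunteers, not the population.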

Information Bias in Data Collection
- Recall bias affects how well participants remember past exposures, and it often differs between cases and controls. Someone diagnosed with a disease is more motivated to search their memory for possible causes, while a healthy control may not think as carefully.
- Observer bias creeps in when researchers who know a participant's disease or exposure status unconsciously collect or record data differently. An unblinded clinician might rate symptoms as more severe in a patient they know received a placebo.
- Measurement error can be random (reducing precision) or systematic (consistently over- or underestimating a value). A poorly calibrated blood pressure cuff that reads 5 mmHg too high is systematic; normal variation from reading to reading is random.
- Social desirability bias leads participants to underreport stigmatized behaviors (like heavy drinking) or overreport healthy ones (like exercise), regardless of their actual habits.
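Differential recall can manufacture an association out of nothing, which a short sketch makes concrete. The recall probabilities below are invented, and the sketch assumes perfect specificity (unexposed people never falsely report exposure): true exposure prevalence is identical in cases and controls (true odds ratio of 1), but cases remember exposure more reliably, so the observed odds ratio climbs above 1.

```python
import random

random.seed(3)

# Hypothetical differential recall: same true exposure prevalence in both
# groups, but cases search their memory harder than controls do.
def count_reported_exposed(n, recall_prob):
    reported = 0
    for _ in range(n):
        truly_exposed = random.random() < 0.4   # same true prevalence everywhere
        # Assumed perfect specificity: only the truly exposed can report exposure.
        if truly_exposed and random.random() < recall_prob:
            reported += 1
    return reported

n = 50_000
a = count_reported_exposed(n, 0.9)   # cases: good recall
b = count_reported_exposed(n, 0.6)   # controls: poorer recall

odds_cases = a / (n - a)
odds_controls = b / (n - b)
print("observed odds ratio:", round(odds_cases / odds_controls, 2))  # > 1 despite true OR = 1
```

Because the misclassification differs between cases and controls, it biases the estimate away from the null; purely non-differential misclassification would instead tend to dilute a real association.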
Strategies for Bias Minimization
Reducing selection bias:
- Use random sampling from a well-defined source population.
- Match cases and controls on key characteristics (age, sex, etc.) to improve comparability.
- Oversample underrepresented groups to ensure adequate representation.
- Track and report loss to follow-up so readers can judge whether it may have affected results.
Reducing information bias:
- Develop standardized data collection protocols so every participant is assessed the same way.
- Train interviewers and observers thoroughly, and check inter-rater reliability.
- Blind researchers to participants' exposure or outcome status whenever possible.
- Use validated measurement instruments rather than ad hoc tools.
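The inter-rater reliability check mentioned above is often quantified with Cohen's kappa, which compares observed agreement between two raters to the agreement expected by chance. Here is a minimal two-rater version using only the standard library; the symptom ratings are made up for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items (categorical ratings)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement if the raters were independent, from marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Made-up symptom ratings for eight patients.
a = ["mild", "mild", "severe", "severe", "mild", "severe", "mild", "mild"]
b = ["mild", "severe", "severe", "severe", "mild", "mild", "mild", "mild"]
print(round(cohens_kappa(a, b), 2))  # → 0.47
```

A kappa near 0 means agreement is no better than chance; values this low in a real study would be a signal to retrain raters or tighten the rating protocol.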
Study design choices that help:
- Prospective cohort studies collect exposure data before the outcome occurs, which largely eliminates recall bias for exposures, since they are recorded before anyone knows who becomes a case.
- Using multiple data sources (medical records, biomarkers, self-report) lets you cross-verify information and catch inconsistencies.
Statistical approaches:
- Sensitivity analysis tests how much your results would change under different assumptions about bias, giving you a sense of how robust your conclusions are.
- Propensity score matching pairs exposed and unexposed participants who have similar predicted probabilities of exposure, helping balance measured confounders.
- Inverse probability weighting assigns weights to participants based on their probability of being selected or remaining in the study, compensating for differential selection or loss to follow-up.
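Inverse probability weighting is simple enough to sketch in a few lines. This toy example assumes the retention probabilities are known (in practice they would be modelled, e.g. by logistic regression), and all the numbers are hypothetical: sicker participants drop out more often, so the naive mean among those who stay is biased low, while reweighting each stayer by 1 / P(stay) recovers the true mean.

```python
import random
import statistics

random.seed(4)

# Hypothetical illustration: sicker participants are more likely to drop out.
n = 100_000
severity = [random.random() for _ in range(n)]   # true severity, uniform on [0, 1]
true_mean = statistics.mean(severity)

# Assumed known (or well-modelled) probability of remaining in the study.
stay_prob = [1 - 0.6 * s for s in severity]
stayed = [(s, p) for s, p in zip(severity, stay_prob) if random.random() < p]

naive = statistics.mean(s for s, _ in stayed)    # biased low: the sick left

# Inverse probability weighting: each stayer stands in for 1 / P(stay) people.
wsum = sum(1 / p for _, p in stayed)
ipw = sum(s / p for s, p in stayed) / wsum

print("true:", round(true_mean, 3))
print("naive:", round(naive, 3))   # underestimates
print("IPW:", round(ipw, 3))       # close to the true mean
```

The correction only works as well as the retention model behind the weights; if dropout also depends on unmeasured factors, some bias remains.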