
🫁 Intro to Biostatistics

Data Collection Methods


Why This Matters

In biostatistics, the method you choose to collect data fundamentally shapes what conclusions you can draw—and how confident you can be in those conclusions. You're being tested on your ability to distinguish between methods that can establish causation versus those that only reveal association, and to recognize when each approach is appropriate given practical and ethical constraints. Understanding these distinctions is essential for evaluating research validity and designing your own studies.

The methods covered here demonstrate core principles like randomization, control, bias reduction, and generalizability. On exams, you'll need to identify which method fits a given research scenario, explain why certain designs are stronger for specific questions, and recognize the trade-offs inherent in each approach. Don't just memorize definitions—know what makes each method powerful and where its limitations lie.


Experimental Methods: Establishing Causation

The gold standard for determining cause-and-effect relationships requires researcher control over who receives an intervention. By manipulating the independent variable and holding other factors constant, experiments isolate the effect of interest.

Experiments

  • Manipulation of variables allows researchers to test cause-and-effect hypotheses directly—the only design that truly establishes causation
  • Random assignment to treatment and control groups distributes confounding variables evenly, reducing selection bias and strengthening internal validity (see the sketch after this list)
  • Control groups provide a baseline for comparison, making it possible to attribute observed differences to the intervention rather than external factors
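
Below is a minimal Python sketch of 1:1 random assignment. The subject IDs and fixed seed are invented for illustration; real trials use vetted randomization software and often block or stratify the assignment.

```python
import random

def randomize(subjects, seed=2024):
    """Randomly assign subjects to two equal-sized groups (1:1 allocation)."""
    rng = random.Random(seed)   # fixed seed only so the example is reproducible
    shuffled = subjects[:]      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

groups = randomize([f"subject_{i}" for i in range(20)])
print(len(groups["treatment"]), len(groups["control"]))  # 10 10
```

Because chance alone decides who is treated, both measured and unmeasured confounders tend to balance across groups as the sample grows.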

Observational Approaches: Studying What Exists

When randomization is unethical or impractical, researchers must observe subjects without intervention. These methods sacrifice causal inference for real-world applicability and ethical feasibility.

Observational Studies

  • No manipulation of variables—researchers record exposures and outcomes as they naturally occur, making these studies essential when experiments would be unethical
  • Confounding variables remain a major threat since groups may differ systematically in ways beyond the exposure of interest
  • Cohort and case-control designs are subtypes that track exposures forward or backward in time, each with distinct strengths for different research questions

Longitudinal Studies

  • Repeated measurements on the same subjects over time allow researchers to track individual change trajectories and developmental patterns
  • Temporal sequence can be established—knowing that exposure preceded outcome strengthens causal arguments even without randomization
  • Attrition bias poses a significant threat as participants drop out over months or years, potentially skewing results if dropout is related to the outcome

Cross-Sectional Studies

  • Single time point data collection provides a snapshot of a population, making these studies faster and cheaper than longitudinal designs
  • Prevalence of conditions and characteristics can be estimated (see the sketch after this list), which is useful for public health planning and hypothesis generation
  • Cannot establish causation because exposure and outcome are measured simultaneously—you can't determine which came first
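
Here is a minimal sketch of how a cross-sectional snapshot yields a prevalence estimate, using a normal-approximation (Wald) interval; the counts are invented for illustration.

```python
import math

def prevalence_ci(cases, n, z=1.96):
    """Point prevalence with a 95% normal-approximation (Wald) interval."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)   # standard error of a sample proportion
    return p, (p - z * se, p + z * se)

# Hypothetical snapshot: 132 cases among 1,200 people surveyed at one time point
p, (lo, hi) = prevalence_ci(132, 1200)
print(f"prevalence = {p:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")  # 0.110, (0.092, 0.128)
```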

Compare: Longitudinal vs. Cross-sectional studies—both are observational, but longitudinal tracks change over time while cross-sectional captures a single moment. If an FRQ asks about studying disease progression, longitudinal is your answer; for estimating current disease burden, choose cross-sectional.


Survey-Based Methods: Gathering Self-Reported Data

When researchers need information about attitudes, behaviors, or experiences, they must ask participants directly. The structure and format of questions significantly influence data quality and the types of analysis possible.

Surveys

  • Structured questionnaires yield quantitative data that can be statistically analyzed across large samples—ideal for measuring prevalence and associations
  • Administration modes (online, phone, in-person) each introduce different response biases and affect who participates
  • Sampling strategy determines generalizability—a well-designed survey of 1,000 people can represent millions if the sample is truly random
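
That last point follows from the standard error of a proportion. A quick sketch of the worst-case 95% margin of error for a simple random sample of 1,000:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion from a simple random sample.

    p = 0.5 is the worst case, since p * (1 - p) is maximized there.
    """
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(1000):.3f}")  # ~0.031, i.e., about +/-3 percentage points
```

Note that the sample size n, not the population size, drives the margin of error, which is why 1,000 randomly selected respondents can stand in for millions.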

Interviews

  • Qualitative depth allows exploration of complex topics, capturing nuance and context that closed-ended questions miss
  • Flexibility in format—structured interviews standardize questions while unstructured interviews follow participant responses, trading reliability for richness
  • Interviewer effects can introduce bias through leading questions, tone, or participant desire to give socially acceptable answers

Focus Groups

  • Group dynamics generate data through participant interaction—ideas build on each other in ways individual interviews cannot capture
  • Moderator skill is critical for managing dominant personalities and encouraging quieter participants to contribute
  • Not generalizable to broader populations due to small, non-random samples, but excellent for exploratory research and hypothesis generation

Compare: Surveys vs. Interviews—surveys prioritize breadth and quantification across many respondents, while interviews prioritize depth with fewer participants. Choose surveys when you need statistical power; choose interviews when you need to understand why people think or behave a certain way.


Secondary and Specialized Approaches

Not all research requires collecting new data. Leveraging existing information or focusing intensively on specific cases can answer questions efficiently or reveal insights large studies miss.

Secondary Data Analysis

  • Existing datasets (hospital records, census data, prior studies) can be reanalyzed for new research questions without the cost of primary collection (see the sketch after this list)
  • Limited control over how variables were measured—you're constrained by decisions the original researchers made
  • Replication and validation of findings become possible when multiple researchers analyze the same data independently
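
A minimal sketch of the secondary-analysis workflow, assuming a hypothetical extract of hospital records; the file name and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical file; you inherit whatever variables the original team recorded
records = pd.read_csv("hospital_records.csv")  # assumed columns: age, sex, exposed, outcome

# New question, old data: compare outcome rates by exposure status
print(records.groupby("exposed")["outcome"].mean())
```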

Case Studies

  • Intensive analysis of one or a few cases provides rich contextual detail impossible in large-sample studies
  • Hypothesis generation is the primary strength—unusual cases can reveal mechanisms or patterns worth testing in larger populations
  • Low external validity means findings may not generalize, but case studies excel at documenting rare conditions or exploring complex phenomena

Compare: Secondary data analysis vs. Primary data collection—secondary analysis saves time and money but limits you to existing variables, while primary collection lets you measure exactly what you need. For exam questions about resource constraints, secondary analysis is often the practical choice.


Sampling: The Foundation of Valid Inference

How you select participants determines whether your findings apply beyond your sample. Probability sampling allows statistical inference to populations; non-probability sampling does not.

Sampling Techniques

  • Probability sampling (simple random, stratified, cluster) gives every population member a known chance of selection, enabling generalization with quantifiable uncertainty (see the sketch after this list)
  • Non-probability sampling (convenience, purposive, snowball) is easier and cheaper but introduces selection bias that limits external validity
  • Sample size and representativeness jointly determine precision—a large biased sample is worse than a smaller representative one
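
A minimal sketch contrasting simple random and stratified sampling; the population, strata, and 10% sampling fraction are made up for illustration.

```python
import random

rng = random.Random(0)  # seeded so the example is reproducible
population = [{"id": i, "stratum": "urban" if i % 3 else "rural"} for i in range(900)]

# Simple random sample: every member has the same known chance of selection
srs = rng.sample(population, 90)

def stratified_sample(pop, key, frac):
    """Draw the same fraction from every stratum, preserving subgroup proportions."""
    strata = {}
    for unit in pop:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, round(len(units) * frac)))
    return sample

strat = stratified_sample(population, "stratum", 0.10)
print(len(srs), len(strat))  # 90 90, but strat matches the strata proportions exactly
```

Both draws give every member a known selection probability; stratification additionally guarantees that each subgroup appears in its true population proportion.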

Compare: Probability vs. Non-probability sampling—both select subsets from populations, but only probability sampling supports statistical inference. If an FRQ asks about generalizing to a population, the answer must involve random selection.


Quick Reference Table

Concept                      Best Examples
Establishes causation        Experiments (with randomization)
Tracks change over time      Longitudinal studies
Snapshot of population       Cross-sectional studies
Quantitative self-report     Surveys
Qualitative depth            Interviews, Case studies
Group interaction data       Focus groups
Uses existing data           Secondary data analysis
Enables generalization       Probability sampling techniques

Self-Check Questions

  1. A researcher wants to determine whether a new drug lowers blood pressure. Which method would establish causation, and what two design features are essential?

  2. Compare longitudinal and cross-sectional studies: what can longitudinal studies reveal that cross-sectional studies cannot, and why?

  3. A public health team needs to estimate the current prevalence of diabetes in a city quickly and affordably. Which study design should they use, and what is its main limitation?

  4. You're reviewing a study that used convenience sampling from a university campus to draw conclusions about all adults in the country. What type of validity is threatened, and why?

  5. When would a researcher choose interviews over surveys, and what trade-off does this choice involve?