
📊 Experimental Design

Threats to Internal Validity


Why This Matters

Internal validity is the foundation of experimental research—it answers the fundamental question: Did your independent variable actually cause the change in your dependent variable, or was it something else? When you're designing experiments or evaluating research on the AP exam, you're being tested on your ability to identify what could go wrong and why it matters. These threats represent the alternative explanations that can undermine even well-intentioned studies.

The threats you'll learn here fall into distinct categories: some involve changes over time, others stem from measurement problems, and still others arise from group composition issues or participant behavior. Don't just memorize a list of terms—understand what each threat reveals about the relationship between cause and effect. When an FRQ asks you to "identify a potential confound," you need to know which threat applies and why it would compromise the study's conclusions.


Time-Based Threats

These threats emerge because experiments unfold over time, and things change—both inside and outside the study. The longer your study runs, the more vulnerable it becomes to these confounds.

History

  • External events occurring during the study—anything from news events to weather changes can influence participants' responses independent of your treatment
  • Confounding variable risk increases when the event systematically affects one condition more than another (e.g., a school shooting occurring midway through a study on media violence)
  • Control strategy: use shorter study durations or include a no-treatment control group experiencing the same time period

Maturation

  • Natural biological or psychological changes in participants over time—growth, fatigue, hunger, or cognitive development
  • Especially problematic in longitudinal studies where weeks or months pass between measurements
  • Key distinction from history: maturation is internal to participants, while history is external environmental change

Statistical Regression

  • Regression to the mean—extreme scores on a pretest tend to move toward the average on subsequent measurements, regardless of treatment
  • Misleading treatment effects occur when you select participants because they scored extremely high or low initially
  • Mathematical inevitability: this happens due to measurement error and natural variability, not because of any intervention
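The "mathematical inevitability" point above can be seen in a short simulation. This is a hypothetical sketch (made-up ability and error values, not data from any real study): each student has a stable true ability, each test adds random measurement error, and we select the bottom 10% on the pretest—just as in the classic tutoring scenario—then watch their retest scores climb with no intervention at all.

```python
import random

random.seed(42)

# Hypothetical model: observed score = true ability + measurement error.
N = 10_000
true_ability = [random.gauss(500, 50) for _ in range(N)]
pretest  = [a + random.gauss(0, 40) for a in true_ability]
posttest = [a + random.gauss(0, 40) for a in true_ability]

# Select the bottom 10% of pretest scorers (extreme-score selection).
cutoff = sorted(pretest)[N // 10]
low_scorers = [i for i in range(N) if pretest[i] <= cutoff]

pre_mean  = sum(pretest[i]  for i in low_scorers) / len(low_scorers)
post_mean = sum(posttest[i] for i in low_scorers) / len(low_scorers)
print(f"pretest mean:  {pre_mean:.1f}")
print(f"posttest mean: {post_mean:.1f}")  # noticeably higher -- no tutoring involved
```

The selected group improves simply because their extreme pretest scores partly reflected unlucky measurement error, which does not repeat on the posttest. A real tutoring study would need a control group of equally low scorers to separate treatment effects from this artifact.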

Compare: History vs. Maturation—both involve changes over time, but history refers to external events while maturation refers to internal participant changes. If an FRQ describes participants getting tired or hungry during a long experiment, that's maturation; if it mentions a fire drill interrupting the study, that's history.


Measurement and Testing Threats

These threats arise from how you measure your variables. Even perfect participants can yield invalid results if your measurement process introduces systematic error.

Testing

  • Practice effects—taking a pretest can improve performance on a posttest simply through familiarity with the format or questions
  • Sensitization occurs when the pretest alerts participants to what the study is measuring, changing their behavior
  • Solution: use Solomon four-group design or alternate test forms to detect and control for testing effects

Instrumentation

  • Changes in measurement tools or procedures between observations—different raters, recalibrated equipment, or modified scoring criteria
  • Observer drift happens when human coders gradually shift their standards over time
  • Consistency is essential: standardize protocols, train observers to criterion, and check inter-rater reliability throughout the study

Compare: Testing vs. Instrumentation—both involve measurement, but testing is about participant changes due to being measured, while instrumentation is about researcher/tool changes in how measurement occurs. A student improving because they remember test questions is testing; a teacher grading more leniently at the end of a long day is instrumentation.


Group Composition Threats

These threats stem from who is in your groups and whether those groups remain equivalent throughout the study. Random assignment is your primary defense here.

Selection Bias

  • Non-equivalent groups from the start—when participants aren't randomly assigned, pre-existing differences between groups become confounded with treatment effects
  • Systematic differences in motivation, ability, or demographics can explain outcomes that appear to be treatment effects
  • Random assignment (not random sampling) is the specific solution; it distributes individual differences evenly across conditions
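A quick simulation shows why random assignment works as the solution named above. The numbers here are hypothetical (an invented "motivation" score): shuffling the pool and splitting it in half tends to equalize the group means on a pre-existing difference the researcher never even measured.

```python
import random

random.seed(0)

# Hypothetical participant pool with a pre-existing difference (motivation).
pool = [{"id": i, "motivation": random.gauss(50, 10)} for i in range(200)]

# Random assignment: shuffle, then split into two conditions.
random.shuffle(pool)
treatment, control = pool[:100], pool[100:]

def mean_motivation(group):
    return sum(p["motivation"] for p in group) / len(group)

print(f"treatment mean motivation: {mean_motivation(treatment):.1f}")
print(f"control mean motivation:   {mean_motivation(control):.1f}")
# The two means land close together; any remaining gap is chance,
# which inferential statistics are designed to account for.
```

Note that this balances *every* pre-existing difference at once—motivation, ability, demographics—which is exactly why random assignment, not random sampling, is the answer to selection bias.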

Experimental Mortality (Attrition)

  • Differential dropout—when participants leave the study at different rates across conditions, especially if dropout is related to the treatment
  • Survivorship bias means your final sample no longer represents your original random assignment
  • Intent-to-treat analysis and tracking dropout reasons help researchers assess whether attrition threatens validity
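The contrast between a completers-only analysis and intent-to-treat can be sketched numerically. This is a hypothetical weight-loss example (invented values): unsuccessful participants in the treatment arm tend to drop out, so analyzing only completers overstates the effect, while intent-to-treat keeps every randomized participant and gives a more conservative estimate.

```python
import random

random.seed(1)

# Hypothetical kg lost per treatment participant (negative = gained).
treatment = [random.gauss(3, 4) for _ in range(100)]

# Differential dropout: participants losing little weight mostly quit.
dropped = [x for x in treatment if x < 1 and random.random() < 0.8]
completers = [x for x in treatment if x not in dropped]

def mean(xs):
    return sum(xs) / len(xs)

completers_mean = mean(completers)

# Intent-to-treat: analyze everyone who was randomized, here by
# assuming dropouts lost nothing (baseline carried forward).
itt = completers + [0.0] * len(dropped)
itt_mean = mean(itt)

print(f"completers-only mean loss: {completers_mean:.2f} kg")
print(f"intent-to-treat mean loss: {itt_mean:.2f} kg")  # smaller, more conservative
```

Baseline-carried-forward is only one of several intent-to-treat conventions, but the point survives any of them: once dropout is related to the outcome, the completers-only sample no longer reflects the original random assignment.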

Compare: Selection Bias vs. Attrition—selection bias creates non-equivalent groups at the start of a study, while attrition creates non-equivalent groups during the study. Both result in groups that differ in ways beyond the independent variable, but the timing and solution differ.


Participant Behavior Threats

These threats emerge from participants' awareness of being in an experiment and their reactions to their assigned condition. Human participants don't behave like passive objects—they interpret, react, and sometimes rebel.

Diffusion of Treatments

  • Information leakage between groups—participants in different conditions talk to each other and share what they're experiencing
  • Treatment contamination occurs when control group members adopt experimental group behaviors (or vice versa)
  • Physical separation of groups and clear instructions about confidentiality help prevent diffusion

Compensatory Rivalry

  • "John Henry effect"—control group participants work harder than normal to prove they can match the treatment group
  • Artificial inflation of control group performance makes the treatment appear less effective than it actually is
  • Awareness of disadvantage motivates competitive behavior that wouldn't occur outside the experimental context

Demoralization of Control Group

  • Resentful demoralization—control participants feel cheated and give up, performing worse than they normally would
  • Opposite of compensatory rivalry: instead of trying harder, participants disengage entirely
  • Ethical communication about study purpose and eventual access to treatment can reduce demoralization

Compare: Compensatory Rivalry vs. Demoralization—both involve control group reactions to knowing they're in the control condition, but they push results in opposite directions. Rivalry inflates control performance (making treatment look worse); demoralization deflates it (making treatment look better). Both are threats because they reflect reaction to group assignment, not true baseline behavior.


Quick Reference Table

| Concept Category | Threats | Key Control Strategy |
| --- | --- | --- |
| Time-based changes | History, Maturation, Statistical Regression | Control groups, shorter duration, avoid extreme-score selection |
| Measurement problems | Testing, Instrumentation | Alternate forms, standardized protocols, reliability checks |
| Group composition | Selection Bias, Attrition | Random assignment, track dropout, intent-to-treat analysis |
| Participant reactions | Diffusion, Compensatory Rivalry, Demoralization | Blind participants, separate groups, ethical communication |
| External events | History | Control group experiencing same time period |
| Internal participant changes | Maturation | Age-matched controls, shorter studies |
| Extreme score artifacts | Statistical Regression | Avoid selecting based on extreme scores |

Self-Check Questions

  1. A researcher selects students who scored in the bottom 10% on a math pretest for a tutoring intervention. Their posttest scores improve significantly. Which threat to internal validity should the researcher consider before concluding the tutoring worked?

  2. Compare and contrast history and maturation as threats to internal validity. What question would you ask about a confounding event to determine which threat applies?

  3. In a study comparing two teaching methods, students in the control classroom start using strategies they heard about from friends in the experimental classroom. Which two threats to internal validity might this represent, and how would you distinguish between them?

  4. A weight-loss study finds that the treatment group lost significantly more weight than the control group, but 40% of treatment participants dropped out (compared to 5% of controls). Why might attrition make these results difficult to interpret?

  5. An FRQ describes a study where control group participants, aware they're not receiving the new therapy, either (a) try extra hard to improve on their own or (b) become discouraged and stop trying. Identify both threats and explain why they would bias results in opposite directions.