
📊 Experimental Design

Threats to Internal Validity


Why This Matters

Internal validity is the foundation of experimental research—it answers the fundamental question: Did your independent variable actually cause the change in your dependent variable, or was it something else? When you're designing experiments or evaluating research on the AP exam, you're being tested on your ability to identify what could go wrong and why it matters. These threats represent the alternative explanations that can undermine even well-intentioned studies.

The threats you'll learn here fall into distinct categories: some involve changes over time, others stem from measurement problems, and still others arise from group composition issues or participant behavior. Don't just memorize a list of terms—understand what each threat reveals about the relationship between cause and effect. When an FRQ asks you to "identify a potential confound," you need to know which threat applies and why it would compromise the study's conclusions.


Time-Based Threats

These threats emerge because experiments unfold over time, and things change—both inside and outside the study. The longer your study runs, the more vulnerable it becomes to these confounds.

History

  • External events occurring during the study—anything from news events to weather changes can influence participants' responses independent of your treatment
  • Confounding variable risk increases when the event systematically affects one condition more than another (e.g., a school shooting occurring midway through a study on media violence)
  • Control strategy: use shorter study durations or include a no-treatment control group experiencing the same time period

Maturation

  • Natural biological or psychological changes in participants over time—growth, fatigue, hunger, or cognitive development
  • Especially problematic in longitudinal studies where weeks or months pass between measurements
  • Key distinction from history: maturation is internal to participants, while history is external environmental change

Statistical Regression

  • Regression to the mean—extreme scores on a pretest tend to move toward the average on subsequent measurements, regardless of treatment
  • Misleading treatment effects occur when you select participants because they scored extremely high or low initially
  • Mathematical inevitability: this happens due to measurement error and natural variability, not because of any intervention
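The "mathematical inevitability" point above can be seen in a short simulation. This is a hypothetical sketch (made-up ability and error values, not data from any real study): each student has a stable true ability, each test adds random measurement error, and we select the bottom 10% on the pretest—just as in the classic tutoring scenario—then watch their retest scores climb with no intervention at all.

```python
import random

random.seed(42)

# Hypothetical model: observed score = true ability + measurement error.
N = 10_000
true_ability = [random.gauss(500, 50) for _ in range(N)]
pretest  = [a + random.gauss(0, 40) for a in true_ability]
posttest = [a + random.gauss(0, 40) for a in true_ability]

# Select the bottom 10% of pretest scorers (extreme-score selection).
cutoff = sorted(pretest)[N // 10]
low_scorers = [i for i in range(N) if pretest[i] <= cutoff]

pre_mean  = sum(pretest[i]  for i in low_scorers) / len(low_scorers)
post_mean = sum(posttest[i] for i in low_scorers) / len(low_scorers)
print(f"pretest mean:  {pre_mean:.1f}")
print(f"posttest mean: {post_mean:.1f}")  # noticeably higher -- no tutoring involved
```

The selected group improves simply because their extreme pretest scores partly reflected unlucky measurement error, which does not repeat on the posttest. A real tutoring study would need a control group of equally low scorers to separate treatment effects from this artifact.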

Compare: History vs. Maturation—both involve changes over time, but history refers to external events while maturation refers to internal participant changes. If an FRQ describes participants getting tired or hungry during a long experiment, that's maturation; if it mentions a fire drill interrupting the study, that's history.


Measurement and Testing Threats

These threats arise from how you measure your variables. Even perfect participants can yield invalid results if your measurement process introduces systematic error.

Testing

  • Practice effects—taking a pretest can improve performance on a posttest simply through familiarity with the format or questions
  • Sensitization occurs when the pretest alerts participants to what the study is measuring, changing their behavior
  • Solution: use Solomon four-group design or alternate test forms to detect and control for testing effects

Instrumentation

  • Changes in measurement tools or procedures between observations—different raters, recalibrated equipment, or modified scoring criteria
  • Observer drift happens when human coders gradually shift their standards over time
  • Consistency is essential: standardize protocols, train observers to criterion, and check inter-rater reliability throughout the study

Compare: Testing vs. Instrumentation—both involve measurement, but testing is about participant changes due to being measured, while instrumentation is about researcher/tool changes in how measurement occurs. A student improving because they remember test questions is testing; a teacher grading more leniently at the end of a long day is instrumentation.


Group Composition Threats

These threats stem from who is in your groups and whether those groups remain equivalent throughout the study. Random assignment is your primary defense here.

Selection Bias

  • Non-equivalent groups from the start—when participants aren't randomly assigned, pre-existing differences between groups become confounded with treatment effects
  • Systematic differences in motivation, ability, or demographics can explain outcomes that appear to be treatment effects
  • Random assignment (not random sampling) is the specific solution; it distributes individual differences evenly across conditions
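A quick simulation shows why random assignment works as the solution named above. The numbers here are hypothetical (an invented "motivation" score): shuffling the pool and splitting it in half tends to equalize the group means on a pre-existing difference the researcher never even measured.

```python
import random

random.seed(0)

# Hypothetical participant pool with a pre-existing difference (motivation).
pool = [{"id": i, "motivation": random.gauss(50, 10)} for i in range(200)]

# Random assignment: shuffle, then split into two conditions.
random.shuffle(pool)
treatment, control = pool[:100], pool[100:]

def mean_motivation(group):
    return sum(p["motivation"] for p in group) / len(group)

print(f"treatment mean motivation: {mean_motivation(treatment):.1f}")
print(f"control mean motivation:   {mean_motivation(control):.1f}")
# The two means land close together; any remaining gap is chance,
# which inferential statistics are designed to account for.
```

Note that this balances *every* pre-existing difference at once—motivation, ability, demographics—which is exactly why random assignment, not random sampling, is the answer to selection bias.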

Experimental Mortality (Attrition)

  • Differential dropout—when participants leave the study at different rates across conditions, especially if dropout is related to the treatment
  • Survivorship bias means your final sample no longer represents your original random assignment
  • Intent-to-treat analysis and tracking dropout reasons help researchers assess whether attrition threatens validity
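The contrast between a completers-only analysis and intent-to-treat can be sketched numerically. This is a hypothetical weight-loss example (invented values): unsuccessful participants in the treatment arm tend to drop out, so analyzing only completers overstates the effect, while intent-to-treat keeps every randomized participant and gives a more conservative estimate.

```python
import random

random.seed(1)

# Hypothetical kg lost per treatment participant (negative = gained).
treatment = [random.gauss(3, 4) for _ in range(100)]

# Differential dropout: participants losing little weight mostly quit.
dropped = [x for x in treatment if x < 1 and random.random() < 0.8]
completers = [x for x in treatment if x not in dropped]

def mean(xs):
    return sum(xs) / len(xs)

completers_mean = mean(completers)

# Intent-to-treat: analyze everyone who was randomized, here by
# assuming dropouts lost nothing (baseline carried forward).
itt = completers + [0.0] * len(dropped)
itt_mean = mean(itt)

print(f"completers-only mean loss: {completers_mean:.2f} kg")
print(f"intent-to-treat mean loss: {itt_mean:.2f} kg")  # smaller, more conservative
```

Baseline-carried-forward is only one of several intent-to-treat conventions, but the point survives any of them: once dropout is related to the outcome, the completers-only sample no longer reflects the original random assignment.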

Compare: Selection Bias vs. Attrition—selection bias creates non-equivalent groups at the start of a study, while attrition creates non-equivalent groups during the study. Both result in groups that differ in ways beyond the independent variable, but the timing and solution differ.


Participant Behavior Threats

These threats emerge from participants' awareness of being in an experiment and their reactions to their assigned condition. Human participants don't behave like passive objects—they interpret, react, and sometimes rebel.

Diffusion of Treatments

  • Information leakage between groups—participants in different conditions talk to each other and share what they're experiencing
  • Treatment contamination occurs when control group members adopt experimental group behaviors (or vice versa)
  • Physical separation of groups and clear instructions about confidentiality help prevent diffusion

Compensatory Rivalry

  • "John Henry effect"—control group participants work harder than normal to prove they can match the treatment group
  • Artificial inflation of control group performance makes the treatment appear less effective than it actually is
  • Awareness of disadvantage motivates competitive behavior that wouldn't occur outside the experimental context

Demoralization of Control Group

  • Resentful demoralization—control participants feel cheated and give up, performing worse than they normally would
  • Opposite of compensatory rivalry: instead of trying harder, participants disengage entirely
  • Ethical communication about study purpose and eventual access to treatment can reduce demoralization

Compare: Compensatory Rivalry vs. Demoralization—both involve control group reactions to knowing they're in the control condition, but they push results in opposite directions. Rivalry inflates control performance (making treatment look worse); demoralization deflates it (making treatment look better). Both are threats because they reflect reaction to group assignment, not true baseline behavior.


Quick Reference Table

| Concept Category | Threats | Key Control Strategy |
| --- | --- | --- |
| Time-based changes | History, Maturation, Statistical Regression | Control groups, shorter duration, avoid extreme-score selection |
| Measurement problems | Testing, Instrumentation | Alternate forms, standardized protocols, reliability checks |
| Group composition | Selection Bias, Attrition | Random assignment, track dropout, intent-to-treat analysis |
| Participant reactions | Diffusion, Compensatory Rivalry, Demoralization | Blind participants, separate groups, ethical communication |
| External events | History | Control group experiencing same time period |
| Internal participant changes | Maturation | Age-matched controls, shorter studies |
| Extreme score artifacts | Statistical Regression | Avoid selecting based on extreme scores |

Self-Check Questions

  1. A researcher selects students who scored in the bottom 10% on a math pretest for a tutoring intervention. Their posttest scores improve significantly. Which threat to internal validity should the researcher consider before concluding the tutoring worked?

  2. Compare and contrast history and maturation as threats to internal validity. What question would you ask about a confounding event to determine which threat applies?

  3. In a study comparing two teaching methods, students in the control classroom start using strategies they heard about from friends in the experimental classroom. Which two threats to internal validity might this represent, and how would you distinguish between them?

  4. A weight-loss study finds that the treatment group lost significantly more weight than the control group, but 40% of treatment participants dropped out (compared to 5% of controls). Why might attrition make these results difficult to interpret?

  5. An FRQ describes a study where control group participants, aware they're not receiving the new therapy, either (a) try extra hard to improve on their own or (b) become discouraged and stop trying. Identify both threats and explain why they would bias results in opposite directions.