
🤖 AI Ethics

AI Bias Types

Why This Matters

When you're tested on AI ethics, you're not just being asked to define bias—you're being asked to demonstrate that you understand where bias enters AI systems, why it persists, and how different bias types compound to create unfair outcomes. Exam questions frequently require you to trace a discriminatory AI decision back to its root cause, whether that's flawed training data, problematic algorithm design, or human overreliance on automated outputs. Understanding the mechanism behind each bias type is what separates surface-level memorization from genuine comprehension.

These bias types don't exist in isolation. A single AI system can exhibit historical bias embedded in its training data, algorithmic bias in how it weights features, and automation bias in how humans interpret its outputs. The most challenging exam questions—especially FRQs—will ask you to identify multiple bias types operating simultaneously and explain their interaction. Don't just memorize definitions; know what stage of the AI pipeline each bias affects and what real-world harms it produces.


Biases Rooted in Training Data

The foundation of any AI system is the data it learns from. When that data reflects historical inequalities, excludes certain populations, or captures the world inaccurately, the resulting model inherits those flaws as features, not bugs.

Historical Bias

  • Reflects past prejudices and inequalities embedded in data collected during discriminatory eras—even "accurate" historical data can perpetuate harm
  • Amplification effect means AI systems don't just reproduce historical bias; they can intensify it through feedback loops (simulated in the sketch after this list)
  • Challenges remediation efforts because removing historical bias requires active intervention, not just collecting more data
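
To make the feedback-loop claim concrete, here is a minimal Python sketch with hypothetical numbers: two neighborhoods have the same true incident rate, but the one with more past records keeps attracting attention, and therefore keeps generating new records.

```python
import random

random.seed(0)

# Two neighborhoods with the SAME true incident rate, but neighborhood A
# starts with more recorded incidents because of past over-policing.
TRUE_RATE = 0.10  # hypothetical; identical in both neighborhoods
recorded = {"A": 60, "B": 40}

for year in range(6):
    # Patrols go wherever the records say incidents are highest, so only
    # the already-overrepresented neighborhood generates new records.
    target = max(recorded, key=recorded.get)
    new_records = sum(random.random() < TRUE_RATE for _ in range(100))
    recorded[target] += new_records
    share_a = recorded["A"] / sum(recorded.values())
    print(f"year {year}: share of all records from A = {share_a:.2f}")
```

Even though the neighborhoods are statistically identical, A's share of the records climbs every round: the system's "evidence" is a product of its own past decisions.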

Representation Bias

  • Underrepresentation of marginalized groups leads to AI systems that perform poorly for those populations—facial recognition accuracy gaps are a prime example (a per-group audit is sketched after this list)
  • Misrepresentation occurs when groups are included but characterized through stereotypical or limited data points
  • Demands diverse datasets as a minimum ethical standard, though diversity alone doesn't guarantee fairness
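
Underperformance on an underrepresented group often hides inside a healthy aggregate metric. A minimal sketch of the per-group audit (labels and results are hypothetical; in practice they come from a held-out, labeled test set):

```python
from collections import defaultdict

# (group, model_was_correct) pairs from a hypothetical labeled test set.
results = ([("group_a", True)] * 8 + [("group_a", False)] * 1
           + [("group_b", True)] * 1 + [("group_b", False)] * 2)

counts = defaultdict(lambda: [0, 0])  # group -> [correct, total]
for group, correct in results:
    counts[group][0] += int(correct)
    counts[group][1] += 1

overall = sum(correct for _, correct in results) / len(results)
print(f"overall accuracy: {overall:.0%}")  # 75% -- looks acceptable in aggregate
for group, (right, total) in sorted(counts.items()):
    print(f"{group}: {right / total:.0%} on {total} examples")  # 89% vs. 33%
```

Disaggregating by group is the audit step; the aggregate number alone cannot reveal a representation problem.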

Data Bias

  • Unrepresentative or skewed training data produces models that generalize poorly to real-world populations
  • Societal biases become encoded when data reflects existing discrimination in hiring, lending, or policing practices
  • Undermines reliability of AI predictions across all downstream applications

Compare: Historical bias vs. Representation bias—both involve problematic training data, but historical bias stems from when data was collected (reflecting past discrimination), while representation bias concerns who is included or excluded. An FRQ might ask you to identify which bias type explains why an AI performs differently across demographic groups.


Biases in Data Collection Methods

Even with good intentions, how data is gathered can introduce systematic errors. These biases emerge before the algorithm ever sees the data.

Sampling Bias

  • Non-representative samples occur when training data doesn't reflect the true population distribution
  • Overgeneralization risk means conclusions drawn from biased samples may not apply to excluded groups
  • Common in demographic studies where convenience sampling or accessibility issues skew who gets included (simulated in the sketch below)
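
A quick simulation of the convenience-sampling effect (the response model and all numbers are hypothetical; the shape of the result is the point): the population's ages are uniform, but an online survey reaches younger people more often, so the sample mean drifts well below the population mean.

```python
import random
import statistics

random.seed(1)

# True population: ages 18-80, uniformly distributed (hypothetical).
population = [random.randint(18, 80) for _ in range(10_000)]

# Convenience sample: an online survey whose response rate falls with age.
sample = [age for age in population if random.random() < max(0.05, 1 - age / 60)]

print("population mean age:", round(statistics.mean(population), 1))
print("sample mean age:    ", round(statistics.mean(sample), 1))
```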

Selection Bias

  • Systematic exclusion of certain individuals or groups from data collection—often invisible to researchers
  • Distorts population-level findings because missing data isn't random; it reflects structural barriers (see the sketch after this list)
  • Impacts fairness when AI applications make decisions about groups who weren't represented in training
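
Selection bias often enters through a routine cleaning step. In this hypothetical sketch, records with a missing income field are dropped; because the field is missing far more often for rural records, the "clean" dataset quietly excludes that group:

```python
# Hypothetical records: income is missing far more often for rural entries
# because of structural gaps in how the data was gathered.
raw = ([{"group": "urban", "income": 50_000}] * 90 + [{"group": "urban"}] * 10
       + [{"group": "rural", "income": 35_000}] * 40 + [{"group": "rural"}] * 60)

# The routine cleaning step: drop any record with a missing field.
clean = [r for r in raw if "income" in r]

for name, rows in (("raw", raw), ("clean", clean)):
    rural = sum(r["group"] == "rural" for r in rows)
    print(f"{name}: {rural / len(rows):.0%} rural out of {len(rows)} records")
```

Note the contrast with sampling bias: here the sample was drawn correctly, but a structural pattern in who has complete records does the excluding.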

Measurement Bias

  • Flawed collection tools or methods produce inaccurate data that AI systems treat as ground truth
  • Proxy variables can introduce bias when direct measurement is impossible—using zip codes as proxies for race, for example (a proxy check is sketched after this list)
  • Requires rigorous validation through testing across diverse conditions and populations
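
One practical validation step is to measure how well each "neutral" feature predicts the protected attribute: if knowing the feature almost determines group membership, a model can discriminate without ever seeing the attribute itself. A minimal sketch with hypothetical data:

```python
from collections import Counter

# (zip_code, group) pairs from a hypothetical dataset.
rows = ([("zip_1", "group_a")] * 9 + [("zip_1", "group_b")] * 1
        + [("zip_2", "group_b")] * 8 + [("zip_2", "group_a")] * 2)

for zip_code in ("zip_1", "zip_2"):
    dist = Counter(group for z, group in rows if z == zip_code)
    top_group, n = dist.most_common(1)[0]
    # If this is near 100%, the zip code is effectively a group label.
    print(f"{zip_code}: predicts {top_group} {n / sum(dist.values()):.0%} of the time")
```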

Compare: Sampling bias vs. Selection bias—sampling bias results from how a sample is drawn (methodology), while selection bias results from who gets systematically excluded (often due to structural factors). Both produce non-representative data, but the causes and solutions differ.


Biases in Algorithm Design and Outputs

Even with perfect data, the choices made in building and deploying algorithms can introduce or amplify unfairness. The algorithm itself is a site of ethical decision-making.

Algorithmic Bias

  • Systematically prejudiced results emerge from flawed assumptions in model architecture, feature selection, or optimization targets
  • Design choices matter because decisions about which variables to include and how to weight them encode values (see the threshold sketch after this list)
  • High-stakes applications in hiring, lending, and criminal justice make algorithmic bias a civil rights concern
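
Even a single design decision, such as where to set a decision threshold, encodes a value judgment. In this hypothetical sketch, one global cutoff looks reasonable in aggregate but produces starkly different approval rates across groups; whether to audit for that disparity is itself a design choice:

```python
# (group, model_score) pairs -- hypothetical scores from some trained model.
applicants = [
    ("group_a", 0.9), ("group_a", 0.8), ("group_a", 0.7), ("group_a", 0.4),
    ("group_b", 0.6), ("group_b", 0.5), ("group_b", 0.4), ("group_b", 0.2),
]

THRESHOLD = 0.65  # a single global cutoff chosen to look reasonable "on average"

for group in ("group_a", "group_b"):
    scores = [s for g, s in applicants if g == group]
    approved = sum(s >= THRESHOLD for s in scores)
    print(f"{group}: approval rate {approved / len(scores):.0%}")  # 75% vs. 0%
```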

Reporting Bias

  • Selective reporting of results based on desired outcomes distorts understanding of AI system performance
  • Publication bias means failed or problematic AI applications often go unreported
  • Undermines accountability by preventing accurate assessment of AI harms and limitations

Compare: Algorithmic bias vs. Data bias—algorithmic bias originates in the model's design and logic, while data bias originates in the training information. A biased algorithm can produce unfair outcomes even with representative data, and vice versa. If an FRQ asks about interventions, specify whether you're addressing the algorithm, the data, or both.


Human-AI Interaction Biases

These biases emerge not from the AI system itself, but from how humans build, interpret, and rely on automated systems. They highlight the irreducibly human dimensions of AI ethics.

Confirmation Bias

  • Favoring information that confirms existing beliefs influences which data gets collected and how results are interpreted
  • Affects development when AI teams unconsciously design systems that validate their assumptions
  • Shapes interpretation of AI outputs, leading users to accept results that match expectations while dismissing contradictory evidence

Automation Bias

  • Over-reliance on automated systems leads humans to defer to AI even when contradictory information is available
  • Critical in high-stakes domains like healthcare diagnostics and criminal sentencing, where blind trust can cause serious harm
  • Underscores need for human oversight and training that encourages appropriate skepticism toward AI recommendations

Compare: Confirmation bias vs. Automation bias—confirmation bias affects how humans build and interpret AI systems, while automation bias affects how humans defer to AI outputs. Both involve cognitive shortcuts, but confirmation bias operates throughout development while automation bias operates at the point of deployment and use.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Training data problems | Historical bias, Representation bias, Data bias |
| Data collection flaws | Sampling bias, Selection bias, Measurement bias |
| Algorithm design issues | Algorithmic bias, Reporting bias |
| Human-AI interaction | Confirmation bias, Automation bias |
| Perpetuates past discrimination | Historical bias, Data bias |
| Excludes or underrepresents groups | Representation bias, Sampling bias, Selection bias |
| Requires human oversight solutions | Automation bias, Confirmation bias |
| Affects high-stakes decisions | Algorithmic bias, Automation bias, Historical bias |

Self-Check Questions

  1. A facial recognition system performs well on light-skinned faces but poorly on dark-skinned faces because the training dataset contained mostly light-skinned individuals. Which two bias types best explain this outcome, and how do they differ?

  2. An AI hiring tool was trained on a company's historical hiring decisions, which favored male candidates. A recruiter notices the tool ranks women lower but approves its recommendations anyway. Identify the bias types present at each stage of this scenario.

  3. Compare and contrast sampling bias and selection bias. How might each produce a non-representative training dataset, and what different interventions would address each?

  4. A healthcare AI trained on data from urban hospitals makes inaccurate predictions for rural patients. Is this primarily a measurement bias, representation bias, or algorithmic bias problem? Defend your answer.

  5. If an FRQ asks you to explain how a single AI system can exhibit multiple bias types simultaneously, which three bias types would you choose to demonstrate the interaction between data, algorithm, and human factors?