Ethics isn't a side topic in data science—it's woven into every stage of the data lifecycle, from collection to deployment. You're being tested on your ability to recognize how technical choices create real-world consequences, whether that's an algorithm denying someone a loan, a dataset exposing private information, or a model perpetuating historical discrimination. Understanding these principles means understanding that data science operates within a social context, not a vacuum.
The core concepts here—fairness, accountability, transparency, and privacy—form the backbone of responsible data practice. Don't just memorize definitions; know how these principles interact and conflict. When does maximizing accuracy compromise fairness? When does transparency threaten privacy? These tensions are exactly what exam questions probe. Master the why behind each consideration, and you'll be ready for any scenario they throw at you.
Data science begins with people—their information, their consent, their trust. These considerations establish the foundational obligations data scientists have to the individuals whose data powers their work. The principle here is autonomy: individuals should control what happens to their personal information.
Compare: Data Privacy vs. Informed Consent—both protect individuals, but privacy focuses on what happens to data after collection while consent addresses the moment of collection itself. An FRQ might ask you to identify which principle is violated when data is used for an undisclosed purpose (hint: both).
Algorithms learn from data, and data reflects history—including its inequities. These considerations address how bias enters systems and what fairness means mathematically and socially.
Compare: Bias in Algorithms vs. Social Impact—bias is a technical problem (how the model behaves), while social impact is a systemic outcome (what happens in the world). A model can be statistically fair by one metric but still cause harm at scale. Exam questions often test whether you can distinguish between fixing the model and addressing broader consequences.
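To see how a model can be "statistically fair by one metric but still cause harm," here is a minimal sketch (all data, group labels, and function names are hypothetical, chosen only for illustration): demographic parity compares positive-prediction rates across groups, while equal opportunity compares true-positive rates among the truly qualified.

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """Difference in positive-prediction rates between two groups."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return rate_a - rate_b

def equal_opportunity_diff(y_true, y_pred, group):
    """Difference in true-positive rates between two groups."""
    tpr_a = y_pred[(group == 0) & (y_true == 1)].mean()
    tpr_b = y_pred[(group == 1) & (y_true == 1)].mean()
    return tpr_a - tpr_b

# Toy predictions: equal positive rates overall, unequal true-positive rates.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(demographic_parity_diff(y_pred, group))         # 0.0: "fair" by parity
print(equal_opportunity_diff(y_true, y_pred, group))  # -0.5: unequal TPRs
```

Here the model looks fair under demographic parity (both groups receive positive predictions at the same rate) yet fails equal opportunity (qualified members of group 0 are approved half as often), which is exactly why auditing against a single metric is not enough.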
Trust requires that stakeholders understand and can verify how data systems work. These considerations focus on making the black box transparent and holding organizations responsible for outcomes.
Compare: Transparency vs. Accountability—transparency is about understanding (can you see how decisions are made?), while accountability is about consequences (who answers when things go wrong?). A system can be transparent but lack accountability if no one is responsible for acting on that information.
Data doesn't exist in a legal vacuum. These considerations address who controls data, who profits from it, and how organizations structure ethical oversight.
Compare: Data Security vs. Data Privacy—security is a technical safeguard (preventing unauthorized access), while privacy is a normative principle (respecting boundaries even with authorized access). You can have perfect security but still violate privacy by using data inappropriately. Expect questions that test whether you can identify which principle applies to a given scenario.
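A minimal sketch of this distinction, with hypothetical names and rules: the first check below is a security control (is the user authorized?), while the second is a privacy control (is the use consistent with what was consented to?). Perfect security passes the first check and can still fail the second.

```python
# Hypothetical access policy: security alone does not enforce privacy.
AUTHORIZED_USERS = {"analyst_1", "analyst_2"}    # security: who may access
CONSENTED_PURPOSES = {"shipping_notifications"}  # privacy: what use was consented to

def access_data(user: str, purpose: str) -> bool:
    if user not in AUTHORIZED_USERS:
        return False  # security violation: unauthorized access blocked
    if purpose not in CONSENTED_PURPOSES:
        return False  # privacy violation: authorized user, undisclosed purpose
    return True

print(access_data("analyst_1", "shipping_notifications"))  # True
print(access_data("analyst_1", "marketing"))               # False: secure, not private
```

The second call is the "secure but not private" case: an authorized analyst using data for a purpose that was never disclosed to the individual.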
| Concept | Best Examples |
|---|---|
| Individual Autonomy | Informed Consent, Data Privacy, Data Minimization |
| Algorithmic Fairness | Bias Detection, Fairness Metrics, Continuous Auditing (see the audit sketch below) |
| Transparency | Explainability, Model Cards, Documentation |
| Accountability | Audit Trails, Redress Mechanisms, Responsibility Chains |
| Legal Compliance | GDPR, CCPA, Data Retention Policies |
| Security | Encryption, Access Controls, Breach Response |
| Societal Responsibility | Impact Assessment, Feedback Loop Analysis, Stakeholder Input |
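The "Continuous Auditing" entry can be made concrete with a minimal sketch (hypothetical data and group labels): computing accuracy per group rather than overall, the kind of check that would surface the elderly-patient disparity in the last review question below.

```python
import numpy as np

def per_group_accuracy(y_true, y_pred, group):
    """Accuracy broken out by group label; a basic fairness-audit step."""
    return {str(g): float((y_pred[group == g] == y_true[group == g]).mean())
            for g in np.unique(group)}

# Toy audit: overall accuracy hides a subgroup gap.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 1])
group  = np.array(["under_65"] * 4 + ["over_65"] * 4)

print((y_pred == y_true).mean())                  # 0.625 overall
print(per_group_accuracy(y_true, y_pred, group))  # {'over_65': 0.25, 'under_65': 1.0}
```

An aggregate score of 0.625 says nothing about who bears the errors; the per-group breakdown shows one subgroup at 25% accuracy, which is the finding responsible deployment would require you to act on.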
1. Which two ethical considerations both protect individuals but operate at different stages of the data lifecycle? Explain how they differ in focus.
2. A predictive policing algorithm is trained on historical arrest data and disproportionately flags minority neighborhoods. Identify which ethical principles are violated and explain the mechanism causing the harm.
3. Compare and contrast transparency and accountability: can a system satisfy one principle but not the other? Provide an example.
4. An FRQ describes a company that collected email addresses for shipping notifications but later used them for marketing. Which specific ethical principles does this violate, and why?
5. A healthcare AI achieves high accuracy overall but performs significantly worse for elderly patients. Which fairness concept applies here, and what would responsible deployment require?