
Data Quality Dimensions


Why This Matters

In Business Intelligence, your dashboards, reports, and predictive models are only as good as the data feeding them. Data quality dimensions give you a framework for evaluating whether your data is actually fit for purpose—and when exam questions ask you to diagnose why a BI initiative failed or recommend improvements, these dimensions are your diagnostic toolkit. You're being tested on your ability to identify which dimension is compromised in a given scenario and explain the downstream business impact.

Think of data quality dimensions as falling into four categories: intrinsic quality (is the data itself correct?), structural quality (does it stay sound across systems and time?), contextual quality (is it useful for this specific purpose?), and accessibility quality (can stakeholders actually use it?). Don't just memorize definitions—know how each dimension connects to data governance, ETL processes, and decision-making reliability. When you see a case study about conflicting reports or missed business opportunities, your first instinct should be to ask: which quality dimension broke down?


Intrinsic Quality: Is the Data Itself Correct?

These dimensions evaluate whether data accurately represents reality, independent of how it's being used. Intrinsic quality failures corrupt everything downstream—no amount of sophisticated analytics can fix fundamentally flawed inputs.

Accuracy

  • Measures how closely data values match real-world truth—the foundation of all trustworthy analytics
  • Verification methods include cross-referencing against authoritative sources, validation rules, and audit sampling
  • Business impact is severe: inaccurate customer data leads to failed deliveries, wrong pricing, and compliance violations

Validity

  • Confirms data conforms to defined formats, ranges, and business rules—distinct from accuracy because data can be valid but still wrong
  • Examples include date formats (MM/DD/YYYY vs. DD/MM/YYYY), acceptable value ranges, and required field constraints
  • Caught during ETL through validation checks; invalid records typically get rejected or flagged for review
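An ETL-style validity check can be sketched in a few lines. The field names and rules below are illustrative assumptions, not from any specific tool:

```python
from datetime import datetime

def validate_record(record):
    """Return a list of validity errors; an empty list means the record passes."""
    errors = []
    # Required-field constraint
    if not record.get("customer_id"):
        errors.append("customer_id is required")
    # Format rule: order_date must parse as MM/DD/YYYY
    try:
        datetime.strptime(record.get("order_date", ""), "%m/%d/%Y")
    except ValueError:
        errors.append("order_date must be MM/DD/YYYY")
    # Range rule: quantity must be a positive integer
    if not isinstance(record.get("quantity"), int) or record["quantity"] <= 0:
        errors.append("quantity must be a positive integer")
    return errors

# An impossible date fails the format rule; the record is flagged, not loaded
print(validate_record({"customer_id": "C001",
                       "order_date": "02/30/1990",
                       "quantity": 5}))
```

In practice these rules live in the ETL layer, and flagged records are routed to a review queue rather than silently dropped.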

Precision

  • Indicates the level of granularity and exactness in data values—think decimal places, measurement units, or category specificity
  • Context-dependent: financial transactions need penny-level precision; customer satisfaction scores may not
  • Over-precision can actually harm analysis by introducing noise or false confidence in measurements
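The context-dependence of precision can be shown concretely. The numbers below are invented for illustration:

```python
from decimal import Decimal

# Over-precision: a survey average from ~30 responses doesn't justify six decimals
raw_satisfaction = 4.273619
print(round(raw_satisfaction, 1))  # 4.3 -- precision matched to the sample size

# Under-precision would be unacceptable for money: use exact decimal arithmetic
price = Decimal("19.99") * 3
print(price)  # 59.97 -- penny-level, no floating-point drift
```

Reporting 4.273619 would imply false confidence; reporting an invoice total rounded to the nearest dollar would be a genuine precision failure.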

Compare: Accuracy vs. Validity—both assess "correctness," but accuracy asks "is this the right value?" while validity asks "is this value properly formatted?" A birthdate of 02/30/1990 is invalid (impossible date), while 02/28/1991 for someone born 02/28/1990 is inaccurate (wrong year). FRQs love this distinction.
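The birthdate distinction above can be demonstrated programmatically. This sketch uses Python's datetime; the "ground truth" value is an assumed real-world fact for illustration:

```python
from datetime import datetime

def is_valid_date(text):
    """Validity check: does the value conform to the MM/DD/YYYY format rule?"""
    try:
        datetime.strptime(text, "%m/%d/%Y")
        return True
    except ValueError:
        return False

# 02/30/1990 fails validity: February has no 30th day
print(is_valid_date("02/30/1990"))  # False

# 02/28/1991 passes validity but fails accuracy against real-world truth
ground_truth = "02/28/1990"
stored_value = "02/28/1991"
print(is_valid_date(stored_value))   # True  (well-formed)
print(stored_value == ground_truth)  # False (wrong year -> inaccurate)
```

Note that the accuracy check requires an external authoritative source to compare against; validity can be checked from the value alone.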


Structural Quality: Is the Data Internally Sound?

These dimensions ensure data maintains its quality across systems, time, and transformations. Structural failures often emerge during data integration when multiple sources collide.

Consistency

  • Ensures uniform representation across datasets and systems—same customer shouldn't appear as "IBM" in one system and "International Business Machines" in another
  • Master data management (MDM) is the primary solution for maintaining consistency across enterprise systems
  • Reconciliation reports help identify inconsistencies before they contaminate downstream analytics
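One common MDM tactic is mapping source-system variants to a single master value. A simplified sketch, with an alias table invented for illustration:

```python
# Master data mapping: variants observed in source systems -> canonical name
CANONICAL_NAMES = {
    "IBM": "International Business Machines",
    "I.B.M.": "International Business Machines",
    "International Business Machines": "International Business Machines",
}

def canonicalize(name):
    """Return the master record name; unknown variants are flagged for review."""
    return CANONICAL_NAMES.get(name.strip(), f"UNMAPPED: {name.strip()}")

print(canonicalize("IBM"))                    # International Business Machines
print(canonicalize("Intl Business Machines")) # flagged -> reconciliation report
```

Unmapped values feed the reconciliation report, where a data steward decides whether to add a new alias or correct the source system.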

Integrity

  • Refers to data remaining accurate and unaltered throughout its lifecycle—from creation through storage, transfer, and archival
  • Referential integrity ensures relationships between tables remain valid (foreign keys point to existing records)
  • Compromised by system failures, unauthorized modifications, or broken ETL pipelines; detected through checksums and audit trails
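Checksum-based detection of alteration can be sketched with Python's hashlib. The CSV payload is a made-up example:

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 fingerprint, recorded when the data is created or transferred."""
    return hashlib.sha256(data).hexdigest()

original = b"customer_id,balance\nC001,1500.00\n"
stored_hash = checksum(original)  # saved alongside the file in an audit trail

# Later, after storage or transfer, recompute and compare
tampered = b"customer_id,balance\nC001,9500.00\n"
print(checksum(original) == stored_hash)  # True:  integrity preserved
print(checksum(tampered) == stored_hash)  # False: the data was altered
```

A mismatch doesn't identify what changed, only that something did; the audit trail narrows down where in the lifecycle the alteration happened.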

Reliability

  • Measures the dependability of data sources and consistency of values over time—can you trust this source repeatedly?
  • Assessed through source reputation, historical accuracy rates, and documentation of collection methods
  • Unreliable sources should be flagged in data catalogs so analysts can weight their conclusions appropriately

Compare: Consistency vs. Integrity—consistency is about uniformity across systems (horizontal alignment), while integrity is about preservation over time (vertical alignment). A database could have perfect integrity but still be inconsistent with other systems using different naming conventions.


Contextual Quality: Is the Data Useful for This Purpose?

These dimensions evaluate whether data serves the specific business need at hand. Data that's perfect for one use case may be worthless for another.

Completeness

  • Measures whether all required data elements are present—null values, missing records, and partial entries all reduce completeness
  • Acceptable thresholds vary by use case: 95% completeness might suffice for trend analysis but fail for regulatory reporting
  • Root causes include optional fields, system integration gaps, and data entry abandonment
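Completeness is typically measured as the share of non-null values for a required field. A sketch with illustrative records:

```python
records = [
    {"customer_id": "C001", "email": "a@example.com", "phone": "555-0100"},
    {"customer_id": "C002", "email": None,            "phone": "555-0101"},
    {"customer_id": "C003", "email": "c@example.com", "phone": None},
]

def completeness(records, field):
    """Fraction of records with a non-null value for the given field."""
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

for field in ("customer_id", "email", "phone"):
    print(f"{field}: {completeness(records, field):.0%}")
```

Here email and phone sit at 67%, so a trend analysis with a loose threshold might proceed while a regulatory report requiring 95% would be blocked.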

Timeliness

  • Assesses whether data is current enough for the decision at hand—real-time trading needs second-level freshness; annual planning can use month-old data
  • Data latency describes the gap between when an event occurs and when it appears in your BI system
  • Batch vs. streaming architectures represent fundamental design choices driven by timeliness requirements
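Data latency can be computed directly from event and load timestamps. A sketch with assumed timestamp formats and values:

```python
from datetime import datetime

def latency_seconds(event_time: str, loaded_time: str) -> float:
    """Gap between when an event occurred and when it landed in the BI system."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (datetime.strptime(loaded_time, fmt)
            - datetime.strptime(event_time, fmt)).total_seconds()

# A nightly batch load: the event happened at 09:15, appeared in BI at 02:00
lag = latency_seconds("2024-03-01 09:15:00", "2024-03-02 02:00:00")
print(lag / 3600)  # 16.75 hours -- fine for annual planning, fatal for trading
```

Tracking this metric per pipeline is what turns "the dashboard feels stale" into an architectural decision between batch and streaming.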

Relevance

  • Evaluates whether data actually applies to the business question being asked—collecting everything isn't a strategy
  • Irrelevant data increases storage costs, slows queries, and distracts analysts from meaningful patterns
  • Data minimization principles (especially under GDPR) make relevance a compliance concern, not just an efficiency one

Compare: Completeness vs. Relevance—these create tension in data strategy. Completeness pushes toward "capture everything," while relevance argues "capture only what matters." The best BI architectures balance both by defining clear data requirements before collection begins.


Accessibility Quality: Can Stakeholders Use the Data?

This dimension determines whether quality data actually reaches the people who need it. The best data in the world is worthless if decision-makers can't access it.

Accessibility

  • Measures how easily authorized users can obtain and utilize data—includes technical access, format usability, and discoverability
  • Barriers include siloed systems, inadequate permissions, poor documentation, and lack of self-service tools
  • Data democratization initiatives aim to improve accessibility while maintaining appropriate governance controls

Compare: Accessibility vs. Timeliness—both affect whether data reaches users when needed, but timeliness is about data freshness while accessibility is about delivery mechanisms. Real-time data that's locked in a system only IT can query fails on accessibility, not timeliness.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Intrinsic Quality | Accuracy, Validity, Precision |
| Structural Quality | Consistency, Integrity, Reliability |
| Contextual Quality | Completeness, Timeliness, Relevance |
| Accessibility Quality | Accessibility |
| ETL Validation Focus | Validity, Consistency, Completeness |
| Governance Priority | Integrity, Reliability, Accuracy |
| User-Facing Concerns | Timeliness, Accessibility, Relevance |
| Compliance-Related | Accuracy, Integrity, Relevance |

Self-Check Questions

  1. A company's CRM shows a customer's address as "123 Main St" while the billing system shows "123 Main Street, Suite 100." Which two dimensions are potentially compromised, and how would you distinguish between them?

  2. Your sales dashboard shows last month's figures, but executives need to respond to a competitor's price change announced yesterday. Which dimension has failed, and what architectural change would address it?

  3. Compare and contrast accuracy and precision using an example of customer age data. How might data be precise but inaccurate, or accurate but imprecise?

  4. An FRQ describes a merger where two companies' product databases use different category taxonomies. Which dimensions are at risk, and what data management strategy would you recommend?

  5. A data analyst complains that the information they need exists but is trapped in a legacy system requiring IT tickets to extract. Which dimension is failing, and how does this differ from a timeliness problem?