AP exam review verified for 2027

AP Computer Science Principles Unit 2 Review: Data

Review AP CSP Unit 2 to understand how computers represent everything as bits, how compression shrinks data without always losing quality, and how programs turn raw datasets into real knowledge. This unit covers binary numbers, lossless and lossy compression, metadata, data cleaning, and programmatic data processing.

Use the topic guides, key terms, and practice questions available here to work through each concept before your exam.

What is AP Computer Science Principles unit 2?

Every value a computer handles, from a single character to a full video file, is ultimately stored as bits. Unit 2 explains that foundational idea and then builds outward: how data gets compressed, how raw data becomes useful information, and how programs automate that extraction at scale.

Unit 2 is about how computers represent, compress, and process data. The core claim is that data and information facilitate the creation of knowledge, and the unit traces that path from individual bits all the way to programs that filter and visualize large datasets.

Binary is the lowest level

All digital data is stored as bits (0 or 1). Eight bits form a byte. Groups of bits represent numbers, characters, colors, and more through abstraction. Fixed-width bit representations create real limits, including overflow errors when a value exceeds the available range.

Compression trades size for fidelity

Lossless compression (like run-length encoding) guarantees exact reconstruction. Lossy compression (like JPEG or MP3) achieves greater size reduction but only approximates the original. Choosing between them depends on whether exact reconstruction is required.

Programs turn data into knowledge

Raw data becomes information when patterns and facts are extracted. Programs search, filter, transform, and combine datasets. Metadata organizes and describes data. Challenges like bias, incomplete data, and the need for cleaning apply to datasets of any size.

Data and information facilitate the creation of knowledge

This is the central claim of Big Idea 2. A spreadsheet of temperatures is data. The trend showing warming over a decade is information. The conclusion that a region is experiencing climate change is knowledge. Unit 2 traces every step of that progression, from the bit level up through programmatic analysis, and asks you to reason about the tools, limits, and responsibilities involved at each stage.

AP Computer Science Principles unit 2 topics

2.1

Binary Numbers

All data in a computer is stored as bits (0 or 1). Learn to convert between binary and decimal using place values (powers of 2), understand how abstraction groups bits into characters and colors, and explain why fixed-width representations cause overflow and roundoff errors.

2.2

Data Compression

Compression reduces the number of bits in stored or transmitted data. Lossless compression allows exact reconstruction; lossy compression achieves greater reduction but only approximates the original. Compare the two and choose the appropriate type for a given scenario.

2.3

Extracting Information from Data

Information is facts and patterns pulled from raw data. Metadata describes data without being the data itself. Correlation in a dataset does not prove causation. Data challenges include cleaning, bias, incomplete records, and the need to combine multiple sources.

2.4

Using Programs with Data

Programs search, filter, transform, combine, and visualize data to generate knowledge. Spreadsheets and other tools support iterative analysis. Clustering and classifying data reveal patterns that would be impractical to find manually.


Big Idea 2 Overview: Data

AP CSP Big Idea 2 (Data) is 17-22% of the exam. Review binary numbers, lossless vs. lossy compression, metadata, and data processing, plus practice questions.


Hardest AP Computer Science Principles Unit 2 topics

This snapshot uses Fiveable practice activity to show where students tend to miss questions and which review moves are worth prioritizing first.

68% average MCQ accuracy

Across 11k multiple-choice practice attempts for this unit.

11k MCQ attempts

Practice activity included in this snapshot.

Hardest topics in unit 2

MCQ miss rate
2.1

Review Binary Numbers with attention to how the concept appears in AP-style multiple-choice questions.

39% miss rate (2,099 tries)
2.3

Review Extracting Information from Data with attention to how the concept appears in AP-style multiple-choice questions.

31% miss rate (3,274 tries)
2.2

Review Data Compression with attention to how the concept appears in AP-style multiple-choice questions.

31% miss rate (2,861 tries)

Unit 2 review notes

2.1

Binary Numbers

Computing devices represent all data digitally using bits. A bit is a binary digit, either 0 or 1, and eight bits form a byte. To convert a decimal number to binary, repeatedly divide by 2, record each remainder, and then read the remainders in reverse order (the last remainder is the leftmost bit). To convert binary to decimal, multiply each bit by its place value (a power of 2) and sum the results. For example, 1011 in binary equals 1x8 + 0x4 + 1x2 + 1x1 = 11 in decimal. The same sequence of bits can represent different types of data depending on context, which is why abstraction matters: bits are grouped to represent numbers, characters, and colors without the programmer needing to manage every 0 and 1 directly.
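Both conversion procedures can be sketched in a few lines of Python. This is a minimal illustration for non-negative integers only; `decimal_to_binary` and `binary_to_decimal` are hypothetical helper names chosen for this example, not part of the AP CSP reference sheet.

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # each remainder is the next bit, rightmost first
        n //= 2                        # integer-divide by 2 and repeat
    # The first remainder is the 2^0 bit, so reverse to read the bits left to right.
    return "".join(reversed(remainders))


def binary_to_decimal(bits: str) -> int:
    """Sum each bit times its place value, starting from 2^0 on the right."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


print(decimal_to_binary(11))      # 1011
print(binary_to_decimal("1011"))  # 11
```

Python's built-ins do the same work: `bin(11)` returns `'0b1011'` and `int("1011", 2)` returns `11`. Writing the loops out by hand mirrors the division and weighted-sum methods you are expected to use on the exam.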

  • Bit: The smallest unit of data, either 0 or 1. Eight bits form one byte.
  • Positional notation: Each bit's value depends on its position; the rightmost position is 2^0, the next is 2^1, and so on.
  • Overflow error: Occurs when a mathematical result exceeds the range a fixed number of bits can represent, producing an incorrect value.
  • Abstraction: Grouping bits to represent higher-level concepts like characters or colors, hiding the underlying binary details.
  • Roundoff error: Occurs because real numbers cannot always be represented exactly in a fixed number of bits, producing approximations.
Convert 25 (decimal) to binary, and then convert 11010 (binary) to decimal. Explain what would happen if a program tried to add 1 to the largest value a 4-bit unsigned integer can hold.
Feature | Decimal (Base 10) | Binary (Base 2)
Digits used | 0-9 | 0 and 1
Place values | Powers of 10 | Powers of 2
Example: value 13 | 1x10 + 3x1 | 1x8 + 1x4 + 0x2 + 1x1
Used by | Humans | Computing devices
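Python's own integers grow as needed and never overflow, so the sketch below simulates a fixed 4-bit unsigned integer by keeping only the lowest 4 bits of each result. That wraparound is one common behavior, assumed here for illustration; other systems may report an error instead. The `add_fixed_width` helper is hypothetical. The last lines show a roundoff error using Python's standard floating-point numbers.

```python
BITS = 4                   # assumed width for this illustration
MAX_VALUE = 2 ** BITS - 1  # largest 4-bit unsigned value: 1111 in binary


def add_fixed_width(a: int, b: int, bits: int = BITS) -> int:
    """Add two unsigned values, keeping only the lowest `bits` bits of the sum (simulated wraparound)."""
    return (a + b) % (2 ** bits)


print(MAX_VALUE)                      # 15
print(add_fixed_width(MAX_VALUE, 1))  # 0: the true sum 10000 needs 5 bits, so the leading 1 is lost

# Roundoff: 0.1 and 0.2 have no exact binary representation, so their sum is approximate.
print(0.1 + 0.2)                      # 0.30000000000000004
print(0.1 + 0.2 == 0.3)               # False
```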
2.2

Data Compression

Data compression reduces the number of bits needed to store or transmit a file. The amount of reduction depends on how much redundancy exists in the original data and which algorithm is applied. Using fewer bits does not necessarily mean less information: lossless compression removes only redundancy. Lossless compression guarantees that the original data can be reconstructed exactly, making it appropriate for text files or executable programs. Lossy compression achieves greater size reduction but only allows reconstruction of an approximation, making it suitable for images, audio, and video where small quality losses are acceptable. When exact reconstruction is required, lossless is the only valid choice.
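Run-length encoding, the lossless technique named in the unit overview, is simple enough to sketch directly. This minimal Python version assumes text input and uses hypothetical helper names `rle_encode` and `rle_decode`. It stores each run of repeated characters as a (character, count) pair and rebuilds the original exactly.

```python
def rle_encode(text: str) -> list[tuple[str, int]]:
    """Replace each run of repeated characters with a (character, run length) pair."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand each (character, count) pair back into the original characters."""
    return "".join(ch * count for ch, count in runs)


original = "A" * 5 + "B" * 3 + "C" * 8 + "D"
encoded = rle_encode(original)
print(encoded)                          # [('A', 5), ('B', 3), ('C', 8), ('D', 1)]
print(rle_decode(encoded) == original)  # True: reconstruction is exact
```

The savings come entirely from redundancy: 17 characters become 4 pairs here, but a string with no repeated neighbors, such as `"ABCD"`, produces one pair per character and gets larger rather than smaller.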

  • Lossless compression: Reduces file size while allowing perfect reconstruction of the original data.
  • Lossy compression: Reduces file size more aggressively but only approximates the original; some data is permanently removed.
  • Redundancy: Repeated or predictable patterns in data that compression algorithms exploit to reduce size.
A hospital stores patient MRI scans digitally. Should it use lossless or lossy compression? Explain why the choice matters in this context.
Feature | Lossless | Lossy
Reconstruction | Exact original | Approximation only
Size reduction | Moderate | Greater
Data permanently lost | No | Yes
Best for | Text, executables, medical images | Photos, audio, video
Example formats | PNG, ZIP | JPEG, MP3
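Real lossy formats such as JPEG and MP3 use far more sophisticated methods, but the core trade-off shows up in a toy sketch. The example below uses hypothetical sample values and helper names and rounds each value to the nearest multiple of a step size. The stored codes span a smaller range (0-10 instead of 0-100, so 4 bits per value instead of 7), but the originals can only be approximated afterward.

```python
def quantize(samples: list[int], step: int) -> list[int]:
    """Store each sample as the nearest whole number of steps (halves round up)."""
    return [(s + step // 2) // step for s in samples]


def dequantize(codes: list[int], step: int) -> list[int]:
    """Rebuild an approximation by multiplying each code back by the step size."""
    return [c * step for c in codes]


samples = [12, 47, 51, 88, 93]  # hypothetical values in the range 0-100
codes = quantize(samples, 10)   # codes in the range 0-10
restored = dequantize(codes, 10)

print(codes)                # [1, 5, 5, 9, 9]
print(restored)             # [10, 50, 50, 90, 90]
print(restored == samples)  # False: the exact originals are gone
```

Note that 47 and 51 become indistinguishable after reconstruction. That permanent loss is why lossy methods are unsuitable when exact reconstruction is required.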
2.3

Extracting Information from Data

Information is the collection of facts and patterns extracted from data. Data alone does not equal information; processing is required to find trends, make connections, and address problems. Metadata is data about data, such as a photo's file size, creation date, or location tag. Changing or deleting metadata does not change the primary data, but metadata helps organize and find information. A critical reasoning skill for this topic is distinguishing correlation from causation: two variables may move together in a dataset without one causing the other, and additional research is always needed to establish a causal relationship. Data challenges apply regardless of dataset size and include incomplete data, invalid data, non-uniform formatting, and the need to combine multiple sources. Cleaning data standardizes values without changing their meaning. Bias in data comes from the type or source of the data collected and cannot be fixed simply by gathering more data.
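Cleaning data can be sketched as mapping every variant spelling to one standard form. The lookup table and sample values below are hypothetical; real cleaning rules depend on the dataset. Each value keeps its meaning, and only its format changes.

```python
# Hypothetical lookup table: every known variant (trimmed, lowercase) -> standard form.
STATE_CODES = {
    "ca": "CA", "calif.": "CA", "california": "CA",
    "ny": "NY", "n.y.": "NY", "new york": "NY",
}


def clean_state(value: str) -> str:
    """Standardize spacing, capitalization, and abbreviations for one state entry."""
    key = value.strip().lower()
    # Unrecognized values are returned trimmed but otherwise unchanged, for manual review.
    return STATE_CODES.get(key, value.strip())


raw = ["California", "CA", " calif. ", "new york", "N.Y.", "Ny", "Texas"]
cleaned = [clean_state(v) for v in raw]
print(cleaned)  # ['CA', 'CA', 'CA', 'NY', 'NY', 'NY', 'Texas']
```

Leaving unknown entries unchanged, rather than guessing, avoids silently altering the data's meaning. Incomplete or invalid records still need their own handling.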

  • Metadata: Data about data, such as a file's creation date, size, or author, used to organize and manage information.
  • Correlation: A statistical relationship between two variables in a dataset; does not imply that one causes the other.
  • Cleaning data: Standardizing values in a dataset (fixing abbreviations, capitalization, formats) without changing their meaning.
  • Data bias: Systematic error introduced by the type or source of data collected; collecting more data does not eliminate it.
  • Scalability: The ability to process increasing amounts of data; large datasets may require parallel or distributed computing tools.
A researcher finds that cities with more ice cream sales also have higher drowning rates. Is this correlation or causation? What additional step is needed before drawing a conclusion?
Concept | Definition | Example
Data | Raw, unprocessed values | A list of daily temperatures
Information | Patterns extracted from data | Average temperature rising over 10 years
Metadata | Data about the data | File size and creation date of a temperature log
Correlation | Two variables move together | Ice cream sales and drowning rates both rise in summer
Causation | One variable directly causes another | Requires additional controlled research to establish
2.4

Using Programs with Data

Programs automate the process of extracting information from data. Tools like spreadsheets help organize datasets and identify trends. Search tools efficiently locate specific records. Data filtering keeps only the records that meet a condition, such as keeping only students enrolled in a specific course. Transforming data applies an operation to every element, such as doubling every value in a list. Combining data from multiple sources, clustering similar records, and classifying data are all ways programs generate insight. Visualization tools like tables, charts, and diagrams communicate findings. The process is iterative: programmers repeatedly filter, clean, transform, and visualize until meaningful patterns emerge.
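Three of these operations can be sketched with plain Python lists and dictionaries. The student records, scores, and 5-point curve below are hypothetical; the point is the shape of each step. Filtering keeps records that meet a condition, combining joins two sources on a shared key, and transforming applies the same operation to every element.

```python
# Hypothetical source 1: enrollment records.
students = [
    {"id": 1, "name": "Ana", "course": "CSP"},
    {"id": 2, "name": "Ben", "course": "Biology"},
    {"id": 3, "name": "Chen", "course": "CSP"},
]
# Hypothetical source 2: test scores keyed by student id.
scores = {1: 88, 2: 75, 3: 97}

# Filtering: keep only students enrolled in CSP.
csp_students = [s for s in students if s["course"] == "CSP"]

# Combining: attach each student's score from the second source using the shared id.
combined = [{**s, "score": scores[s["id"]]} for s in csp_students]

# Transforming: add 5 points to every score, capped at 100.
curved = [{**r, "score": min(r["score"] + 5, 100)} for r in combined]

for record in curved:
    print(record["name"], record["score"])
# Ana 93
# Chen 100
```

Each step builds a new list and leaves the earlier one untouched, which makes it easy to repeat or adjust a single step during iterative analysis.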

  • Data filtering: Selecting only the records that meet a specified condition, removing the rest from consideration.
  • Data mining: Using programs to discover patterns or knowledge in large datasets through techniques like clustering and classification.
  • Digital data: Information represented as discrete binary values that can be stored, processed, and transmitted by computing devices.
A program reads a dataset of student test scores and keeps only scores above 80, then calculates the average of those scores. Identify which data operations are being used and what type of insight the program is generating.
Operation | What it does | Example
Filtering | Keeps records matching a condition | Keep only rows where grade = 'A'
Transforming | Applies an operation to every element | Multiply every price by 1.08 for tax
Combining | Merges data from multiple sources | Join student names with their test scores
Classifying | Groups data into categories | Label emails as spam or not spam
Visualizing | Displays data as charts or tables | Line chart showing sales over 12 months
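The classifying and visualizing rows can be sketched together. The keyword rule below is a deliberately simple, hypothetical stand-in for a real spam filter, and the text bar chart is a minimal stand-in for a charting tool.

```python
from collections import Counter

SPAM_WORDS = {"winner", "free", "prize"}  # hypothetical keyword list


def classify(email: str) -> str:
    """Label an email 'spam' if any of its words appears in SPAM_WORDS."""
    words = set(email.lower().split())
    return "spam" if words & SPAM_WORDS else "not spam"


emails = [
    "You are a WINNER claim your prize",
    "Meeting moved to 3pm",
    "Free tickets inside",
    "Lab report due Friday",
    "Lunch at noon",
]
counts = Counter(classify(e) for e in emails)

# Visualizing: one '#' per email in each category.
for label, n in counts.items():
    print(f"{label:<9} {'#' * n}")
```

Because `split()` does not strip punctuation, a word like `prize!` would slip past this rule, one small example of why practical classifiers are more sophisticated.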

Practice AP Computer Science Principles unit 2 questions

Try AP-style multiple-choice questions and written prompts after you review the notes.

Example AP-style MCQs

AP-style practice question

A hospital releases an anonymized dataset of patient records. A researcher combines this with a public voter registration database. What is the most likely unintended consequence of this data processing?

Patients may be re-identified, compromising their privacy despite the initial anonymization.

Patients may be re-identified, compromising their privacy despite the initial data compression.

Patients may be re-identified, compromising their privacy despite the initial data encryption.

Patients may be re-identified, compromising their privacy despite the initial data validation.

AP-style practice question

A temperature sensor stores the current reading in a variable tempBin as binary 00110101. A second sensor stores its reading in a variable tempDec as decimal 55. Which statement accurately compares the values stored in these variables?

tempBin is smaller because decimal 53 is less than 55

tempBin is larger because decimal 57 is greater than 55

tempBin and tempDec are equal because decimal 55 is equal to 55

tempBin is smaller because decimal 51 is less than 55

Key terms

Term | Definition
Bits | The basic unit of digital data, either 0 or 1. All data in a computing device is ultimately represented as bits.
Digital data | Information represented using discrete binary values (0s and 1s) that can be stored, processed, and transmitted by computing devices.
Positional notation | A number system where each digit's value depends on its position; in binary, each position represents a power of 2.
Number base | The foundation of a positional number system; binary uses base 2 (digits 0-1) and decimal uses base 10 (digits 0-9).
Overflow | An error that occurs when a mathematical result exceeds the range a fixed number of bits can represent, producing an incorrect value.
Lossless data compression | A compression method that reduces file size while allowing perfect reconstruction of the original data.
Lossy data compression | A compression method that achieves greater size reduction by permanently removing some data, allowing only an approximation of the original to be reconstructed.
Redundancy | Repeated or predictable patterns in data that compression algorithms exploit to reduce the number of bits needed.
Metadata | Data about data, such as a file's creation date, size, or author. Editing metadata does not change the primary data.
Correlation | A statistical relationship between two variables in a dataset; does not imply that one variable causes the other.
Cleaning data | Standardizing values in a dataset (fixing abbreviations, capitalization, or formats) to make data uniform without changing its meaning.
Data bias | Systematic error introduced by the type or source of data collected; cannot be eliminated by simply collecting more data.
Data filtering | Selecting only the records from a dataset that meet a specified condition and excluding the rest.
Scalability | The ability of a system to handle increasing amounts of data without losing performance; large datasets may require parallel or distributed tools.
Analog data | Continuous, real-world information (like sound or temperature) that must be sampled and converted into discrete binary values for digital storage.

Common unit 2 mistakes

Confusing correlation with causation

A dataset showing that two variables move together does not mean one causes the other. Always note that additional research is needed, and never state causation based on correlation alone.

Thinking fewer bits always means less information

Lossless compression reduces the number of bits without losing any information, because the original can be reconstructed exactly. Lossy compression does permanently remove some data, and exam questions often hinge on that distinction.

Applying lossy compression when exact reconstruction is required

For text files, executable programs, or medical records, lossy compression is inappropriate because the original must be recovered exactly. Choosing lossy in those contexts is a conceptual error.

Believing more data eliminates bias

Bias comes from the type or source of data collected. Collecting more data from the same biased source does not fix the problem; the source of the bias must be addressed.

Mixing up binary place values

The rightmost bit is 2^0 (value 1), not 2^1. A common error is shifting all place values one position, which produces an incorrect conversion. Always start counting positions from 0 on the right.

How this unit shows up on the AP exam

Scenario-based compression choice

Multiple-choice questions often describe a file type or use case and ask whether lossless or lossy compression is more appropriate. The key reasoning move is identifying whether exact reconstruction is required. Medical images, text files, and executables require lossless; photos, audio, and video typically tolerate lossy.

Binary conversion and overflow reasoning

Questions may give you a binary number to convert to decimal, ask you to compare two binary values, or describe a situation where a fixed number of bits causes an error. Practice the weighted-sum method and be ready to explain why overflow or roundoff errors occur in terms of bit-width limits.

Data analysis and correlation vs. causation

Questions in this unit often present a dataset or a described analysis and ask what conclusion can or cannot be drawn. A common task is recognizing that a correlation between two variables does not establish causation, and that data may need to be cleaned or combined with other sources before valid conclusions can be drawn.

Final unit 2 review checklist

  • Convert between binary and decimal: Practice converting positive integers in both directions using place values (powers of 2). Be able to compare and order binary numbers without converting them first.
  • Explain overflow and roundoff errors: Describe what happens when a fixed number of bits cannot represent a value, and why real numbers stored in binary are sometimes approximations.
  • Compare lossless and lossy compression: Know that lossless guarantees exact reconstruction while lossy only approximates. Be able to choose the correct type for a scenario, such as medical imaging versus streaming audio.
  • Distinguish data, information, and metadata: Data is raw values; information is patterns extracted from data; metadata is data about data. Know that editing metadata does not change the primary data.
  • Apply the correlation vs. causation distinction: Recognize that a correlation found in a dataset does not establish a causal relationship. Additional research is required to determine causation.
  • Identify data challenges: Be ready to name and explain cleaning, incomplete data, invalid data, non-uniform formatting, combining sources, bias, and scalability as challenges that apply to any dataset size.
  • Trace programmatic data operations: Identify filtering, transforming, combining, classifying, and visualizing operations in a described program and explain what insight each operation helps generate.

How to study unit 2

Start with binary numbers (Topic 2.1): Read the Topic 2.1 guide, then practice converting at least five decimal numbers to binary and five binary numbers to decimal. Write out the place values (2^0 through 2^7) before each conversion until the pattern is automatic. Then explain in your own words what overflow and roundoff errors are.
Work through data compression (Topic 2.2): Read the Topic 2.2 guide and study the lossless vs. lossy comparison table. For each scenario you encounter (a text document, a photo, a song, a medical scan), practice stating which compression type is appropriate and why. Focus on the phrase 'exact reconstruction' as the deciding factor.
Review extracting information and metadata (Topic 2.3): Read the Topic 2.3 guide. Practice distinguishing data from information from metadata using concrete examples. Write two example scenarios where correlation exists but causation does not. List five data challenges (cleaning, incomplete data, invalid data, combining sources, and bias) and give one example of each.
Connect programs to data operations (Topic 2.4): Read the Topic 2.4 guide. For a sample dataset scenario, identify which operations (filter, transform, combine, classify, visualize) a described program is performing. Practice explaining what insight each operation produces and why iterative processing matters.
Consolidate with practice questions and the score calculator: Use the 25+ available practice questions to test yourself across all four topics. After reviewing your results, use the AP score calculator to estimate your score range and identify which topic areas need more focused review before the exam.

More ways to review

Topic study guides

Open the individual guides for Unit 2 when you want a closer review of one topic.


FRQ practice

Practice free-response reasoning and compare your answer with scoring guidance.


Cram archive videos

Watch past review streams filtered to Unit 2 when you want a video walkthrough.


Cheatsheets

Use unit cheatsheets for a quick visual review after you work through the notes.


Score calculator

Estimate your broader AP score goal after you review the course and exam format.


Frequently Asked Questions

What topics are covered in AP CSP Unit 2?

AP CSP Unit 2: Data covers 4 topics: Binary Numbers (2.1), Data Compression (2.2), Extracting Information from Data (2.3), and Using Programs with Data (2.4). You'll learn how computers store information in bits, how compression reduces file sizes, and how to analyze datasets with programs to draw real conclusions. See the full breakdown at AP CSP Unit 2.

What's on the AP CSP Unit 2 progress check (MCQ and FRQ)?

The AP CSP Unit 2 progress check includes MCQ and FRQ parts that draw directly from all 4 unit topics: Binary Numbers, Data Compression, Extracting Information from Data, and Using Programs with Data. MCQ questions test your ability to interpret binary values and compression trade-offs. FRQ prompts typically ask you to analyze a dataset or explain how a program processes data to find patterns. Practice with questions matched to these topics at AP CSP Unit 2.

How do I practice AP CSP Unit 2 FRQs?

AP CSP Unit 2 FRQs most often come from Extracting Information from Data (2.3) and Using Programs with Data (2.4), asking you to interpret a dataset, identify a trend, or explain what a program does with data. To practice, work through prompts that give you a table or chart and ask you to draw a conclusion or describe a computational solution. Focus on writing clear, specific explanations, not vague ones. Find practice FRQ prompts for this unit at AP CSP Unit 2.

Where can I find AP CSP Unit 2 practice questions?

For AP CSP Unit 2 practice questions, including multiple-choice and practice test sets, the best starting point is AP CSP Unit 2. You'll find MCQ questions covering Binary Numbers, Data Compression, Extracting Information from Data, and Using Programs with Data, organized by topic so you can target the areas where you need the most work.

How should I study AP CSP Unit 2?

Start AP CSP Unit 2 by building a solid grip on Binary Numbers (2.1), since everything else in the unit depends on understanding how bits represent data. Then work through Data Compression (2.2) by comparing lossless vs. lossy examples you already know, like ZIP files vs. JPEGs. For topics 2.3 and 2.4, practice reading data tables and tracing through simple programs that process datasets. Write out your reasoning in full sentences, because FRQ graders reward clear explanations over correct answers with no justification. Organize your study plan by topic at AP CSP Unit 2.

Ready to review Unit 2? Start with the notes, check the topic cards, and use the practice or resource links when they are available for this course.