AP Computer Science Principles Unit 2 Review
Data in AP Computer Science Principles

Verified for the 2027 exam | Compiled by AP educators | ~17–22% of the exam

AP Computer Science Principles Unit 2, Data, covers 4 topics on how computers store and process information, with binary numbers as the core idea behind every value a computer handles. All data, from audio files to video, gets broken down into bits and stored in binary. AP CSP then builds on that to cover data compression, pulling insights from large datasets, and writing programs that actually work with real data.

Unit 2 review

AP Computer Science Principles Unit 2, Data, explains how every piece of information a computer handles, from a text message to a 4K video, is ultimately stored as bits (0s and 1s) and how programs turn huge piles of those bits into useful knowledge. The single biggest idea is that binary representation plus abstraction lets simple on-or-off values build up into anything, and that processing data at scale is how we find patterns and answer real questions.

What this unit covers

Binary: how computers represent everything

  • A bit is a binary digit, either 0 or 1. It is the lowest-level component of any value a computer stores. A byte is 8 bits.
  • Binary works exactly like decimal, just with base 2 instead of base 10. Each position has a place value that is a power of 2 (1, 2, 4, 8, 16, 32...), and a number's value is each bit multiplied by its place value, added up. So 1011 in binary is 8 + 0 + 2 + 1 = 11 in decimal.
  • You need to convert both directions. Decimal to binary means finding which powers of 2 add up to your number. Binary to decimal means adding up the place values where there is a 1. You also need to compare and order binary numbers, which works the same way as decimal once you read the place values.
  • Abstraction is the process of reducing complexity by focusing on the main idea and hiding details. Binary is the perfect example. You never think about individual bits when you watch a video, because layers of abstraction (bits to numbers to colors to pixels to frames) hide them.
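
If it helps to see the place-value rule as code, here is a minimal Python sketch. The function name `binary_to_decimal` is made up for illustration and is not part of the AP exam reference sheet. Python's built-in `int(text, 2)` does the same job, so treat this as a way to check your hand work rather than something you need to memorize.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the place value (a power of 2) of every position holding a 1."""
    total = 0
    # Walk right to left so position 0 is the rightmost bit (place value 2**0).
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(binary_to_decimal("1011"))  # 8 + 0 + 2 + 1 = 11
```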

When bits run out: overflow and rounding

  • In many programming languages, integers get a fixed number of bits. That limits the range of values you can store. Go past the limit and you get overflow, an error where the result is too large to represent.
  • The language on the AP exam reference sheet abstracts this away. Its integers are limited only by the computer's memory, so overflow is not an issue there, but you still need to explain why it happens in fixed-bit systems.
  • Real numbers (values with a fractional part) are stored as approximations in a limited number of bits, in a form similar to scientific notation. That approximation causes round-off (rounding) errors. This is why adding 0.1 and 0.2 on a computer can give you 0.30000000000000004 instead of 0.3.
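
Both limits are easy to see in a short Python sketch. Python's own integers do not overflow, so the hypothetical helper `add_fixed_width` below imitates an 8-bit unsigned integer by throwing away everything past the top bit. Wrapping around like this is only one way a fixed-bit system can behave (some report an error instead), so take it as an illustration under that assumption.

```python
def add_fixed_width(a: int, b: int, bits: int = 8) -> int:
    """Add two non-negative integers, keeping only the lowest `bits` bits."""
    return (a + b) % (2 ** bits)    # anything past the top bit is lost


largest = 2 ** 8 - 1                # 255, the biggest 8-bit unsigned value
print(add_fixed_width(largest, 1))  # wraps to 0 instead of 256
print(largest + 1)                  # Python's own integers just grow: 256

# Round-off: 0.1 and 0.2 have no exact binary representation.
print(0.1 + 0.2)                    # 0.30000000000000004
print(0.1 + 0.2 == 0.3)             # False
```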

Compression: trading bits for fidelity

  • Compression reduces the number of bits needed to store or transmit data. Fewer bits does not necessarily mean less information, because clever encoding can squeeze out redundancy.
  • How much a file shrinks depends on two things: the amount of redundancy in the original data and the compression algorithm applied.
  • Lossless compression reduces bits while guaranteeing complete reconstruction of the original data. Nothing is permanently thrown away.
  • Lossy compression usually shrinks data much more, but the original can only be approximately reconstructed. Some data is gone forever.
  • The exam loves the trade-off question. Pick lossless when exact reconstruction matters (a legal document, source code). Pick lossy when smaller size or faster transmission matters more than perfect quality (streaming a video, a photo on a website).
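
To make the lossless vs lossy difference concrete, here is a rough Python sketch. Run-length encoding is used as one simple example of a lossless scheme, and rounding values to a step size stands in for a lossy one. Neither is named in the course material, and the function names and sample values are assumptions chosen for illustration.

```python
def rle_encode(text: str) -> list[tuple[str, int]]:
    """Lossless: store each run of repeated characters as (character, count)."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((ch, 1))               # start a new run
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Rebuild the exact original text from the runs."""
    return "".join(ch * count for ch, count in runs)


def quantize(samples: list[int], step: int = 10) -> list[int]:
    """Lossy: round each value down to a multiple of `step`, discarding detail."""
    return [(s // step) * step for s in samples]


redundant = "AAAAAABBBBCC"
encoded = rle_encode(redundant)
print(encoded)                           # [('A', 6), ('B', 4), ('C', 2)]
print(rle_decode(encoded) == redundant)  # True: nothing was lost

print(len(rle_encode("ABCD")))           # 4 runs for 4 characters: no redundancy, no savings
print(quantize([101, 104, 198, 203]))    # [100, 100, 190, 200]: the originals are gone
```

Notice that the highly repetitive string collapses to three runs, while "ABCD" gains nothing. That is the redundancy point from the list above.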

Turning data into information

  • Information is the collection of facts and patterns extracted from data. Data by itself is just raw values. Processing it reveals trends, connections, and answers to problems.
  • Correlation is not causation. Digitally processed data may show two variables moving together, but that alone does not prove one causes the other. Additional research is needed to figure out the real relationship.
  • A single source often is not enough. Combining multiple data sources, then clustering and classifying the data, is how programs generate new insight.
  • Metadata is data about data. For an image, the metadata might be the creation date or file size. Changing or deleting metadata does not change the primary data, and metadata makes data easier to find, organize, and manage.
  • Real datasets are messy regardless of size. You have to clean data, deal with incomplete or invalid entries, and combine sources. Open text fields are a classic problem, since different users abbreviate, spell, and capitalize things differently ("NY," "N.Y.," "new york").
  • Scale matters too. The ability to process data depends on the capabilities of the users and their tools. Datasets too large for one machine may need parallel systems to process.
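
Here is a small Python sketch of the open-text-field problem. The lookup table `KNOWN_STATES` and the helper `clean_state` are hypothetical, and the entries are invented. A real cleaning step would need a much longer list of accepted spellings.

```python
from typing import Optional

# Hypothetical lookup of known spellings; a real dataset would need a longer one.
KNOWN_STATES = {"ny": "NY", "newyork": "NY"}


def clean_state(raw: str) -> Optional[str]:
    """Standardize one free-text entry, or return None if it is blank or unknown."""
    key = raw.lower().replace(".", "").replace(" ", "")
    return KNOWN_STATES.get(key)


entries = ["NY", "N.Y.", "new york", "", "??"]   # made-up survey answers
cleaned = [clean_state(e) for e in entries]
print(cleaned)                                   # ['NY', 'NY', 'NY', None, None]

valid = [c for c in cleaned if c is not None]    # drop incomplete or invalid entries
print(len(valid), "usable of", len(entries))     # 3 usable of 5
```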

Programs as data tools

  • Programs process data to acquire information, and they do it iteratively and interactively. You filter, look at the result, adjust, and repeat.
  • Search tools find information efficiently. Filtering systems narrow data down and surface patterns. Spreadsheets organize data and reveal trends.
  • Insight comes from translating and transforming data, and from communicating it visually with tables, diagrams, charts, and text. A good visualization is itself an act of extracting information.
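
The filter, look, adjust, repeat loop can be sketched in Python like this. The `trips` records, the thresholds, and the function names are all invented for illustration, and the text bar chart is only a stand-in for a real visualization.

```python
from collections import Counter

# Made-up records for illustration only.
trips = [
    {"city": "Austin", "minutes": 12},
    {"city": "Boston", "minutes": 45},
    {"city": "Austin", "minutes": 38},
    {"city": "Boston", "minutes": 52},
    {"city": "Denver", "minutes": 9},
]


def filter_trips(records, min_minutes):
    """Keep only the records at or above a duration threshold."""
    return [r for r in records if r["minutes"] >= min_minutes]


def count_by_city(records):
    """Summarize the filtered records as a count per city."""
    return Counter(r["city"] for r in records)


# First pass with one threshold, then adjust it and look again.
for threshold in (10, 30):
    counts = count_by_city(filter_trips(trips, threshold))
    print(f"trips of {threshold}+ minutes")
    for city, n in sorted(counts.items()):
        print(f"  {city:<7}{'#' * n}")   # a tiny text bar chart
```

Raising the threshold from 10 to 30 drops one Austin trip and keeps both Boston trips, which is the kind of pattern that only shows up once you filter and compare.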

Unit 2 (Data) at a glance

Topic | Core idea | Must-know facts | Classic exam task
2.1 Binary Numbers | All data is bits; place values are powers of 2 | Bit = 0 or 1, byte = 8 bits, abstraction hides low-level detail | Convert between binary and decimal; order binary numbers
2.1 (consequences) | Fixed bits create limits | Overflow with fixed-size integers; round-off errors with real numbers | Explain why a calculation gives a wrong or approximate result
2.2 Data Compression | Fewer bits, same (or close enough) information | Lossless reconstructs perfectly; lossy shrinks more but loses data; redundancy drives savings | Choose lossless vs lossy for a given scenario and justify it
2.3 Extracting Information | Data becomes information through analysis | Correlation is not causation; metadata describes data; cleaning handles messy or invalid entries | Identify what a dataset can and cannot tell you
2.4 Using Programs with Data | Programs scale analysis humans cannot | Filtering, searching, clustering, classifying, combining sources; iterative process; visualizations communicate insight | Decide which filter or program step extracts a given insight

Why Unit 2 (Data) matters in AP CSP

Data is one of the five Big Ideas of AP CSP, and it is the layer everything else sits on. Programs (Big Idea 3) exist to process data, the Internet (Big Idea 4) exists to move data, and the social impacts in Big Idea 5 mostly come from collecting and analyzing data about people.

  • Abstraction, the most important concept in the whole course, gets its clearest demonstration here. Bits become numbers, numbers become colors, colors become images, and each layer hides the one below it.
  • The skill of evaluating claims from data (correlation vs causation, biased or incomplete datasets) is the foundation for analyzing computing innovations, which the exam asks about constantly.
  • Compression trade-offs train you in a habit AP CSP rewards everywhere, which is choosing between two valid options based on context rather than hunting for one "right" answer.

How this unit connects across the course

  • Collaboration and program design from Creative Development (Unit 1) come back here, since the iterative, interactive way you process data mirrors the iterative development process you learned there.
  • Algorithms and Programming (Unit 3) is where you actually write the code that filters, searches, and transforms data. Lists in Unit 3 are the data structures that hold the datasets Unit 2 talks about, and binary search there depends on the ordering skills you build here.
  • Computer Systems and Networks (Unit 4) sends data across the Internet, and compression from this unit explains why transmitted files get shrunk first. Bits and bytes are also the units that bandwidth is measured in.
  • Impact of Computing (Unit 5) takes the data analysis ideas here and asks the hard questions, like what happens when collected data invades privacy or when biased datasets produce biased conclusions.

Key syntax and algorithms

  • Binary to decimal conversion: multiply each bit by its place value (a power of 2) and add. The bit positions, right to left, are worth 2^0, 2^1, 2^2, and so on. So 1101 = 8 + 4 + 0 + 1 = 13.
  • Decimal to binary conversion: subtract the largest power of 2 that fits, mark a 1 in that position, and repeat with the remainder. For 13, take 8 (1), then 4 (1), skip 2 (0), take 1 (1), giving 1101.
  • Comparing binary numbers: with equal lengths, compare bit by bit from the left, just like comparing decimal digits. With different lengths, ignore any leading 0s; the number with more bits left over is bigger.
  • Overflow reasoning: with n bits for a non-negative integer, the largest value is 2^n - 1. Exceed it and the result cannot be represented.
  • Lossless vs lossy decision rule: if the original must be perfectly reconstructable, use lossless. If minimizing size or transmission time matters more, lossy is usually the better choice.
  • Filtering and cleaning: select only the rows or values matching a condition, standardize inconsistent entries, and remove invalid or incomplete records before analyzing.
  • Combining, clustering, classifying: merge multiple data sources, group similar records, and sort records into categories. These are the program-level steps that turn raw data into knowledge.
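
The conversion, comparison, and overflow rules above can be sketched in Python as follows. The function names are made up for this guide, and the sketch assumes non-negative integers only. It follows the subtract-the-largest-power method described in the list rather than the repeated-division method some classes teach.

```python
def decimal_to_binary(n: int) -> str:
    """Subtract the largest power of 2 that fits, then repeat with the remainder."""
    if n == 0:
        return "0"
    power = 1
    while power * 2 <= n:          # find the largest power of 2 that fits in n
        power *= 2
    bits = ""
    while power >= 1:
        if power <= n:
            bits += "1"            # this place value is used
            n -= power
        else:
            bits += "0"            # this place value is skipped
        power //= 2
    return bits


def larger_binary(a: str, b: str) -> str:
    """Pad to equal length, then compare left to right like decimal digits."""
    width = max(len(a), len(b))
    return a if a.zfill(width) >= b.zfill(width) else b


def max_unsigned(bits: int) -> int:
    """Largest non-negative integer that fits in `bits` bits."""
    return 2 ** bits - 1


print(decimal_to_binary(13))          # 1101 (8 + 4 + 0 + 1)
print(larger_binary("1011", "1101"))  # 1101, since 13 > 11
print(max_unsigned(8))                # 255
```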

Unit 2 (Data) on the AP exam

Data content appears in the multiple-choice section of the AP CSP end-of-course exam, and it shows up in a few predictable shapes.

  • Straight binary math. You convert a decimal number to binary or back, or pick the largest value from a set of binary numbers. These are quick points if you have the powers of 2 down cold.
  • Consequence questions. A scenario describes a program producing an unexpectedly wrong number, and you identify overflow or a rounding error as the cause.
  • Compression scenarios. You read a context (archiving medical records, streaming music) and choose whether lossless or lossy compression is appropriate, or you reason about why one file compresses more than another based on redundancy.
  • Data analysis stimulus questions. You get a description of a dataset, a table, or a visualization and decide what conclusion the data actually supports, which filter would answer a question, what metadata would help organize the files, or why a correlation does not prove causation.
  • Some Data questions are multi-select, where exactly two answers are correct, especially for "which conclusions can be drawn" style prompts. Read carefully and pick both.

This material also feeds the Create performance task indirectly, since the program you build there manages data in lists and your written responses explain how it does so.

Essential questions

  • How can just two symbols, 0 and 1, represent every kind of data a computer handles?
  • What do we give up, and what do we gain, when we compress data?
  • When does a pile of raw data become actual knowledge, and what can go wrong along the way?
  • Why do we need programs, rather than people, to analyze large datasets?

Key terms to know

  • Bit: a binary digit, either 0 or 1, the lowest-level component of any value a computer stores.
  • Byte: a group of 8 bits.
  • Binary (base 2): a number system using only 0 and 1, where each position's place value is a power of 2.
  • Abstraction: reducing complexity by focusing on the main idea and hiding lower-level details.
  • Overflow: an error that occurs when a value is too large to be represented with the fixed number of bits available.
  • Round-off (rounding) error: the small inaccuracy that results when real numbers are approximated by a limited number of bits.
  • Lossless compression: compression that reduces bits while guaranteeing the original data can be completely reconstructed.
  • Lossy compression: compression that shrinks data more aggressively but only allows approximate reconstruction of the original.
  • Information: the collection of facts and patterns extracted from data.
  • Metadata: data about data, such as an image's creation date or file size; changing it does not change the primary data.
  • Correlation: a pattern where two variables change together, which does not by itself prove one causes the other.
  • Data cleaning: fixing or removing incomplete, invalid, or inconsistent entries so a dataset can be analyzed reliably.
  • Filtering: selecting only the data that meets a condition, a core tool for finding patterns.
  • Classifying and clustering: sorting data into categories and grouping similar records, key steps in gaining insight from combined data sources.

Common mix-ups

  • Fewer bits does not mean less information. A compressed file can carry the exact same information as the original (that is the whole point of lossless compression).
  • Correlation is not causation. If ice cream sales and sunburns rise together, the data shows a relationship, not that one causes the other. The exam expects you to say "additional research is needed."
  • Lossy is not "bad" and lossless is not "always better." The right choice depends on context. Lossy is often the smarter pick when file size or transmission speed matters most.
  • Editing metadata does not edit the data. Changing a photo's date tag changes information about the image, not a single pixel in it.
  • Overflow and rounding errors are different problems. Overflow comes from integers exceeding a fixed bit limit. Rounding errors come from approximating real numbers. Do not use them interchangeably.

Frequently Asked Questions

What topics are covered in AP CSP Unit 2?

AP CSP Unit 2: Data covers 4 topics: Binary Numbers (2.1), Data Compression (2.2), Extracting Information from Data (2.3), and Using Programs with Data (2.4). You'll learn how computers store information in bits, how compression reduces file sizes, and how to analyze datasets with programs to draw real conclusions. See the full breakdown at AP CSP Unit 2.

What's on the AP CSP Unit 2 progress check (MCQ and FRQ)?

The AP CSP Unit 2 progress check includes MCQ and FRQ parts that draw directly from all 4 unit topics: Binary Numbers, Data Compression, Extracting Information from Data, and Using Programs with Data. MCQ questions test your ability to interpret binary values and compression trade-offs. FRQ prompts typically ask you to analyze a dataset or explain how a program processes data to find patterns. Practice with questions matched to these topics at AP CSP Unit 2.

How do I practice AP CSP Unit 2 FRQs?

AP CSP Unit 2 FRQs most often come from Extracting Information from Data (2.3) and Using Programs with Data (2.4), asking you to interpret a dataset, identify a trend, or explain what a program does with data. To practice, work through prompts that give you a table or chart and ask you to draw a conclusion or describe a computational solution. Focus on writing clear, specific explanations, not vague ones. Find practice FRQ prompts for this unit at AP CSP Unit 2.

Where can I find AP CSP Unit 2 practice questions?

For AP CSP Unit 2 practice questions, including multiple-choice and practice test sets, the best starting point is AP CSP Unit 2. You'll find MCQ questions covering Binary Numbers, Data Compression, Extracting Information from Data, and Using Programs with Data, organized by topic so you can target the areas where you need the most work.

How should I study AP CSP Unit 2?

Start AP CSP Unit 2 by building a solid grip on Binary Numbers (2.1), since everything else in the unit depends on understanding how bits represent data. Then work through Data Compression (2.2) by comparing lossless vs. lossy examples you already know, like ZIP files vs. JPEGs. For topics 2.3 and 2.4, practice reading data tables and tracing through simple programs that process datasets. Write out your reasoning in full sentences, because FRQ graders reward clear explanations over correct answers with no justification. Organize your study plan by topic at AP CSP Unit 2.