Question 1

What's the difference between lossless and lossy compression?

Accepted Answer

Lossless vs. lossy is about whether you can perfectly reconstruct the original data after compression. Lossless compression (Huffman, Lempel–Ziv–Welch, run-length) reduces bits by removing redundancy but guarantees complete reconstruction—good when exact data matters (text, code, some medical images). Lossy compression (JPEG, MP3, perceptual coding) reduces bits much more by throwing away information that’s less noticeable to humans; you get an approximation of the original and may see artifacts at high compression. Key CED points: lossless = guaranteed reconstruction (EK DAT-1.D.4); lossy = approximation, smaller files, more artifacts (EK DAT-1.D.5–D.6). Choose lossless when quality/accuracy is critical; choose lossy when minimizing size or transmission time matters (EK DAT-1.D.7–D.8). For more AP-aligned review, see the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and practice problems (https://library.fiveable.me/practice/ap-computer-science-principles).

Question 2

How do I know when to use lossless vs lossy compression for the AP exam?

Accepted Answer

Use lossless when you must be able to reconstruct the original data exactly (text, source code, medical images, legal records). That aligns with EK DAT-1.D.4 and EK DAT-1.D.7—quality/reconstruction is maximally important. Use lossy when file size or transmission time matters more than a perfect copy (photos for web, streaming audio/video). EK DAT-1.D.5–D.6 explain lossy gives much greater size reduction but only an approximation and can introduce artifacts (e.g., JPEG, MP3). On the exam, pick the option that matches the tradeoff described: if the question emphasizes preserving exact information, choose lossless; if it emphasizes smallest size or faster transmission, choose lossy (EK DAT-1.D.8). Remember common examples (PNG, Huffman/LZW/RLE = lossless; JPEG, MP3, perceptual coding = lossy). Review Topic 2.2 on Fiveable for quick reminders (study guide: https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and practice questions (https://library.fiveable.me/practice/ap-computer-science-principles).

Question 3

I'm confused about how compression can reduce file size but still keep all the information - how does that work?

Accepted Answer

Compression works because files usually contain redundancy—repeated or predictable patterns—so you can represent those patterns with fewer bits without losing the actual information. Lossless algorithms (like run-length encoding, Huffman, or Lempel–Ziv–Welch) replace repeated or probable patterns with shorter codes and store a dictionary or codebook so the exact original bits can be reconstructed later (CED EK DAT-1.D.4, EK DAT-1.D.3). Lossy algorithms (like JPEG or MP3) go further by removing information that people rarely notice (perceptual coding), so you get much smaller files but only an approximation of the original (CED EK DAT-1.D.5–D.6). Which to use depends on the goal: if perfect reconstruction matters, pick lossless; if minimizing size or bitrate matters and small quality loss is acceptable, pick lossy (CED EK DAT-1.D.7–D.8). For AP review, check the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and practice problems (https://library.fiveable.me/practice/ap-computer-science-principles).
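As a rough illustration of the lossless case, the sketch below uses Python's standard zlib module (a DEFLATE implementation, not one of the specific algorithms named above) on a made-up, highly repetitive input. The sample text is an assumption chosen to have lots of redundancy; it is not data from the course.

```python
import zlib

# Hypothetical sample: the same phrase repeated, so redundancy is very high.
original = b"the quick brown fox " * 200

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before compression")
print(len(compressed), "bytes after compression")
# Lossless means the decompressed bytes match the input exactly.
print("exact reconstruction:", restored == original)
```

The compressed form is far smaller, yet decompressing it returns every original byte, which is the guarantee EK DAT-1.D.4 describes.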

Question 4

What's the syntax for calculating compression ratio in a programming problem?

Accepted Answer

Compression ratio is just a math formula you implement in code. Two common forms:
- Ratio = original_size / compressed_size (e.g., 10 MB / 2 MB = 5 → “5:1”)
- Percent reduction = (1 - compressed_size / original_size) * 100 (e.g., (1 - 2/10) * 100 = 80% smaller)

Make sure both sizes use the same units (bits or bytes). If the sizes are stored as integers, convert one operand to a floating-point type in languages where dividing two integers truncates (Java, for example); Python 3's / operator already performs true division.

Examples:
- Python:
  ratio = original_size / compressed_size
  percent = (1 - compressed_size / original_size) * 100
- Java:
  double ratio = (double) originalSize / compressedSize;
  double percent = (1.0 - (double) compressedSize / originalSize) * 100.0;

This aligns with LO DAT-1.D (compare algorithms using size reduction). For AP-style problems, report the ratio (like 5:1) or percent reduction and state units. For more review, see the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and practice problems (https://library.fiveable.me/practice/ap-computer-science-principles).
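If it helps to see the two formulas as reusable functions, here is a minimal Python sketch. The function names and the input check are illustrative choices, not anything required by the AP course.

```python
def compression_ratio(original_size, compressed_size):
    """Return original/compressed, e.g. 5.0 for a 5:1 ratio."""
    if original_size <= 0 or compressed_size <= 0:
        raise ValueError("sizes must be positive and in the same units")
    return original_size / compressed_size


def percent_reduction(original_size, compressed_size):
    """Return how much smaller the compressed data is, as a percentage."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return (1 - compressed_size / original_size) * 100


ratio = compression_ratio(10, 2)      # 10 MB compressed to 2 MB
percent = percent_reduction(10, 2)

print(f"{ratio:g}:1")                 # 5:1
print(f"{percent:.0f}% smaller")      # 80% smaller
```

A negative percent reduction would mean the "compressed" output is larger than the input, which can happen with low-redundancy data.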

Question 5

Why would anyone choose lossy compression if it loses data?

Accepted Answer

Lossy compression is chosen because the benefits (much smaller files and faster transmission) often outweigh the cost of imperfect reconstruction. Perceptual coding (used in JPEG and MP3) removes detail people are unlikely to notice, such as tiny color shifts or sounds outside the range of human hearing, so files can be several times smaller than their lossless equivalents. That matters for streaming, web images, mobile storage, and real-time video where bandwidth or storage is limited. AP CED alignment: EK DAT-1.D.5–D.6 explain that lossy compression gives big size reductions but only approximates the original; EK DAT-1.D.7–D.8 say to use lossless when exact reconstruction matters and lossy when minimizing size or transmission time matters. The trade-off is artifacts (visible blockiness or audio distortion), which become more noticeable as the bitrate drops; choose based on the quality that is acceptable for the task. For more on comparing algorithms and examples, see the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and practice questions (https://library.fiveable.me/practice/ap-computer-science-principles).
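To make "approximation" concrete, here is a toy Python sketch of a lossy step. It is plain quantization, not how JPEG or MP3 actually work, and the sample values and function names are made up for illustration.

```python
def quantize(samples, step):
    """Lossy step: keep only which step-sized bucket each sample falls in."""
    return [s // step for s in samples]


def reconstruct(codes, step):
    """Rebuild an approximation using the middle of each bucket."""
    return [c * step + step // 2 for c in codes]


samples = [12, 13, 14, 200, 201, 255]   # hypothetical 8-bit values (0-255)
codes = quantize(samples, 16)           # 16 buckets, so 4 bits per value instead of 8
approx = reconstruct(codes, 16)

print(codes)    # [0, 0, 0, 12, 12, 15]
print(approx)   # [8, 8, 8, 200, 200, 248]
```

Each value now needs half as many bits, but the original numbers cannot be recovered exactly: 12, 13, and 14 all come back as 8. That discarded detail is the price of the smaller size.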

Question 6

Can someone explain what redundancy means in data compression with a simple example?

Accepted Answer

Redundancy means “extra” or repeated bits in data that don’t add new information and can be removed or encoded more efficiently. Simple example: the text “AAAAABBBBCC” has lots of repeats. Run-length encoding (a lossless method) can compress it to “5A4B2C”—fewer bits but you can fully reconstruct the original. High redundancy → bigger compression ratio (more size reduction). Low redundancy (like random data) gives little or no compression because there’s less repeatable pattern; Shannon entropy measures that. For AP CSP, remember EK DAT-1.D.3: how much you can compress depends on redundancy and the algorithm used, and EK DAT-1.D.4/5: lossless vs. lossy tradeoffs. If you want more examples and practice problems for Topic 2.2, check the Fiveable study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and the unit page (https://library.fiveable.me/ap-computer-science-principles/unit-2).
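The run-length example above can be sketched in a few lines of Python. This is a simplified version that assumes the input contains no digit characters (otherwise the counts and the symbols would be ambiguous); the function names are illustrative.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated symbol with <count><symbol>."""
    return "".join(f"{len(list(group))}{symbol}" for symbol, group in groupby(text))


def rle_decode(encoded):
    """Expand <count><symbol> pairs back into the original text."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # counts may have more than one digit
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


encoded = rle_encode("AAAAABBBBCC")
print(encoded)                 # 5A4B2C
print(rle_decode(encoded))     # AAAAABBBBCC
```

Trying it on text with no runs, such as "ABC", gives "1A1B1C", which is longer than the input. That is the low-redundancy case described above.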

Question 7

How do I write code to compare two compression algorithms?

Accepted Answer

Pick what you want to compare (e.g., Huffman vs. LZW or RLE vs. LZW) and follow a repeatable experiment plan.
1. Datasets: choose varied inputs (highly redundant text, low-redundancy text, simple images, photos, audio) so you test EK DAT-1.D.3.
2. Implement or use libraries for both algorithms (or pseudocode if required by the AP task). Make sure lossless algorithms can reconstruct exactly (EK DAT-1.D.4).
3. Measurements to record per file: original size (bits), compressed size (bits), compression ratio = original/compressed, runtime (ms) for compression/decompression, and for lossy methods measure fidelity (PSNR or visual/aural artifacts) per EK DAT-1.D.5–D.6.
4. Repeat runs and average times; test reconstruction equality for lossless.
5. Summarize tradeoffs: size vs. fidelity vs. time, and pick the best algorithm per context (quality-critical → lossless; size-critical → lossy) per EK DAT-1.D.7–D.8.
6. For AP-style reporting, include algorithms used, datasets, metrics, sample code snippets, and test results (graphs/tables) to justify your conclusion.

For step-by-step study and examples, see the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and practice problems (https://library.fiveable.me/practice/ap-computer-science-principles). Fiveable’s study guides and practice questions are useful for exam-style framing.
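One possible shape for such an experiment is sketched below. It swaps in zlib and bz2 from Python's standard library in place of hand-written Huffman or LZW, simply because they are available without extra installs. The datasets, the measure helper, and the result field names are all assumptions made for the example.

```python
import bz2
import os
import time
import zlib


def measure(name, compress, decompress, data, runs=5):
    """Compress `data` several times and report size, ratio, and average time."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        packed = compress(data)
        times.append(time.perf_counter() - start)
    return {
        "algorithm": name,
        "original_bytes": len(data),
        "compressed_bytes": len(packed),
        "ratio": len(data) / len(packed),
        "avg_ms": 1000 * sum(times) / runs,
        # Lossless check: decompressing must give back the exact input.
        "round_trip_ok": decompress(packed) == data,
    }


# Hypothetical inputs: one highly redundant, one with almost no redundancy.
datasets = {
    "redundant": b"AAAAABBBBCC" * 5000,
    "random": os.urandom(55_000),
}
algorithms = [
    ("zlib", zlib.compress, zlib.decompress),
    ("bz2", bz2.compress, bz2.decompress),
]

results = []
for label, data in datasets.items():
    for name, compress, decompress in algorithms:
        row = measure(name, compress, decompress, data)
        row["dataset"] = label
        results.append(row)
        print(f"{label:9} {name:4} ratio={row['ratio']:.2f} "
              f"avg={row['avg_ms']:.2f} ms round_trip_ok={row['round_trip_ok']}")
```

Timings vary from machine to machine, so treat them as relative comparisons only. For a lossy method you would replace the round-trip check with a fidelity measure, as step 3 suggests.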

Question 8

What are some real examples of lossless compression algorithms I should know for the test?

Accepted Answer

Good question. For the AP exam you should know several common lossless algorithms and where they’re used:
- Run-Length Encoding (RLE): simple, good for repeated-symbol data (like simple bitmaps).
- Huffman coding: variable-length prefix codes based on symbol frequency (important for entropy/redundancy ideas).
- Arithmetic coding: like Huffman but can get closer to Shannon entropy limits.
- Lempel–Ziv family (LZ77, LZ78) and LZW: dictionary-based, used in GIF, compress streaming data well.
- DEFLATE: combines LZ77 + Huffman (used in ZIP and PNG), a practical example to cite.
- Burrows–Wheeler Transform (BWT): used inside bzip2 as a block transform before other lossless steps.

On the exam, be ready to compare lossless vs. lossy (EK DAT-1.D.4–D.6) and pick lossless when exact reconstruction matters. For a quick review, check the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and practice problems (https://library.fiveable.me/practice/ap-computer-science-principles).
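To show what "variable-length prefix codes based on symbol frequency" looks like in practice, here is a compact Huffman coding sketch in Python. It is a teaching version under simple assumptions (the whole input is available up front, and only the code table is built, not a packed file format); the function name is illustrative.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return {symbol: bitstring}; frequent symbols get shorter codes."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}   # a single symbol still needs one bit
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "AAAAABBBBCC"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)

print(codes)
print(len(encoded), "bits, versus", 8 * len(text), "bits at 8 bits per character")
```

For this input the most frequent symbol, A, gets a 1-bit code and B and C get 2-bit codes, so the message takes 17 bits instead of 88. A real file would also need to store the code table, which is part of the overhead discussed elsewhere on this page.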

Question 9

I don't understand how fewer bits can still mean the same amount of information - isn't that contradictory?

Accepted Answer

It’s not contradictory—“fewer bits” can still carry the same information because compression removes redundancy or changes representation, not the actual meaning. Lossless algorithms (like Huffman, LZW, or run-length) find patterns and replace repeated or predictable parts with shorter codes so you can reconstruct the original exactly (EK DAT-1.D.4, EK DAT-1.D.2). Lossy compression (JPEG, MP3) removes information that’s perceptually unimportant, so files shrink more but you only get an approximation back (EK DAT-1.D.5–D.6). Think of texting: “see you later” → “cu l8r”—fewer characters, same message because common patterns and sounds are encoded more efficiently. The amount you can shrink data depends on redundancy and the algorithm (EK DAT-1.D.3). For AP exam focus, compare lossless vs. lossy tradeoffs (EK DAT-1.D.7–D.8) and know examples/terms like entropy, compression ratio, and bitrate. Review the Topic 2.2 study guide on Fiveable (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and try practice problems (https://library.fiveable.me/practice/ap-computer-science-principles).
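One way to put a number on "how much information is really there" is Shannon entropy, mentioned above as a term to know. The sketch below estimates it from character frequencies alone. That is a simplification (it ignores patterns across characters), and the function name is an assumption for the example.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(text):
    """Estimate the average bits of information per character from frequencies."""
    counts = Counter(text)
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(entropy_bits_per_symbol("AAAAAAAA"))      # 0.0: completely predictable
print(entropy_bits_per_symbol("ABCDABCD"))      # 2.0: four equally likely symbols
print(round(entropy_bits_per_symbol("AAAAABBBBCC"), 2))
```

Storing "AAAAABBBBCC" at 8 bits per character spends far more bits than the roughly 1.5 bits per character of information this estimate suggests. The gap is the redundancy a lossless algorithm can remove without changing the message.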

Question 10

When would you choose lossy over lossless compression in an AP exam question?

Accepted Answer

Choose lossy when minimizing file size or transmission time is the main goal and an approximate reconstruction is acceptable—typically for photos, audio, or video where perceptual coding removes details people won’t notice (think JPEG, MP3). CED cues: EK DAT-1.D.6 and EK DAT-1.D.8 say lossy usually yields much greater size reduction; EK DAT-1.D.5 warns you only get an approximation and may introduce artifacts. Choose lossless when you must perfectly reconstruct the original data (text, executable files, some scientific images) per EK DAT-1.D.4 and EK DAT-1.D.7. On the AP exam (LO DAT-1.D), justify your choice by comparing compression ratio, bitrate, and acceptable quality trade-offs (artifacts vs. storage/transmission gains). Use keywords like perceptual coding, redundancy, and artifacts in your explanation. For a quick refresher, check the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and practice questions (https://library.fiveable.me/practice/ap-computer-science-principles).

Question 11

What's the difference between compression ratio and compression rate?

Accepted Answer

Compression ratio and compression rate are different measures:
- Compression ratio describes how much smaller data becomes. It’s usually given as original_size : compressed_size (or sometimes as compressed_size/original_size). Example: a 10 MB file compressed to 2 MB has a ratio of 10:2 (often reported as 5:1); the file is 5× smaller. This links directly to redundancy and algorithm choice (lossless vs. lossy) in the CED (EK DAT-1.D.1–D.6).
- Compression rate refers to speed or the data rate after/while compressing. It can mean (a) how fast an algorithm compresses/decompresses (e.g., MB/s) or (b) the resulting bitrate for streaming media (e.g., kbps). In AP terms, think of “bitrate” when discussing lossy formats like MP3 or JPEG (keywords in the CED).

On the exam, compare algorithms using both: ratio for storage savings and rate for transmission/time trade-offs (LO DAT-1.D). For a quick review, see the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and Unit 2 overview (https://library.fiveable.me/ap-computer-science-principles/unit-2).
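The two meanings of "rate" are both simple arithmetic. The Python sketch below uses made-up numbers (a 128 kbps stream lasting 3 minutes, and 50 MB processed in 2 seconds) and illustrative function names; it assumes a constant bitrate and 1 kbps = 1000 bits per second.

```python
def streamed_size_bytes(bitrate_kbps, seconds):
    """Bytes needed for media at a constant bitrate (1 kbps = 1000 bits/s)."""
    return bitrate_kbps * 1000 * seconds // 8


def throughput_mb_per_s(bytes_processed, elapsed_seconds):
    """How fast a compressor runs, in megabytes (10**6 bytes) per second."""
    return bytes_processed / 1_000_000 / elapsed_seconds


print(streamed_size_bytes(128, 180))          # 2880000 bytes, about 2.9 MB
print(throughput_mb_per_s(50_000_000, 2.0))   # 25.0 MB/s
```

Neither number says how much smaller the data became; that is what the compression ratio is for.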

Question 12

How do I trace through a simple compression algorithm step by step?

Accepted Answer

Pick a simple lossless algorithm like run-length encoding (RLE) and walk every step with the data in front of you. Example: input = AAABBBCC
1. Start at first char: A. Count repeats: 3 As → emit "3A".
2. Move to next new char: B. Count repeats: 3 Bs → emit "3B".
3. Next: C. Count repeats: 2 Cs → emit "2C".
4. End. Encoded output = 3A3B2C.

When you trace, write the current symbol, running count, and output after each change. That shows how redundancy (long runs) makes RLE effective (fewer symbols → smaller size). Compare by computing compression ratio or bits before/after. For AP, practice tracing both lossless (RLE, Huffman, LZW) and lossy examples, and explain tradeoffs (EK DAT-1.D.3–DAT-1.D.6). For more examples and practice problems, check the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and thousands of practice questions (https://library.fiveable.me/practice/ap-computer-science-principles).
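If you want to check a hand trace, a small Python helper can print the same columns (current symbol, run count, output so far). This is an illustrative sketch, not AP pseudocode, and the function name is made up.

```python
def rle_trace(text):
    """Return (encoded, steps); each step is (symbol, count, output_so_far)."""
    steps = []
    output = ""
    i = 0
    while i < len(text):
        symbol = text[i]
        count = 1
        # Extend the run while the next character matches the current symbol.
        while i + count < len(text) and text[i + count] == symbol:
            count += 1
        output += f"{count}{symbol}"
        steps.append((symbol, count, output))
        i += count               # jump to the first character of the next run
    return output, steps


encoded, steps = rle_trace("AAABBBCC")
for symbol, count, so_far in steps:
    print(f"symbol={symbol} count={count} output={so_far}")
# symbol=A count=3 output=3A
# symbol=B count=3 output=3A3B
# symbol=C count=2 output=3A3B2C
```

Here 8 input characters become 6 output characters, a ratio of about 1.33:1 if each character takes the same number of bits.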

Question 13

Why does my compression algorithm sometimes make files bigger instead of smaller?

Accepted Answer

Short answer: your compressor can make files bigger when the data has little redundancy (high entropy) or when the compression method adds more overhead than it removes. Why: lossless algorithms (Huffman, LZW, run-length encoding) exploit redundancy—repeated patterns or predictable symbols—to cut bits (EK DAT-1.D.3, DAT-1.D.4). If the input is already random or already compressed (JPEG, MP3, PNG), there’s no redundancy to remove, so added headers, dictionaries or codebooks make the file larger. Also small files can grow because fixed metadata and block/dictionary startup costs dominate. Lossy algorithms can shrink more, but they change data (EK DAT-1.D.5–D.7) and aren’t an option when exact reconstruction is required. For AP review, remember to compare algorithms by redundancy, overhead, and whether lossless vs lossy is acceptable (CED Topic 2.2). More practice: see the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and extra problems (https://library.fiveable.me/practice/ap-computer-science-principles).
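You can see both effects with Python's standard zlib module. The inputs below are made up for the demonstration: seeded pseudo-random bytes stand in for high-entropy data, and a two-byte message stands in for a tiny file.

```python
import random
import zlib

# Seeded pseudo-random bytes: almost no redundancy for the compressor to exploit.
rng = random.Random(0)
random_data = bytes(rng.getrandbits(8) for _ in range(10_000))

once = zlib.compress(random_data)
twice = zlib.compress(once)      # compressing already-compressed output
tiny = zlib.compress(b"hi")      # header and checksum dominate a 2-byte input

print(len(random_data), "->", len(once), "bytes (random input)")
print(len(once), "->", len(twice), "bytes (compressed again)")
print(2, "->", len(tiny), "bytes (tiny input)")
```

In each case the output should come out slightly larger than the input, because the format's fixed overhead is added while nothing is saved.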

Question 14

What are the trade-offs between lossless and lossy compression that I need to memorize?

Accepted Answer

Memorize these core trade-offs so you can pick the right algorithm on the exam (LO DAT-1.D):
- Guarantee vs. size: Lossless = perfect reconstruction (EK DAT-1.D.4) but smaller reductions; lossy = only an approximation (EK DAT-1.D.5) and usually much higher compression (EK DAT-1.D.6).
- Quality vs. bandwidth/storage: Lossless keeps full quality (choose it when reconstruction is crucial: text, code, legal/medical files; EK DAT-1.D.7). Lossy sacrifices some quality for much smaller files and faster transmission (good for photos, audio, video; EK DAT-1.D.8).
- Artifacts vs. fidelity: Lossy can introduce perceptual artifacts (JPEG, MP3) and may remove data you can’t get back; lossless (PNG, ZIP) removes only redundancy and keeps all of the information, so you can fully restore the original bits.
- Choose based on redundancy and algorithm type (dictionary-based, RLE, Huffman, perceptual coding) and required compression ratio/bitrate (EK DAT-1.D.3).

Review Fiveable’s Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and practice questions (https://library.fiveable.me/practice/ap-computer-science-principles) to drill examples.

Question 15

Can you give me examples of when quality vs file size matters for choosing compression types?

Accepted Answer

Choose lossless when you must reconstruct exact original data; choose lossy when smaller size or faster transmission matters and small errors are acceptable. Examples:
- Photos for professional printing or medical images → lossless (PNG, TIFF, or raw): quality is critical, no artifacts allowed (EK DAT-1.D.4, EK DAT-1.D.7).
- Web photos, social media, thumbnails → lossy (JPEG): big size reduction with perceptual coding; some artifacts are okay to speed page loads (EK DAT-1.D.5, EK DAT-1.D.8).
- Music masters or archival audio → lossless (FLAC) or uncompressed (WAV): exact audio needed.
- Streaming music/podcasts → lossy (MP3, AAC): much smaller bitrate, acceptable approximation for listeners.
- Text, code, spreadsheets → lossless (ZIP, LZW): any change breaks usability.

On the AP exam you may be asked to compare algorithms by trade-offs (compression ratio, redundancy, artifacts). For more review see the Topic 2.2 study guide (https://library.fiveable.me/ap-computer-science-principles/unit-2/data-compression/study-guide/21yLa92Ec2potY7nGQfu) and practice problems (https://library.fiveable.me/practice/ap-computer-science-principles).

Term	Definition
bit	Shorthand for binary digit; the smallest unit of data in computing, represented as either 0 or 1.
data compression algorithms	Methods or procedures used to reduce the number of bits needed to represent data while maintaining or approximating the original information.
lossless data compression	A compression algorithm that reduces the number of bits stored or transmitted while guaranteeing complete reconstruction of the original data.
lossy data compression	A compression algorithm that significantly reduces the number of bits stored or transmitted but only allows reconstruction of an approximation of the original data.
redundancy	Repetition or unnecessary duplication in data representation that can be reduced through compression.