
Information Theory

Data Compression Methods


Why This Matters

Data compression sits at the heart of Information Theory—it's the practical application of Shannon's foundational insights about entropy and redundancy. When you study compression methods, you're really studying how to exploit statistical patterns in data to represent information more efficiently. Every algorithm here connects back to core concepts you'll be tested on: entropy limits, redundancy reduction, source coding theorems, and the fundamental trade-off between compression ratio and computational complexity.

Don't just memorize which algorithm does what—understand why each approach works and when it's optimal. The exam will ask you to compare methods, identify which technique suits a given data type, and explain the theoretical principles that make compression possible. Know the mechanism behind each method, and you'll be ready for any FRQ that asks you to analyze or design a compression strategy.


Entropy-Based Methods

These algorithms directly exploit Shannon's entropy—they assign shorter codes to more probable symbols, approaching the theoretical compression limit. The key insight: if you know the probability distribution of your source, you can encode common symbols with fewer bits.
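
As a quick illustration of the limit involved, here is a minimal sketch (with a made-up distribution) of computing the entropy H(X), the lower bound on average bits per symbol for any lossless code:

```python
import math

# Hypothetical source distribution (assumed for illustration only).
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# Shannon entropy H(X) = -sum p(x) * log2 p(x): no lossless code can average
# fewer bits per symbol than this for the given source.
entropy = -sum(p * math.log2(p) for p in probs.values())
print(f"H(X) = {entropy:.3f} bits/symbol")   # 1.750 for this distribution
```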

Huffman Coding

  • Builds a binary tree from symbol frequencies—more frequent symbols get shorter codes, directly minimizing expected code length
  • Optimal prefix-free code for symbol-by-symbol encoding, meaning no codeword is a prefix of another (guarantees unique decodability)
  • Widely implemented in JPEG, PNG, and ZIP formats; the go-to example when asked about practical entropy coding
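
A minimal sketch of the greedy tree-building idea, using Python's heapq; the input string and its frequencies are purely illustrative:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code for the symbols in `text` via greedy tree merging."""
    freq = Counter(text)
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        # Prepend 0 to codes in the left subtree, 1 to codes in the right.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_codes("AAAABBBCCD")
print(codes)  # frequent symbols (A) get shorter codewords than rare ones (D)
```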

Arithmetic Coding

  • Encodes entire messages as a single number in the interval [0, 1), representing cumulative probabilities
  • Achieves fractional bit lengths—can get closer to the entropy limit than Huffman when symbol probabilities aren't powers of 1/2
  • Supports adaptive coding where the model updates dynamically, making it powerful for sources with changing statistics
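
A toy floating-point sketch of the interval-narrowing idea (real arithmetic coders use integer arithmetic with renormalization to avoid precision limits); the probability model here is assumed and fixed:

```python
def arithmetic_encode(message, probs):
    """Toy arithmetic encoder: narrow [0, 1) by each symbol's probability share."""
    # Cumulative ranges: symbol -> (low, high) fraction of the current interval.
    ranges, running = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (running, running + p)
        running += p
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        sym_low, sym_high = ranges[sym]
        high = low + width * sym_high     # shrink the interval to the
        low = low + width * sym_low       # symbol's sub-range
    return (low + high) / 2               # any number in [low, high) encodes the message

model = {"A": 0.7, "B": 0.2, "C": 0.1}    # assumed, fixed probability model
print(arithmetic_encode("AABAC", model))  # five symbols -> one number in [0, 1)
```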

Compare: Huffman vs. Arithmetic Coding—both are entropy coders targeting the Shannon limit, but Huffman assigns whole-bit codewords while arithmetic coding handles fractional bits. For FRQs on compression efficiency, arithmetic coding is your example of getting closer to H(X) bits per symbol.

Prediction by Partial Matching (PPM)

  • Uses context modeling to predict the next symbol based on preceding symbols, building a statistical model of the source
  • Achieves excellent compression ratios on text by capturing higher-order dependencies (the more context, the better the prediction)
  • Memory-intensive, but demonstrates how sophisticated context modeling lowers the conditional entropy you code against, and therefore the bits needed per symbol (see the toy sketch after this list)
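
The toy order-1 context model below captures the spirit of PPM: count what follows each context, then predict from those counts. Real PPM blends several context orders and falls back via escape symbols; the sample text is illustrative:

```python
from collections import defaultdict, Counter

def build_order1_model(text):
    """Count how often each symbol follows each single-character context."""
    model = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        model[prev][nxt] += 1
    return model

def predict(model, context):
    """Return a conditional distribution over the next symbol, given the context."""
    counts = model.get(context)
    if not counts:
        return None  # real PPM would "escape" to a shorter context here
    total = sum(counts.values())
    return {sym: c / total for sym, c in counts.items()}

model = build_order1_model("the theory of the thing")
print(predict(model, "t"))  # after 't', 'h' is overwhelmingly probable
```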

Dictionary-Based Methods

Instead of modeling symbol probabilities, these algorithms find and replace repeated sequences with shorter references. The mechanism: build a dictionary of patterns as you scan the data, then substitute future occurrences with dictionary pointers.

Lempel-Ziv-Welch (LZW) Compression

  • Builds the dictionary dynamically during encoding—no need to transmit the dictionary separately
  • Replaces repeated sequences with fixed-width codes, achieving compression without knowing source statistics in advance
  • Standard in GIF and TIFF formats; a classic example of universal compression that adapts to any input
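
A minimal LZW encoder sketch showing the dictionary growing as the input is scanned; it assumes single-byte symbols and skips the code-width management real implementations need:

```python
def lzw_encode(data: str):
    """Toy LZW encoder: grow the dictionary while scanning the input."""
    # Start with single-character entries (here: all byte values 0-255).
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current, output = "", []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                   # keep extending the match
        else:
            output.append(dictionary[current])    # emit code for the longest match
            dictionary[candidate] = next_code     # add the new sequence
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_encode("ABABABABA"))  # repeated "AB" patterns collapse into few codes
```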

Dictionary-Based Compression (General)

  • Static dictionaries use pre-built lookup tables; dynamic dictionaries grow during compression (trade-off: adaptability vs. overhead)
  • Effective for repetitive data where patterns recur frequently, like markup languages or structured text
  • Foundation of ZIP and gzip—understanding LZ77/LZ78 variants is essential for explaining modern compression pipelines

Compare: Entropy-based vs. Dictionary-based methods—Huffman/arithmetic exploit symbol frequency, while LZW exploits sequence repetition. If an FRQ gives you data with long repeated substrings, dictionary methods win; for data with skewed symbol distributions, entropy coding is your answer.


Transform-Based Methods

These techniques don't compress directly—they transform data into a representation where compression becomes more effective. The principle: convert data to a domain (frequency, wavelet) where energy concentrates in fewer coefficients.

Discrete Cosine Transform (DCT)

  • Converts spatial data to frequency components—most image energy concentrates in low-frequency coefficients
  • Enables lossy compression by quantizing or discarding high-frequency components that contribute less to perceived quality
  • Core of JPEG and video codecs (H.264, MPEG); the standard example when discussing transform coding
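
A rough 1-D sketch of the energy-compaction idea (JPEG actually applies a 2-D DCT to 8×8 pixel blocks and then quantizes); the signal values are made up:

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II from the definition:
    X[k] = a_k * sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    N = len(x)
    n = np.arange(N)
    basis = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    X = basis @ x
    X[0] *= np.sqrt(1.0 / N)     # DC term scaling
    X[1:] *= np.sqrt(2.0 / N)    # AC term scaling
    return X

signal = np.linspace(10.0, 24.0, 8)   # a smooth, slowly varying block
coeffs = dct_ii(signal)
print(np.round(coeffs, 2))
# Nearly all the energy sits in the first couple of coefficients; quantizing or
# dropping the small high-frequency ones is where the lossy saving comes from.
```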

Wavelet Compression

  • Represents data at multiple resolutions using wavelet transforms, capturing both frequency and spatial information
  • Supports lossy and lossless modes—JPEG 2000 uses wavelets for superior quality at low bit rates
  • Better edge preservation than DCT because wavelet basis functions are localized in both space/time and frequency
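
A one-level Haar step, the simplest wavelet transform, sketches why this works: detail coefficients stay near zero except where the signal changes sharply, so they compress well. The sample values are illustrative:

```python
def haar_step(x):
    """One level of the Haar wavelet transform: pairwise averages give a coarse
    (half-resolution) approximation, pairwise differences give localized detail."""
    averages = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
    details = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
    return averages, details

signal = [10, 10, 11, 48, 50, 52, 51, 51]   # flat region, an edge, flat region
approx, detail = haar_step(signal)
print(approx)   # [10.0, 29.5, 51.0, 51.0] -- coarse shape of the signal
print(detail)   # [0.0, -18.5, -1.0, 0.0] -- near zero except at the edge
```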

Compare: DCT vs. Wavelet transforms—both convert to frequency domain, but DCT uses fixed block sizes while wavelets provide multi-resolution analysis. Wavelets handle edges better (fewer blocking artifacts), which is why JPEG 2000 outperforms classic JPEG at high compression ratios.

Burrows-Wheeler Transform (BWT)

  • Rearranges input into runs of similar characters through a reversible block-sorting algorithm
  • Not compression itself—it's a preprocessing step that makes subsequent compression (Huffman, RLE) far more effective
  • Used in bzip2; demonstrates how transforms can expose redundancy that direct methods would miss
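
A naive BWT sketch that sorts every rotation explicitly (practical implementations use suffix-array techniques instead); the sentinel character is a convention of this toy version:

```python
def bwt(s, sentinel="\x00"):
    """Naive Burrows-Wheeler Transform: sort all rotations, keep the last column."""
    s = s + sentinel   # marks the original end so the transform is reversible
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

print(repr(bwt("banana")))   # 'annb\x00aa' -- like characters cluster into runs
```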

Redundancy-Exploiting Methods

These algorithms target specific types of redundancy in data—consecutive repetitions or predictable differences. They work best when data has obvious local patterns.

Run-Length Encoding (RLE)

  • Replaces consecutive identical symbols with a single symbol plus a count (e.g., "AAAA" → "A4")
  • Highly effective for sparse data—fax machines, simple graphics, and binary images with long runs
  • Simple but limited—performs poorly on data with high variability; often used after transforms like BWT
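
A minimal RLE sketch using itertools.groupby; the input string is illustrative:

```python
from itertools import groupby

def rle_encode(s):
    """Run-length encode: each run of identical symbols becomes (symbol, count)."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

print(rle_encode("AAAABBBCCDAA"))
# [('A', 4), ('B', 3), ('C', 2), ('D', 1), ('A', 2)]
```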

Delta Encoding

  • Stores differences between consecutive values rather than absolute values, exploiting temporal correlation
  • Ideal for time-series and video—successive frames or samples are often similar, so deltas are small
  • Combines well with other methods—small delta values compress efficiently with entropy or RLE coding
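
A minimal delta encode/decode sketch; the sample values stand in for a slowly varying time series:

```python
def delta_encode(values):
    """Store the first value, then successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

samples = [1000, 1002, 1001, 1005, 1004, 1004]   # slowly varying measurements
print(delta_encode(samples))   # [1000, 2, -1, 4, -1, 0] -- small, compressible values
assert delta_decode(delta_encode(samples)) == samples
```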

Compare: RLE vs. Delta Encoding—RLE exploits spatial repetition (identical consecutive values), while delta encoding exploits temporal correlation (small changes between values). For video compression questions, delta encoding between frames is your key example; for binary image compression, think RLE.


Quick Reference Table

Concept | Best Examples
Entropy coding (Shannon limit) | Huffman, Arithmetic coding, PPM
Dictionary-based (sequence repetition) | LZW, ZIP/gzip variants
Transform coding (frequency domain) | DCT, Wavelet compression
Preprocessing transforms | Burrows-Wheeler Transform
Local redundancy exploitation | RLE, Delta encoding
Lossy compression | DCT (JPEG), Wavelet (JPEG 2000)
Lossless compression | Huffman, LZW, Arithmetic, BWT + RLE
Adaptive/context modeling | Arithmetic coding, PPM

Self-Check Questions

  1. Which two methods both approach the Shannon entropy limit but differ in how they assign code lengths? Explain why one can outperform the other for certain probability distributions.

  2. Compare and contrast dictionary-based compression (LZW) with entropy-based compression (Huffman). Under what data conditions would you choose one over the other?

  3. If you're compressing video with high frame-to-frame similarity, which combination of methods would you use, and why does each component help?

  4. The Burrows-Wheeler Transform doesn't compress data itself. Explain the principle behind why it improves subsequent compression, and name an algorithm that typically follows it.

  5. FRQ-style: You're designing a compression system for medical images where some quality loss is acceptable but edge detail must be preserved. Which transform method would you recommend over DCT, and what theoretical property makes it superior for this application?