
Computer Vision and Image Processing

Image Compression Techniques


Why This Matters

Image compression sits at the heart of nearly every computer vision application you'll encounter—from how your smartphone stores photos to how autonomous vehicles process visual data in real time. You're being tested on your understanding of the fundamental trade-off between file size and image quality, and more importantly, why different techniques make different choices along that spectrum. The concepts here connect directly to signal processing, information theory, and human visual perception.

Don't just memorize which format uses which algorithm. Instead, focus on understanding the underlying mechanisms: transform-based compression, entropy coding, spatial redundancy exploitation, and the lossy vs. lossless paradigm. When you can explain why JPEG discards high-frequency data or how wavelet transforms outperform DCT for certain images, you're thinking like a computer vision engineer—and that's exactly what exam questions will demand.


Transform-Based Compression

These techniques convert image data from the spatial domain into a frequency domain, where redundant information becomes easier to identify and remove. The core insight is that most image energy concentrates in low-frequency components, while high frequencies (fine details) can often be reduced without noticeable quality loss.

Discrete Cosine Transform (DCT)

  • Converts spatial pixel data into frequency coefficients—the foundation of JPEG compression and most video codecs
  • Concentrates image energy in low-frequency components, making high-frequency data (fine details) candidates for removal
  • Enables quantization where coefficients are rounded or zeroed out, achieving compression at the cost of some fidelity
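The mechanics can be sketched with a toy 8×8 block: an orthonormal DCT-II built by hand with NumPy (the `dct_matrix` helper is ours, not part of any JPEG library), followed by crude uniform quantization—real JPEG uses a perceptually tuned quantization table, not a single step size.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row gets the smaller normalization
    return m

# An 8x8 block with a smooth horizontal gradient (low-frequency content).
block = np.tile(np.linspace(0, 255, 8), (8, 1))

D = dct_matrix(8)
coeffs = D @ block @ D.T            # forward 2D DCT

# Crude uniform quantization: round coefficients to multiples of 16.
quantized = np.round(coeffs / 16) * 16

# Energy concentrates in the first row of coefficients (low vertical
# frequency); most high-frequency coefficients quantize away to zero.
reconstructed = D.T @ quantized @ D  # inverse 2D DCT
print(np.count_nonzero(quantized))   # only a handful of coefficients survive
```

Storing the few nonzero coefficients (plus their positions) instead of 64 pixel values is where the compression comes from; the reconstruction differs from the original only by small quantization error.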

Wavelet Transform

  • Decomposes images at multiple scales simultaneously—capturing both coarse structure and fine detail in a single representation
  • Preserves sharp edges better than DCT because it uses localized basis functions rather than global cosines
  • Supports progressive transmission—images can be reconstructed at increasingly higher resolutions as more data arrives
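A minimal illustration of the multi-scale idea—a hand-rolled one-level Haar decomposition using pairwise averages and differences, far simpler than the biorthogonal wavelets JPEG 2000 actually uses:

```python
import numpy as np

def haar2d_level(img):
    """One level of a simplified 2D Haar decomposition.

    Returns the half-resolution approximation (LL) band plus three
    detail bands. Assumes even image dimensions.
    """
    # Pairwise averages/differences along columns, then along rows.
    a = (img[:, 0::2] + img[:, 1::2]) / 2
    d = (img[:, 0::2] - img[:, 1::2]) / 2
    ll = (a[0::2, :] + a[1::2, :]) / 2
    lh = (a[0::2, :] - a[1::2, :]) / 2
    hl = (d[0::2, :] + d[1::2, :]) / 2
    hh = (d[0::2, :] - d[1::2, :]) / 2
    return ll, lh, hl, hh

# A flat image with one sharp vertical edge.
img = np.zeros((8, 8))
img[:, 3:] = 255.0

ll, lh, hl, hh = haar2d_level(img)
# The LL band is a half-resolution preview; the detail bands are zero
# everywhere except along the edge, so they compress extremely well.
print(ll.shape)  # (4, 4)
```

The localized basis is visible in the output: the edge shows up as a few nonzero detail coefficients right where it sits, instead of spreading across every coefficient the way it would under a global cosine basis.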

Compare: DCT vs. Wavelet Transform—both convert to frequency domain, but DCT operates on fixed blocks while wavelets analyze multiple scales. For images with sharp edges (like text or diagrams), wavelets in JPEG 2000 outperform DCT-based JPEG. If asked about compression artifacts, DCT's blocking artifacts vs. wavelets' ringing artifacts is a key distinction.


Entropy Coding Methods

Entropy coding exploits statistical redundancy in data—the fact that some values appear more frequently than others. These techniques assign shorter codes to common patterns and longer codes to rare ones, achieving compression without any information loss.

Huffman Coding

  • Assigns variable-length binary codes based on symbol frequency—common symbols get shorter codes, rare symbols get longer ones
  • Guarantees optimal prefix-free codes when symbol probabilities are known, meaning no code is a prefix of another
  • Serves as the final compression stage in JPEG, PNG, and ZIP—it's rarely used alone but appears everywhere as a building block
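A compact sketch of the tree-building idea using Python's `heapq` (this toy version tracks code strings directly instead of building an explicit tree, which real encoders do):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix-free code table from symbol frequencies."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least-frequent subtrees...
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}  # ...merge them,
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
# 'a' (most frequent) gets the shortest code; no code is a prefix of another.
encoded = "".join(codes[s] for s in "aaaabbc")
print(codes, len(encoded))
```

For the input `"aaaabbc"`, the seven symbols encode in 10 bits instead of the 14 a fixed 2-bit code would need—shorter codes for frequent symbols pay for the longer ones on rare symbols.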

Run-Length Encoding (RLE)

  • Replaces consecutive identical values with a single value-count pair—transforms "AAAAAABB" into "6A2B"
  • Highly effective for binary images and simple graphics where large uniform regions exist
  • Performs poorly on photographs or complex textures where adjacent pixels rarely repeat exactly
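A toy text-based version of RLE (real codecs pack counts into binary fields; the regex decoder here assumes this simple digits-then-symbol format):

```python
import re
from itertools import groupby

def rle_encode(s):
    """Replace each run of identical symbols with a count-symbol pair."""
    return "".join(f"{len(list(run))}{sym}" for sym, run in groupby(s))

def rle_decode(encoded):
    """Invert the toy encoding: expand each (count, symbol) pair."""
    return "".join(sym * int(n) for n, sym in re.findall(r"(\d+)(\D)", encoded))

print(rle_encode("AAAAAABB"))  # → "6A2B"
print(rle_encode("ABAB"))      # → "1A1B1A1B" -- RLE *expands* busy data
```

The second call shows the failure mode from the bullets above: when adjacent values rarely repeat, every run has length 1 and the encoding is larger than the input.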

Compare: Huffman Coding vs. RLE—both are lossless, but they exploit different redundancies. RLE targets spatial redundancy (repeated adjacent values), while Huffman targets statistical redundancy (frequent symbols). Many practical systems use both: RLE first to reduce repeated values, then Huffman to encode the result efficiently.


Complete Compression Standards

These are full file formats or codec families that combine multiple techniques—transforms, quantization, and entropy coding—into practical, standardized systems.

JPEG (Joint Photographic Experts Group)

  • Combines DCT, quantization, and Huffman coding into the most widely used lossy image format for photographs
  • Allows adjustable quality settings—higher compression means more aggressive quantization and smaller files at the cost of visible artifacts
  • Optimized for continuous-tone images like photos; performs poorly on sharp edges, text, or graphics with hard boundaries

PNG (Portable Network Graphics)

  • Lossless format using filtering plus Deflate compression—preserves every pixel exactly while still reducing file size
  • Supports alpha channel transparency—essential for web graphics, logos, and compositing applications
  • Outperforms JPEG for graphics with sharp edges and uniform colors, but produces larger files for photographs

JPEG 2000

  • Uses wavelet transform instead of DCT—achieving better quality at equivalent compression ratios, especially for high-resolution images
  • Supports both lossy and lossless modes in a single standard, plus region-of-interest coding
  • Enables progressive decoding—viewers see a low-resolution preview that sharpens as more data loads

Compare: JPEG vs. PNG—JPEG is lossy and excels at photographs; PNG is lossless and excels at graphics with transparency. The classic exam question: which format for a logo overlay on a photo? PNG for the logo (sharp edges, transparency), JPEG for the photo background (continuous tones, smaller file).


Advanced Compression Approaches

These techniques go beyond standard transforms to exploit more complex patterns in image data, often achieving impressive compression ratios at the cost of computational complexity.

Vector Quantization

  • Groups similar pixel blocks into clusters and represents each with a codebook index—like a "dictionary" of common image patterns
  • Trades encoding time for decoding speed—building the codebook is slow, but reconstruction is fast
  • Achieves good compression for textures and patterns where similar structures repeat throughout the image
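A sketch of codebook construction with plain k-means on synthetic 2×2 blocks—the data, cluster parameters, and two-entry codebook are invented for illustration, and production VQ uses larger codebooks built with LBG-style training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "image blocks": 200 flattened 2x2 patches from two textures,
# one dark and one bright -- stand-ins for patches of a real image.
dark = rng.normal(30.0, 5.0, size=(100, 4))
bright = rng.normal(220.0, 5.0, size=(100, 4))
blocks = np.vstack([dark, bright])

# Two-entry codebook, seeded with one patch from each texture, refined
# with a few rounds of plain k-means.
codebook = np.stack([blocks[0], blocks[100]])
for _ in range(10):
    # Assign every block to its nearest codeword...
    dists = np.linalg.norm(blocks[:, None, :] - codebook[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # ...then move each codeword to the centroid of its assigned blocks.
    codebook = np.array([blocks[labels == k].mean(axis=0) for k in range(2)])

# Compression: store one codebook index per block instead of four pixel
# values; the decoder just looks each index up in the codebook.
indices = np.linalg.norm(
    blocks[:, None, :] - codebook[None, :, :], axis=2
).argmin(axis=1)
reconstructed = codebook[indices]
print(np.abs(reconstructed - blocks).mean())  # small mean error
```

This also makes the asymmetry from the bullets concrete: encoding needs the iterative clustering loop, while decoding is a single table lookup.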

Fractal Compression

  • Exploits self-similarity within images—encoding how parts of an image resemble transformed versions of other parts
  • Enables resolution-independent storage—images can theoretically be reconstructed at any zoom level
  • Computationally expensive to encode but fast to decode, making it impractical for real-time capture but interesting for archival

Compare: Vector Quantization vs. Fractal Compression—both identify patterns within images, but VQ uses a fixed codebook of representative blocks while fractal methods find self-similar transformations. VQ is practical and widely used; fractal compression remains largely academic despite its elegant mathematics.


Video Compression Extensions

Video compression builds on image compression but adds temporal redundancy exploitation—the fact that consecutive frames are usually very similar.

MPEG (Moving Picture Experts Group)

  • Extends DCT-based compression with motion compensation—encoding only the differences between frames rather than each frame independently
  • Defines I-frames, P-frames, and B-frames—keyframes, predicted frames, and bidirectionally predicted frames for efficient temporal compression
  • Powers streaming video, broadcasting, and storage—MPEG-4/H.264/H.265 are the backbone of modern video delivery
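Temporal redundancy can be illustrated with two toy frames and a known motion vector—an invented scene, and a real encoder would search for the vector via block matching rather than being handed it:

```python
import numpy as np

# Two consecutive "frames": a bright 4x8 rectangle moves 2 pixels right,
# standing in for object motion between video frames.
frame1 = np.zeros((16, 16), dtype=np.int16)
frame1[4:8, 4:12] = 200
frame2 = np.zeros((16, 16), dtype=np.int16)
frame2[4:8, 6:14] = 200

# Naive inter-frame coding: store only the residual against the previous
# frame. It is zero almost everywhere, so it entropy-codes far better
# than the raw frame would.
residual = frame2 - frame1

# Motion compensation goes further: predict frame2 by shifting frame1 by
# the motion vector (dx = 2), then code the now-tiny residual.
predicted = np.roll(frame1, 2, axis=1)
mc_residual = frame2 - predicted

print(np.count_nonzero(frame2), np.count_nonzero(residual),
      np.count_nonzero(mc_residual))  # → 32 16 0
```

In MPEG terms, `frame1` plays the role of an I-frame and the motion-compensated residual is what a P-frame stores alongside its motion vectors.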

The Fundamental Trade-Off: Lossy vs. Lossless

Understanding when to use each approach is as important as understanding how they work. The choice depends entirely on your application's requirements for quality, file size, and reconstruction fidelity.

Lossless Compression

  • Guarantees perfect reconstruction—every original pixel value can be recovered exactly from the compressed file
  • Achieves modest compression ratios (typically 2:1 to 3:1 for photographs) because no information is discarded
  • Essential for medical imaging, archival, and editing workflows where artifacts would compound with each save

Lossy Compression

  • Achieves dramatic compression ratios (10:1 to 50:1 or higher) by permanently discarding less-perceptible information
  • Exploits human visual system limitations—we're less sensitive to high-frequency details and certain color variations
  • Appropriate for final delivery (web, streaming, display) where files won't be re-edited

Compare: Lossless vs. Lossy—this is the central conceptual divide in compression. Lossless (PNG, lossless JPEG 2000) preserves everything but achieves limited compression; lossy (JPEG, standard JPEG 2000) achieves dramatic compression by exploiting perceptual limitations. The exam loves asking: when would you choose each?


Quick Reference Table

Concept                          | Best Examples
---------------------------------|-------------------------------
Transform-based compression      | DCT, Wavelet Transform
Entropy coding                   | Huffman Coding, RLE
Lossy image formats              | JPEG, lossy JPEG 2000
Lossless image formats           | PNG, lossless JPEG 2000
Exploits spatial redundancy      | RLE, Vector Quantization
Exploits statistical redundancy  | Huffman Coding
Exploits self-similarity         | Fractal Compression
Video compression                | MPEG (H.264, H.265)

Self-Check Questions

  1. Both DCT and Wavelet Transform convert images to the frequency domain. What specific advantage does wavelet-based JPEG 2000 have over DCT-based JPEG for images containing text or sharp edges?

  2. You need to compress a medical X-ray for archival storage where diagnostic accuracy is critical. Which compression approach (lossy or lossless) would you choose? Name two specific formats that support it.

  3. Compare and contrast how Huffman Coding and Run-Length Encoding achieve compression. What type of redundancy does each exploit, and in what scenario would RLE dramatically outperform Huffman?

  4. A web developer asks whether to use JPEG or PNG for a company logo that needs to appear over various background colors. Which format should they choose and why? What if they were compressing a photograph instead?

  5. Explain how MPEG video compression builds upon still-image compression techniques like JPEG. What additional type of redundancy does video compression exploit that isn't present in single images?