๐Ÿ‘๏ธComputer Vision and Image Processing

Image Compression Techniques


Why This Matters

Image compression sits at the heart of nearly every computer vision application you'll encounter, from how your smartphone stores photos to how autonomous vehicles process visual data in real time. The fundamental trade-off is between file size and image quality; what matters most is understanding why different techniques make different choices along that spectrum. These concepts connect directly to signal processing, information theory, and human visual perception.

Don't just memorize which format uses which algorithm. Focus on understanding the underlying mechanisms: transform-based compression, entropy coding, spatial redundancy exploitation, and the lossy vs. lossless paradigm. When you can explain why JPEG discards high-frequency data or how wavelet transforms outperform DCT for certain images, you're thinking like a computer vision engineer, and that's exactly what exam questions will demand.


Transform-Based Compression

These techniques convert image data from the spatial domain into a frequency domain, where redundant information becomes easier to identify and remove. The core insight is that most image energy concentrates in low-frequency components, while high frequencies (fine details) can often be reduced without noticeable quality loss.

Discrete Cosine Transform (DCT)

  • Converts spatial pixel data into frequency coefficients. In JPEG, the image is divided into 8×8 pixel blocks, and each block is independently transformed. The resulting coefficients represent how much of each frequency is present in that block.
  • Concentrates image energy in low-frequency components. After the transform, most of the visually important information ends up in just a few coefficients (typically the upper-left corner of the 8×8 coefficient matrix), making the rest candidates for removal.
  • Enables quantization, where coefficients are divided by values from a quantization table and then rounded. Higher-frequency coefficients get divided by larger numbers, pushing them toward zero. This is the step where JPEG actually loses information.
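
The transform-and-quantize steps above can be sketched in a few lines. This is a simplified illustration, not the full JPEG path: it builds an orthonormal DCT-II basis with NumPy, transforms one 8×8 block, and quantizes with a toy table whose divisors grow with frequency (real JPEG uses standardized, perceptually tuned tables).

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix: C @ block @ C.T gives the 2-D DCT."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= 1 / np.sqrt(2)
    return C * np.sqrt(2 / n)

C = dct_matrix()

# A smooth horizontal gradient block: energy should concentrate at low frequencies.
block = np.tile(np.arange(8, dtype=float), (8, 1))
coeffs = C @ block @ C.T

# Toy quantization table: larger divisors for higher frequencies (illustrative only).
q = 1 + np.add.outer(np.arange(8), np.arange(8)) * 4.0
quantized = np.round(coeffs / q)           # this rounding is where information is lost
reconstructed = C.T @ (quantized * q) @ C  # dequantize, then inverse DCT
```

After quantization, only a handful of low-frequency coefficients survive, yet the reconstructed block stays close to the original, which is exactly the behavior JPEG exploits.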

Wavelet Transform

  • Decomposes images at multiple scales simultaneously, capturing both coarse structure and fine detail in a single representation. Rather than chopping the image into fixed blocks, wavelets apply successive rounds of filtering across the entire image.
  • Preserves sharp edges better than DCT because wavelet basis functions are localized in both space and frequency. DCT uses global cosine functions across each block, which can't represent abrupt transitions as cleanly.
  • Supports progressive transmission. The multi-resolution structure means you can send the coarsest approximation first, then refine it with each additional layer of detail. Viewers see a low-resolution preview that sharpens as more data arrives.
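
A one-level 2-D Haar decomposition is the simplest concrete example of the filtering rounds described above. Real codecs such as JPEG 2000 use longer biorthogonal filters (e.g., CDF 9/7), but the sub-band structure is the same; this NumPy sketch assumes even image dimensions.

```python
import numpy as np

def haar2d(img):
    """One level of the 2-D Haar wavelet transform (orthonormal).
    Returns the approximation (LL) and detail (LH, HL, HH) sub-bands."""
    a = (img[0::2, :] + img[1::2, :]) / np.sqrt(2)   # row-pair lowpass
    d = (img[0::2, :] - img[1::2, :]) / np.sqrt(2)   # row-pair highpass
    ll = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)      # low-low: coarse approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)      # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2)      # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2)      # diagonal detail
    return ll, lh, hl, hh

img = np.full((8, 8), 10.0)      # a flat region: all energy should land in LL
ll, lh, hl, hh = haar2d(img)
```

Applying `haar2d` again to `ll` gives the next coarser level, which is where the progressive-transmission structure comes from.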

Compare: DCT vs. Wavelet Transform: both convert to the frequency domain, but DCT operates on fixed 8×8 blocks while wavelets analyze the image at multiple scales without hard block boundaries. For images with sharp edges (text, diagrams, line art), wavelets in JPEG 2000 outperform DCT-based JPEG. When it comes to artifacts, DCT produces blocking artifacts (visible grid lines at block boundaries), while wavelets produce ringing artifacts (faint oscillations near sharp edges). Know this distinction for exams.


Entropy Coding Methods

Entropy coding exploits statistical redundancy in data, the fact that some values appear more frequently than others. These techniques assign shorter codes to common patterns and longer codes to rare ones, achieving compression without any information loss.

Huffman Coding

  • Assigns variable-length binary codes based on symbol frequency. Common symbols get shorter codes, rare symbols get longer ones. For example, if the value 0 appears in 40% of quantized DCT coefficients, it might get a 2-bit code, while a rare value might need 12 bits.
  • Guarantees optimal prefix-free codes when symbol probabilities are known. "Prefix-free" means no code is a prefix of another, so the decoder can always tell where one symbol ends and the next begins without ambiguity.
  • Serves as the final compression stage in JPEG, PNG, and ZIP. It's rarely used alone but appears everywhere as a building block that squeezes out remaining statistical redundancy after other techniques have done their work.
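
The greedy construction can be sketched with Python's standard-library heap: repeatedly merge the two lowest-frequency subtrees, prefixing "0" to codes in one and "1" to codes in the other, until a single tree remains.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix-free code from symbol frequencies (greedy tree merge)."""
    # Each heap entry: [total frequency, tiebreaker, {symbol: code-so-far}]
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)            # two least frequent subtrees...
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}   # ...merge under one parent
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], count, merged])
        count += 1
    return heap[0][2]

codes = huffman_codes("AAAABBC")   # A is frequent, so it gets the shortest code
```

Encoding "AAAABBC" with these codes takes 10 bits versus 14 for a fixed 2-bit code, and no code is a prefix of another, so decoding is unambiguous.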

Run-Length Encoding (RLE)

  • Replaces consecutive identical values with a single value-count pair. For example, "AAAAAABB" becomes "6A2B". In image terms, a row of 50 white pixels becomes one entry instead of 50.
  • Highly effective for binary images and simple graphics where large uniform regions exist. Think of a scanned document: huge stretches of white background compress down to almost nothing.
  • Performs poorly on photographs or complex textures where adjacent pixels rarely repeat exactly. If every pixel differs from its neighbor, RLE can actually increase file size because you're storing both the value and a count of 1.
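
A minimal RLE sketch using `itertools.groupby` makes both the best case and the worst case from the bullets above easy to see:

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of identical symbols into a (count, symbol) pair."""
    return [(len(list(group)), sym) for sym, group in groupby(data)]

def rle_decode(pairs):
    """Expand (count, symbol) pairs back to the original sequence."""
    return "".join(sym * count for count, sym in pairs)

good = rle_encode("AAAAAABB")   # long runs compress well -> [(6, 'A'), (2, 'B')]
bad = rle_encode("ABCD")        # no runs: one pair per symbol, so RLE expands the data
```

On `"ABCD"` the output has as many pairs as input symbols, each carrying a count of 1, which is the expansion failure mode described above.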

Compare: Huffman Coding vs. RLE: both are lossless, but they exploit different redundancies. RLE targets spatial redundancy (repeated adjacent values), while Huffman targets statistical redundancy (how often each symbol appears overall). Many practical systems use both in sequence: RLE first to collapse repeated values, then Huffman to encode the result efficiently.


Complete Compression Standards

These are full file formats or codec families that combine multiple techniques (transforms, quantization, and entropy coding) into practical, standardized systems.

JPEG (Joint Photographic Experts Group)

JPEG's compression pipeline has a specific order worth knowing:

  1. Convert the image from RGB to YCbCr color space (separating luminance from chrominance)
  2. Downsample the chrominance channels (typically 4:2:0), exploiting the fact that human vision is less sensitive to color detail than brightness detail
  3. Divide the image into 8×8 pixel blocks
  4. Apply DCT to each block
  5. Quantize the DCT coefficients using a quality-dependent quantization table
  6. Encode the quantized coefficients using zigzag ordering, RLE (for zero runs), and Huffman coding
  • Allows adjustable quality settings. Higher compression means more aggressive quantization and smaller files at the cost of visible artifacts (blocking, color banding, mosquito noise near edges).
  • Optimized for continuous-tone images like photos. It performs poorly on sharp edges, text, or graphics with hard boundaries because the 8×8 block structure and quantization smear those transitions.
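
The zigzag ordering in step 6 can be sketched directly: walk the anti-diagonals of the 8×8 coefficient matrix, alternating direction, so low-frequency coefficients come first and the trailing high-frequency zeros cluster into long runs for RLE.

```python
def zigzag_indices(n=8):
    """Zigzag scan order over an n-by-n coefficient matrix: traverse each
    anti-diagonal, alternating direction, starting from the DC coefficient."""
    order = []
    for s in range(2 * n - 1):                         # s = i + j labels each anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])    # alternate traversal direction
    return order

order = zigzag_indices()   # starts (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
```

Reading quantized coefficients in this order is what turns the mostly zero lower-right region into one long zero run that RLE and Huffman coding then compress.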

PNG (Portable Network Graphics)

  • Lossless format using prediction filtering plus Deflate compression. Each row of pixels is filtered (using one of five prediction methods that estimate each pixel from its neighbors), then the filtered data is compressed with Deflate (a combination of LZ77 and Huffman coding). Every pixel is preserved exactly.
  • Supports alpha channel transparency, which is essential for web graphics, logos, and compositing applications where you need to overlay images on different backgrounds.
  • Outperforms JPEG for graphics with sharp edges and uniform colors, but produces larger files for photographs because lossless compression can't discard perceptual redundancy.
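
The prediction step is concrete enough to sketch. The Paeth filter, one of PNG's five row filters, predicts each pixel from its left (a), above (b), and upper-left (c) neighbors, choosing whichever neighbor is closest to the linear estimate a + b − c; PNG then stores the (typically small) difference between prediction and actual value.

```python
def paeth(a, b, c):
    """PNG Paeth predictor. a = left, b = above, c = upper-left neighbor."""
    p = a + b - c                                  # linear estimate of the pixel
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:                      # tie-breaking order is part of the spec
        return a
    if pb <= pc:
        return b
    return c
```

In a flat region all three neighbors agree, the prediction is exact, and the stored residuals are zeros, which Deflate then compresses extremely well.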

JPEG 2000

  • Uses wavelet transform instead of DCT, achieving better quality at equivalent compression ratios, especially for high-resolution images. It avoids the blocking artifacts that plague standard JPEG.
  • Supports both lossy and lossless modes in a single standard, plus region-of-interest (ROI) coding, where you can allocate more bits to important areas of the image.
  • Enables progressive decoding. The multi-resolution wavelet structure means viewers see a coarse preview that sharpens as more data loads. Despite its technical advantages, JPEG 2000 never replaced JPEG for general use due to higher computational cost and limited browser support, though it's widely used in medical imaging and digital cinema.

Compare: JPEG vs. PNG: JPEG is lossy and excels at photographs; PNG is lossless and excels at graphics with transparency. Classic exam question: which format for a logo overlay on a photo? PNG for the logo (sharp edges, transparency needed), JPEG for the photo background (continuous tones, smaller file).


Advanced Compression Approaches

These techniques go beyond standard transforms to exploit more complex patterns in image data, often achieving strong compression ratios at the cost of computational complexity.

Vector Quantization

  • Groups similar pixel blocks into clusters and represents each with a codebook index. Think of it as building a dictionary of common image patches during encoding, then replacing each block in the image with the index of its closest match.
  • Trades encoding time for decoding speed. Building the codebook (often using k-means clustering or the LBG algorithm) is computationally expensive, but reconstruction just requires looking up indices in the codebook, which is fast.
  • Achieves good compression for textures and patterns where similar structures repeat throughout the image. The codebook size directly controls the trade-off: fewer entries means higher compression but lower fidelity.
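
The encode/decode asymmetry is easy to see in a sketch. This toy example assumes a tiny hypothetical two-entry codebook of 2-D "blocks" (real systems train much larger codebooks with k-means or LBG on many training patches): encoding searches for the nearest codeword, decoding is a plain index lookup.

```python
import numpy as np

def vq_encode(blocks, codebook):
    """Map each block (row of `blocks`) to the index of its nearest codeword."""
    # Squared Euclidean distance from every block to every codeword.
    d2 = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def vq_decode(indices, codebook):
    """Reconstruction is just a table lookup, which is why decoding is fast."""
    return codebook[indices]

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])     # hypothetical 2-entry codebook
blocks = np.array([[1.0, 0.5], [9.0, 11.0], [0.2, 0.1]])
indices = vq_encode(blocks, codebook)               # -> [0, 1, 0]
approx = vq_decode(indices, codebook)               # lossy reconstruction
```

Only the indices (and the codebook, once) need to be stored, so compression grows as blocks get larger and the codebook stays small.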

Fractal Compression

  • Exploits self-similarity within images, encoding how parts of an image resemble transformed versions of other parts. The encoder searches for larger "domain blocks" that, after geometric and brightness transformations, approximate smaller "range blocks."
  • Enables resolution-independent decoding. Because the encoding stores transformations rather than pixel values, images can theoretically be reconstructed at any resolution by iterating those transformations. In practice, upscaled results are smoother than true high-resolution data.
  • Computationally expensive to encode (the search for matching domain blocks is the bottleneck) but fast to decode, making it impractical for real-time capture. It remains largely a research topic rather than a production tool.

Compare: Vector Quantization vs. Fractal Compression: both identify patterns within images, but VQ uses a fixed codebook of representative blocks while fractal methods find self-similar transformations across scales. VQ is practical and has seen real-world use (e.g., early video game textures); fractal compression remains largely academic despite its elegant mathematics.


Video Compression Extensions

Video compression builds on image compression but adds temporal redundancy exploitation: the fact that consecutive frames are usually very similar.

MPEG (Moving Picture Experts Group)

  • Extends DCT-based compression with motion compensation. Rather than compressing each frame independently, the encoder estimates how blocks of pixels have moved between frames and encodes only the difference (the "residual"). This residual is then DCT-transformed and quantized just like in JPEG.
  • Defines three frame types that work together:
    • I-frames (Intra): compressed independently like a JPEG image; serve as reference points
    • P-frames (Predicted): encoded as differences from a previous I- or P-frame
    • B-frames (Bidirectional): encoded using both past and future reference frames for the highest compression
  • Powers streaming video, broadcasting, and storage. The MPEG family has evolved through several generations: MPEG-2 (DVDs, broadcast TV), H.264/AVC (most current streaming), and H.265/HEVC (4K content, roughly 50% more efficient than H.264 at equivalent quality).
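
The core of motion compensation, block matching, can be sketched with an exhaustive sum-of-absolute-differences (SAD) search; real encoders use smarter hierarchical or diamond searches, but the idea is the same. The function, block size, and search radius here are illustrative, not from any codec API.

```python
import numpy as np

def motion_search(ref, target, top, left, bsize=4, radius=2):
    """Find the (dy, dx) offset into the reference frame that best predicts
    the target block at (top, left), by exhaustive minimum-SAD search."""
    tgt = target[top:top + bsize, left:left + bsize]
    best_sad, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue                       # candidate window falls off the frame
            sad = np.abs(ref[y:y + bsize, x:x + bsize] - tgt).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

ref = np.zeros((8, 8)); ref[2:6, 2:6] = 1.0        # a bright square in the reference frame
target = np.zeros((8, 8)); target[3:7, 3:7] = 1.0  # the same square, shifted by (1, 1)
mv, sad = motion_search(ref, target, top=3, left=3)
```

A perfect match means the residual is all zeros, so the encoder only needs to transmit the motion vector, which is the source of video compression's huge gains over per-frame JPEG.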

The Fundamental Trade-Off: Lossy vs. Lossless

Understanding when to use each approach is as important as understanding how they work. The choice depends entirely on your application's requirements for quality, file size, and reconstruction fidelity.

Lossless Compression

  • Guarantees perfect reconstruction. Every original pixel value can be recovered exactly from the compressed file. The mathematical relationship is simply: decompressed image = original image, with zero error.
  • Achieves modest compression ratios (typically 2:1 to 3:1 for photographs) because no information is discarded. The theoretical lower bound on lossless compression is set by the data's entropy (from information theory).
  • Essential for medical imaging, satellite imagery, archival, and editing workflows where artifacts would compound with each save cycle (generation loss).
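
The entropy bound mentioned above can be computed directly. This sketch treats the data as a memoryless source of independent symbols (real images have spatial correlation, so prediction can do better than this per-symbol bound):

```python
import math
from collections import Counter

def entropy_bits(data):
    """Shannon entropy in bits per symbol: the lossless coding lower bound
    for a memoryless source with these symbol frequencies."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

uniform = entropy_bits("ABCD")      # 4 equally likely symbols -> 2.0 bits/symbol
skewed = entropy_bits("AAAAAAAB")   # heavily skewed -> far below 1 bit/symbol
```

Skewed distributions have low entropy, which is precisely why entropy coders like Huffman coding can shrink them so much without losing anything.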

Lossy Compression

  • Achieves dramatic compression ratios (10:1 to 50:1 or higher) by permanently discarding less-perceptible information. At moderate quality settings, JPEG typically achieves around 10:1 with minimal visible degradation.
  • Exploits human visual system limitations. We're less sensitive to high-frequency spatial details and more sensitive to luminance than chrominance. Lossy codecs are specifically designed around these perceptual asymmetries.
  • Appropriate for final delivery (web, streaming, display) where files won't be re-edited. Repeatedly compressing and decompressing a lossy file degrades quality each time.

Compare: Lossless vs. Lossy: this is the central conceptual divide in compression. Lossless (PNG, lossless JPEG 2000) preserves everything but achieves limited compression; lossy (JPEG, standard JPEG 2000) achieves dramatic compression by exploiting perceptual limitations. The exam loves asking: when would you choose each? If diagnostic accuracy, pixel-level analysis, or further editing is needed, go lossless. If you're delivering to human viewers and file size matters, lossy is appropriate.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Transform-based compression | DCT, Wavelet Transform |
| Entropy coding | Huffman Coding, RLE |
| Lossy image formats | JPEG, lossy JPEG 2000 |
| Lossless image formats | PNG, lossless JPEG 2000 |
| Exploits spatial redundancy | RLE, Vector Quantization |
| Exploits statistical redundancy | Huffman Coding |
| Exploits self-similarity | Fractal Compression |
| Exploits temporal redundancy | MPEG (H.264, H.265) |
| Exploits perceptual redundancy | JPEG (chrominance subsampling, quantization) |

Self-Check Questions

  1. Both DCT and Wavelet Transform convert images to the frequency domain. What specific advantage does wavelet-based JPEG 2000 have over DCT-based JPEG for images containing text or sharp edges?

  2. You need to compress a medical X-ray for archival storage where diagnostic accuracy is critical. Which compression approach (lossy or lossless) would you choose? Name two specific formats that support it.

  3. Compare and contrast how Huffman Coding and Run-Length Encoding achieve compression. What type of redundancy does each exploit, and in what scenario would RLE dramatically outperform Huffman?

  4. A web developer asks whether to use JPEG or PNG for a company logo that needs to appear over various background colors. Which format should they choose and why? What if they were compressing a photograph instead?

  5. Explain how MPEG video compression builds upon still-image compression techniques like JPEG. What additional type of redundancy does video compression exploit that isn't present in single images?

  6. Walk through the JPEG compression pipeline in order. At which specific step does information loss occur?
