
🔢 Coding Theory

Encoding Techniques


Why This Matters

Encoding techniques form the backbone of reliable digital communication: they're how we ensure your text message arrives intact, your Spotify stream doesn't glitch, and spacecraft can transmit data across billions of miles of noisy space. You're being tested on understanding why different codes exist, how they achieve error correction, and when each approach makes sense. The core principles here (redundancy, algebraic structure, iterative decoding, and capacity-approaching performance) show up repeatedly in exam questions.

Don't just memorize code names and applications. Know what mathematical structure each encoding technique exploits, understand the trade-offs between complexity and performance, and be ready to explain why certain codes dominate specific applications. When an FRQ asks you to recommend an encoding scheme for a given scenario, you need to connect the channel characteristics to the code's strengths.


Block Codes: Fixed-Length Redundancy

Block codes take a message of k symbols and map it to a codeword of n symbols, adding structured redundancy. The algebraic relationships between codewords determine how many errors can be detected and corrected.

Linear Block Codes

  • Generator matrix G transforms message vectors into codewords through matrix multiplication, creating systematic redundancy
  • Minimum distance d_min determines error-correcting capability: the code can correct up to ⌊(d_min - 1)/2⌋ errors
  • Parity-check matrix H enables syndrome decoding, where the syndrome identifies the error pattern (see the sketch below)
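
To make the matrix mechanics concrete, here is a minimal sketch in Python using the (7,4) Hamming code in systematic form. The specific G and H below and the helper names (encode, syndrome) are illustrative choices, not something prescribed by the study guide; any G = [I | P] with H = [P^T | I] behaves the same way.

```python
# Minimal linear block code demo: (7,4) Hamming code in systematic form.
# Encoding: c = m G (mod 2).  Syndrome decoding: s = r H^T (mod 2).

G = [  # generator matrix [I_4 | P]
    [1,0,0,0, 1,1,0],
    [0,1,0,0, 1,0,1],
    [0,0,1,0, 0,1,1],
    [0,0,0,1, 1,1,1],
]
H = [  # parity-check matrix [P^T | I_3]; G H^T = 0 over GF(2)
    [1,1,0,1, 1,0,0],
    [1,0,1,1, 0,1,0],
    [0,1,1,1, 0,0,1],
]

def encode(m):
    """Multiply the 4-bit message by G over GF(2)."""
    return [sum(mi * gij for mi, gij in zip(m, col)) % 2 for col in zip(*G)]

def syndrome(r):
    """Compute s = r H^T over GF(2); an all-zero syndrome means r is a valid codeword."""
    return [sum(ri * hij for ri, hij in zip(r, row)) % 2 for row in H]

msg = [1, 0, 1, 1]
cw = encode(msg)
print("codeword:", cw, "syndrome:", syndrome(cw))    # syndrome is all zeros

cw[5] ^= 1                                           # inject a single-bit error
s = syndrome(cw)
# For a single error, s equals the column of H at the error position,
# so matching s against H's columns locates (and lets us flip) the bad bit.
err_pos = [list(col) for col in zip(*H)].index(s)
print("syndrome:", s, "error located at position", err_pos)
```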

Hamming Codes

  • Single-error correcting: a minimum distance of 3 makes this the simplest practical error-correcting code (adding one overall parity bit yields the extended SEC-DED variant used in ECC memory)
  • Parity bits at power-of-2 positions create unique syndromes that pinpoint the exact location of single-bit errors (see the sketch below)
  • The [2^r - 1, 2^r - 1 - r] structure means a (7,4) Hamming code adds only 3 parity bits to protect 4 data bits
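
Here is a small sketch of the power-of-2-position view, assuming the classic (7,4) layout with parity bits at positions 1, 2, and 4; the function names are placeholders. Reading the three recomputed parity checks as a binary number gives the 1-indexed position of a single-bit error (0 means no error detected).

```python
# (7,4) Hamming code with parity bits at positions 1, 2, 4 (1-indexed).
# Each parity bit covers the positions whose binary index has that bit set.

def hamming74_encode(d3, d5, d6, d7):
    """Place data at positions 3, 5, 6, 7 and compute the three parity bits."""
    p1 = d3 ^ d5 ^ d7          # covers positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7          # covers positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7          # covers positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]

def error_position(word):
    """Recompute the checks; the syndrome, read as binary, is the error position."""
    c = [None] + word          # shift to 1-indexed access
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    return 4 * s4 + 2 * s2 + s1

cw = hamming74_encode(1, 0, 1, 1)
print(error_position(cw))      # 0: clean codeword

cw[4] ^= 1                     # flip position 5 (list index 4)
print(error_position(cw))      # 5: the syndrome points straight at the flipped bit
```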

Cyclic Codes

  • Polynomial representation: codewords correspond to polynomials divisible by a generator polynomial g(x), enabling efficient shift-register implementation
  • The cyclic shift property means any rotation of a valid codeword produces another valid codeword, simplifying hardware design
  • CRC checksums are the most common application, providing fast error detection in network protocols and storage systems (a small CRC sketch follows this list)
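
Below is a minimal bitwise CRC sketch, assuming the toy generator polynomial x^4 + x + 1 (binary 10011) rather than any standardized CRC; it just shows the polynomial long division that shift-register hardware performs.

```python
# Toy CRC: divide the message polynomial by g(x) = x^4 + x + 1 over GF(2)
# and append the 4-bit remainder as the checksum.

GEN = [1, 0, 0, 1, 1]          # coefficients of g(x), highest degree first

def crc_remainder(bits, gen=GEN):
    """Polynomial long division mod 2; returns the len(gen)-1 remainder bits."""
    work = list(bits) + [0] * (len(gen) - 1)   # append r zeros before dividing
    for i in range(len(bits)):
        if work[i]:                            # only subtract when the lead bit is 1
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return work[-(len(gen) - 1):]

message = [1, 0, 1, 1, 0, 0, 1]
crc = crc_remainder(message)
transmitted = message + crc

# Receiver: dividing the whole transmitted block should leave a zero remainder.
print(crc, crc_remainder(transmitted))         # [1, 0, 1, 0] and [0, 0, 0, 0]

transmitted[2] ^= 1                            # corrupt one bit in transit
print(crc_remainder(transmitted))              # nonzero remainder flags the error
```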

Compare: Hamming codes vs. general linear block codes. Both use matrix-based encoding, but Hamming codes have a specific structure optimized for single-error correction with minimal redundancy. If asked about lightweight error correction for memory systems, Hamming is your go-to example.


Symbol-Level and Burst Error Correction

Some channels corrupt data in clusters rather than isolated bits. Symbol-based codes treat groups of bits as single units, making them naturally resistant to burst errors.

Reed-Solomon Codes

  • Non-binary operation over GF(2^m): the code works on m-bit symbols rather than individual bits, so a burst error confined to one symbol counts as a single error (see the sketch after this list)
  • Corrects up to t = (n - k)/2 symbol errors, making Reed-Solomon dominant in storage media (CDs, DVDs, QR codes) where scratches cause localized damage
  • Systematic encoding places the original data symbols at the start of the codeword, with parity symbols appended
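
The sketch below does not implement Reed-Solomon decoding (that requires GF(2^m) arithmetic); it only illustrates the symbol-level accounting, assuming 8-bit symbols and a hypothetical RS(255, 223) configuration, to show why a bit burst consumes very little of the symbol-error budget.

```python
# Why symbol-level codes shrug off bursts: count errors at the bit level
# versus the 8-bit-symbol level for one contiguous burst.

n, k, m = 255, 223, 8                     # a common RS configuration over GF(2^8)
t = (n - k) // 2                          # symbol-error correction capability: 16

burst_start, burst_len = 100, 20          # 20 corrupted bits starting at bit 100
flipped_bits = list(range(burst_start, burst_start + burst_len))

bit_errors = len(flipped_bits)                        # 20 errors for a bit-level code
symbol_errors = len({b // m for b in flipped_bits})   # symbols touched by the burst

print("bit-level errors:   ", bit_errors)             # 20
print("symbol-level errors:", symbol_errors, "of a", t, "symbol budget")   # only 3
```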

Compare: Reed-Solomon vs. Hamming codes. Hamming operates bit-by-bit and handles random single-bit errors efficiently, while Reed-Solomon's symbol-level approach excels against burst errors. An FRQ about optical storage or deep-space communication almost always points to Reed-Solomon.


Convolutional Codes: Memory-Based Encoding

Unlike block codes, convolutional codes process data as a continuous stream, with each output depending on current and previous inputs. The encoder has memory, creating dependencies that spread information across the output sequence.

Convolutional Codes

  • Constraint length K defines how many input bits influence each output, with longer constraints providing better error correction at higher complexity
  • The Viterbi algorithm performs maximum-likelihood decoding by finding the most probable path through a trellis diagram representing all encoder states
  • Rate R = k/n specifies bits in versus bits out; common rates like 1/2 and 1/3 are used in satellite and mobile communications (an encoder sketch follows this list)
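
Here is a minimal rate-1/2 encoder sketch, assuming constraint length K = 3 with the common generator pair 111 and 101 in binary (7, 5 in octal); the variable names are illustrative. A Viterbi decoder would search the 4-state trellis that this shift register defines.

```python
# Rate-1/2 convolutional encoder, constraint length K = 3.
# Each input bit enters a shift register; two output bits are parity sums
# over the register taps selected by the generator polynomials.

K = 3
G1, G2 = 0b111, 0b101          # generator polynomials (7, 5 in octal)

def conv_encode(bits):
    """Encode a bit list, flushing with K-1 zeros so the trellis ends in state 0."""
    reg = 0                    # holds the current bit plus the K-1 previous bits
    out = []
    for b in list(bits) + [0] * (K - 1):
        reg = ((reg << 1) | b) & ((1 << K) - 1)
        out.append(bin(reg & G1).count("1") % 2)   # first parity stream
        out.append(bin(reg & G2).count("1") % 2)   # second parity stream
    return out

data = [1, 0, 1, 1]
print(conv_encode(data))       # 2 output bits per input bit (plus tail bits)
```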

Trellis-Coded Modulation (TCM)

  • Joint coding and modulation: the signal constellation is expanded while coding is added, achieving coding gain without bandwidth expansion (see the set-partitioning sketch below)
  • A trellis diagram represents the state transitions, and the decoder searches for the path with minimum Euclidean distance to the received signal
  • Bandwidth-limited channels benefit most, making TCM standard in dial-up modems and early wireless systems
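
The sketch below illustrates the set-partitioning idea behind TCM, assuming an 8-PSK constellation on the unit circle; it is not a full TCM encoder. Each partition level increases the minimum Euclidean distance within a subset, which is the distance the trellis code then protects.

```python
# Ungerboeck-style set partitioning of 8-PSK: splitting the constellation
# into subsets raises the minimum distance between points in the same subset.
import math

points = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8))
          for i in range(8)]

def min_distance(indices):
    """Smallest Euclidean distance between any two of the listed points."""
    return min(math.dist(points[a], points[b])
               for a in indices for b in indices if a < b)

full = list(range(8))
by_lsb = [[i for i in full if i % 2 == bit] for bit in (0, 1)]       # 2 subsets of 4
by_two_bits = [[i for i in full if i % 4 == r] for r in range(4)]    # 4 subsets of 2

print(round(min_distance(full), 3))             # 0.765  (all 8 points)
print(round(min_distance(by_lsb[0]), 3))        # 1.414  (QPSK-like subset)
print(round(min_distance(by_two_bits[0]), 3))   # 2.0    (antipodal pair)
```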

Compare: Convolutional codes vs. linear block codes. Block codes process fixed chunks independently, while convolutional codes maintain state across the entire transmission. For real-time streaming applications, convolutional codes' continuous operation is often preferred.


Capacity-Approaching Codes: Near-Optimal Performance

Shannon's channel capacity theorem sets the theoretical limit on reliable communication. Modern codes approach this limit through iterative decoding, where soft information is exchanged between decoders to progressively refine estimates.

Turbo Codes

  • Parallel concatenation: two convolutional encoders process the same data (one interleaved), creating diverse redundancy that iterative decoding exploits
  • Near-Shannon-limit performance was revolutionary in 1993, achieving rates within 0.5 dB of capacity on AWGN channels
  • Iterative decoding exchanges soft information (log-likelihood ratios) between component decoders, improving estimates with each iteration (see the LLR sketch below)
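
A minimal numeric sketch of the soft information involved, assuming BPSK (bit 0 mapped to +1, bit 1 to -1) over an AWGN channel, where the channel LLR for a received sample y is 2y/σ²; the extrinsic values are made-up numbers used only to show that independent LLR contributions add.

```python
# Soft information in iterative decoding: log-likelihood ratios (LLRs).
# Positive LLR favors bit 0, negative favors bit 1; magnitude is confidence.

sigma2 = 0.5                            # assumed noise variance of the AWGN channel

def channel_llr(y, noise_var=sigma2):
    """LLR of a BPSK sample (bit 0 -> +1, bit 1 -> -1) over AWGN."""
    return 2.0 * y / noise_var

received = [0.9, -0.2, 1.1, -1.3]       # noisy samples for 4 code bits
llrs = [channel_llr(y) for y in received]
print([round(l, 2) for l in llrs])      # the -0.2 sample gets a weak, low-confidence LLR

# Each component decoder contributes its own extrinsic LLR (illustrative numbers);
# treated as independent evidence, the contributions simply sum, and the sign of
# the total is the current hard decision for each bit.
extrinsic = [1.5, -2.0, 0.8, -0.5]
total = [c + e for c, e in zip(llrs, extrinsic)]
decisions = [0 if t >= 0 else 1 for t in total]
print(decisions)
```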

Low-Density Parity-Check (LDPC) Codes

  • Sparse parity-check matrix H: few 1s per row and column enable efficient belief propagation decoding on a Tanner graph (a simplified bit-flipping sketch follows this list)
  • Capacity-approaching for large block lengths, LDPC codes are now standard in Wi-Fi (802.11n/ac/ax), 5G NR, and satellite DVB-S2
  • Parallelizable decoding makes hardware implementation efficient, unlike the sequential nature of turbo decoding
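
Real LDPC decoders pass soft messages (belief propagation) over a genuinely sparse H with thousands of columns; the sketch below substitutes the simpler hard-decision bit-flipping idea on a tiny H (reusing the (7,4) parity-check matrix from earlier, which is not actually low-density) just to show the check-node/variable-node interaction.

```python
# Hard-decision bit-flipping on a tiny parity-check matrix: each round,
# evaluate every check, then flip the bit involved in the most failed checks.

H = [
    [1,1,0,1, 1,0,0],
    [1,0,1,1, 0,1,0],
    [0,1,1,1, 0,0,1],
]

def failed_checks(word):
    """Indices of parity checks that the current word violates."""
    return [i for i, row in enumerate(H)
            if sum(h * w for h, w in zip(row, word)) % 2 == 1]

def bit_flip_decode(word, max_iters=10):
    word = list(word)
    for _ in range(max_iters):
        bad = failed_checks(word)
        if not bad:
            return word                                 # all checks satisfied
        # count, for each bit, how many failed checks it participates in
        involvement = [sum(H[i][j] for i in bad) for j in range(len(word))]
        word[involvement.index(max(involvement))] ^= 1  # flip the most suspect bit
    return word

codeword = [1, 0, 1, 1, 0, 1, 0]            # valid codeword from the earlier sketch
noisy = list(codeword); noisy[2] ^= 1       # one bit error
print(bit_flip_decode(noisy) == codeword)   # True
```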

Polar Codes

  • Channel polarization: a recursive construction transforms N identical channels into synthetic channels that are either nearly perfect or nearly useless (see the sketch below)
  • Provably capacity-achieving for binary-input symmetric channels as the block length approaches infinity; polar codes were the first explicit construction with this property
  • Successive cancellation decoding makes decisions bit-by-bit in a specific order, with list decoding variants improving finite-length performance
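
Polarization is easiest to see on the binary erasure channel, where the recursion is exact: combining two channels with erasure probability z yields a worse channel with 2z - z² and a better one with z². The sketch below (the erasure probability and thresholds are illustrative choices) applies that recursion and counts how many synthetic channels end up nearly perfect or nearly useless.

```python
# Channel polarization on a binary erasure channel (BEC).
# One polarization step turns two copies of a channel with erasure
# probability z into a worse channel (2z - z^2) and a better one (z^2).

def polarize(eps, steps):
    """Apply the BEC polarization recursion `steps` times (N = 2**steps channels)."""
    channels = [eps]
    for _ in range(steps):
        channels = [z for c in channels for z in (2 * c - c * c, c * c)]
    return channels

eps = 0.4                        # erasure probability of the raw channel
chs = polarize(eps, 10)          # N = 1024 synthetic channels

good = sum(z < 0.01 for z in chs)      # nearly perfect: carry information bits
bad = sum(z > 0.99 for z in chs)       # nearly useless: frozen to known values
print(len(chs), good, bad)
# As N grows, the fraction of good channels approaches the capacity 1 - eps = 0.6.
```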

Compare: Turbo codes vs. LDPC codes. Both approach Shannon capacity through iterative decoding, but LDPC codes offer better parallelization and dominate modern standards. Turbo codes remain important in legacy systems (3G/4G) and scenarios requiring lower latency. Know both for any question about capacity-approaching performance.


Rateless Codes: Adaptive Redundancy

Traditional codes have fixed rates: you decide the redundancy before transmission. Rateless codes generate encoded symbols on the fly, stopping only when the receiver confirms successful decoding.

Fountain Codes

  • Potentially unlimited encoding: the sender can generate as many encoded symbols as needed from k source symbols, adapting to unknown channel conditions (see the sketch after this list)
  • Any k(1 + ε) symbols suffice for decoding with high probability, regardless of which specific symbols are received
  • Broadcast and multicast scenarios benefit enormously: different receivers with different loss rates all eventually decode without feedback coordination
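
Below is a sketch of the simplest rateless scheme, a random linear fountain over GF(2); practical fountain codes such as LT or Raptor codes use carefully designed degree distributions and faster decoders, so treat this only as a conceptual model. Each encoded symbol is the XOR of a random subset of the k source symbols, and the receiver decodes by Gaussian elimination once it has collected enough independent symbols.

```python
# Random linear fountain code over GF(2): keep emitting random XOR
# combinations of the k source symbols until the receiver can solve for them.
import random

def encode_symbol(source, rng):
    """One encoded symbol: a random nonzero subset mask and the XOR of that subset."""
    k = len(source)
    mask = 0
    while mask == 0:
        mask = rng.getrandbits(k)
    value = 0
    for i in range(k):
        if (mask >> i) & 1:
            value ^= source[i]
    return mask, value

def try_decode(received, k):
    """Online Gaussian elimination over GF(2); returns the source symbols or None."""
    pivots = [None] * k                    # pivots[i]: row whose lowest set bit is i
    for mask, value in received:
        while mask:                        # reduce against existing pivot rows
            i = (mask & -mask).bit_length() - 1
            if pivots[i] is None:
                pivots[i] = (mask, value)
                break
            pmask, pvalue = pivots[i]
            mask ^= pmask                  # mask reaching 0 means the symbol was redundant
            value ^= pvalue
    if any(p is None for p in pivots):
        return None                        # rank < k: need more encoded symbols
    for i in range(k - 1, -1, -1):         # back-substitute to isolate each source symbol
        mask, value = pivots[i]
        for j in range(i + 1, k):
            if (mask >> j) & 1:
                mask ^= pivots[j][0]
                value ^= pivots[j][1]
        pivots[i] = (mask, value)
    return [value for _, value in pivots]

source = [0x3A, 0x07, 0xC1, 0x55, 0x9E, 0x12]   # k = 6 byte-sized source symbols
rng = random.Random(42)
received, decoded = [], None
while decoded is None:                     # receiver listens until it can decode
    received.append(encode_symbol(source, rng))
    decoded = try_decode(received, len(source))
print(len(received), decoded == source)    # typically only slightly more than k symbols
```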

Compare: Fountain codes vs. fixed-rate block codes. Block codes require retransmission protocols when errors exceed their correction capability, while fountain codes simply send more symbols until decoding succeeds. For streaming to heterogeneous receivers, fountain codes eliminate the need for complex acknowledgment schemes.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Algebraic block structure | Linear block codes, Hamming codes, Cyclic codes |
| Burst error correction | Reed-Solomon codes |
| Memory-based encoding | Convolutional codes, TCM |
| Iterative capacity-approaching | Turbo codes, LDPC codes |
| Provable capacity achievement | Polar codes |
| Rateless/adaptive redundancy | Fountain codes |
| Bandwidth-efficient modulation | Trellis-coded modulation |
| Syndrome-based error location | Hamming codes, Linear block codes |

Self-Check Questions

  1. Both turbo codes and LDPC codes approach Shannon capacity. What structural feature do they share that enables this, and how do their decoding implementations differ?

  2. You're designing an error correction system for a satellite downlink with unpredictable atmospheric fading. Would you choose Reed-Solomon codes or fountain codes? Justify your answer based on each code's properties.

  3. Explain why Reed-Solomon codes outperform Hamming codes for CD/DVD storage, even though both are block codes with well-defined minimum distances.

  4. Compare convolutional codes and linear block codes in terms of how they process input data. Which would you recommend for a real-time voice communication system, and why?

  5. An FRQ asks you to describe a code that is provably capacity-achieving with an explicit construction. Which code family should you discuss, and what mechanism makes this guarantee possible?