
🔢 Coding Theory

Encoding Techniques


Why This Matters

Encoding techniques form the backbone of reliable digital communication: they're how we ensure your text message arrives intact, your Spotify stream doesn't glitch, and spacecraft can transmit data across billions of miles of noisy space. You're being tested on understanding why different codes exist, how they achieve error correction, and when each approach makes sense. The core principles here (redundancy, algebraic structure, iterative decoding, and capacity-approaching performance) show up repeatedly in exam questions.

Don't just memorize code names and applications. Know what mathematical structure each encoding technique exploits, understand the trade-offs between complexity and performance, and be ready to explain why certain codes dominate specific applications. When an FRQ asks you to recommend an encoding scheme for a given scenario, you need to connect the channel characteristics to the code's strengths.


Block Codes: Fixed-Length Redundancy

Block codes take a message of k symbols and map it to a codeword of n symbols, adding structured redundancy. The algebraic relationships between codewords determine how many errors can be detected and corrected.

Linear Block Codes

  • Generator matrix G transforms message vectors into codewords through matrix multiplication, creating systematic redundancy
  • Minimum distance d_min determines error-correcting capability: the code can correct up to ⌊(d_min - 1)/2⌋ errors
  • Parity-check matrix H enables syndrome decoding, where the syndrome identifies the error pattern (see the sketch below)
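
To make the matrix mechanics concrete, here is a minimal sketch in Python using the (7,4) Hamming code in systematic form. The specific G and H below and the helper names (encode, syndrome) are illustrative choices, not something prescribed by the study guide; any G = [I | P] with H = [P^T | I] behaves the same way.

```python
# Minimal linear block code demo: (7,4) Hamming code in systematic form.
# Encoding: c = m G (mod 2).  Syndrome decoding: s = r H^T (mod 2).

G = [  # generator matrix [I_4 | P]
    [1,0,0,0, 1,1,0],
    [0,1,0,0, 1,0,1],
    [0,0,1,0, 0,1,1],
    [0,0,0,1, 1,1,1],
]
H = [  # parity-check matrix [P^T | I_3]; G H^T = 0 over GF(2)
    [1,1,0,1, 1,0,0],
    [1,0,1,1, 0,1,0],
    [0,1,1,1, 0,0,1],
]

def encode(m):
    """Multiply the 4-bit message by G over GF(2)."""
    return [sum(mi * gij for mi, gij in zip(m, col)) % 2 for col in zip(*G)]

def syndrome(r):
    """Compute s = r H^T over GF(2); an all-zero syndrome means r is a valid codeword."""
    return [sum(ri * hij for ri, hij in zip(r, row)) % 2 for row in H]

msg = [1, 0, 1, 1]
cw = encode(msg)
print("codeword:", cw, "syndrome:", syndrome(cw))    # syndrome is all zeros

cw[5] ^= 1                                           # inject a single-bit error
s = syndrome(cw)
# For a single error, s equals the column of H at the error position,
# so matching s against H's columns locates (and lets us flip) the bad bit.
err_pos = [list(col) for col in zip(*H)].index(s)
print("syndrome:", s, "error located at position", err_pos)
```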

Hamming Codes

  • Single-error correcting: a minimum distance of 3 makes this the simplest practical error-correcting code (adding one overall parity bit yields the extended SEC-DED variant used in ECC memory)
  • Parity bits at power-of-2 positions create unique syndromes that pinpoint the exact location of single-bit errors (see the sketch below)
  • The [2^r - 1, 2^r - 1 - r] structure means a (7,4) Hamming code adds only 3 parity bits to protect 4 data bits
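
Here is a small sketch of the power-of-2-position view, assuming the classic (7,4) layout with parity bits at positions 1, 2, and 4; the function names are placeholders. Reading the three recomputed parity checks as a binary number gives the 1-indexed position of a single-bit error (0 means no error detected).

```python
# (7,4) Hamming code with parity bits at positions 1, 2, 4 (1-indexed).
# Each parity bit covers the positions whose binary index has that bit set.

def hamming74_encode(d3, d5, d6, d7):
    """Place data at positions 3, 5, 6, 7 and compute the three parity bits."""
    p1 = d3 ^ d5 ^ d7          # covers positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7          # covers positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7          # covers positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]

def error_position(word):
    """Recompute the checks; the syndrome, read as binary, is the error position."""
    c = [None] + word          # shift to 1-indexed access
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    return 4 * s4 + 2 * s2 + s1

cw = hamming74_encode(1, 0, 1, 1)
print(error_position(cw))      # 0: clean codeword

cw[4] ^= 1                     # flip position 5 (list index 4)
print(error_position(cw))      # 5: the syndrome points straight at the flipped bit
```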

Cyclic Codes

  • Polynomial representation: codewords correspond to polynomials divisible by a generator polynomial g(x), enabling efficient shift-register implementation
  • The cyclic shift property means any rotation of a valid codeword produces another valid codeword, simplifying hardware design
  • CRC checksums are the most common application, providing fast error detection in network protocols and storage systems (a small CRC sketch follows this list)
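
Below is a minimal bitwise CRC sketch, assuming the toy generator polynomial x^4 + x + 1 (binary 10011) rather than any standardized CRC; it just shows the polynomial long division that shift-register hardware performs.

```python
# Toy CRC: divide the message polynomial by g(x) = x^4 + x + 1 over GF(2)
# and append the 4-bit remainder as the checksum.

GEN = [1, 0, 0, 1, 1]          # coefficients of g(x), highest degree first

def crc_remainder(bits, gen=GEN):
    """Polynomial long division mod 2; returns the len(gen)-1 remainder bits."""
    work = list(bits) + [0] * (len(gen) - 1)   # append r zeros before dividing
    for i in range(len(bits)):
        if work[i]:                            # only subtract when the lead bit is 1
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return work[-(len(gen) - 1):]

message = [1, 0, 1, 1, 0, 0, 1]
crc = crc_remainder(message)
transmitted = message + crc

# Receiver: dividing the whole transmitted block should leave a zero remainder.
print(crc, crc_remainder(transmitted))         # [1, 0, 1, 0] and [0, 0, 0, 0]

transmitted[2] ^= 1                            # corrupt one bit in transit
print(crc_remainder(transmitted))              # nonzero remainder flags the error
```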

Compare: Hamming codes vs. general linear block codes. Both use matrix-based encoding, but Hamming codes have a specific structure optimized for single-error correction with minimal redundancy. If asked about lightweight error correction for memory systems, Hamming is your go-to example.


Symbol-Level and Burst Error Correction

Some channels corrupt data in clusters rather than isolated bits. Symbol-based codes treat groups of bits as single units, making them naturally resistant to burst errors.

Reed-Solomon Codes

  • Non-binary operation over GF(2^m): the code works on m-bit symbols rather than individual bits, so a burst error confined to one symbol counts as a single error (see the sketch after this list)
  • Corrects up to t = (n - k)/2 symbol errors, making Reed-Solomon dominant in storage media (CDs, DVDs, QR codes) where scratches cause localized damage
  • Systematic encoding places the original data symbols at the start of the codeword, with parity symbols appended
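
The sketch below does not implement Reed-Solomon decoding (that requires GF(2^m) arithmetic); it only illustrates the symbol-level accounting, assuming 8-bit symbols and a hypothetical RS(255, 223) configuration, to show why a bit burst consumes very little of the symbol-error budget.

```python
# Why symbol-level codes shrug off bursts: count errors at the bit level
# versus the 8-bit-symbol level for one contiguous burst.

n, k, m = 255, 223, 8                     # a common RS configuration over GF(2^8)
t = (n - k) // 2                          # symbol-error correction capability: 16

burst_start, burst_len = 100, 20          # 20 corrupted bits starting at bit 100
flipped_bits = list(range(burst_start, burst_start + burst_len))

bit_errors = len(flipped_bits)                        # 20 errors for a bit-level code
symbol_errors = len({b // m for b in flipped_bits})   # symbols touched by the burst

print("bit-level errors:   ", bit_errors)             # 20
print("symbol-level errors:", symbol_errors, "of a", t, "symbol budget")   # only 3
```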

Compare: Reed-Solomon vs. Hamming codes. Hamming operates bit-by-bit and handles random single-bit errors efficiently, while Reed-Solomon's symbol-level approach excels against burst errors. An FRQ about optical storage or deep-space communication almost always points to Reed-Solomon.


Convolutional Codes: Memory-Based Encoding

Unlike block codes, convolutional codes process data as a continuous stream, with each output depending on current and previous inputs. The encoder has memory, creating dependencies that spread information across the output sequence.

Convolutional Codes

  • Constraint length K defines how many input bits influence each output, with longer constraints providing better error correction at higher complexity
  • The Viterbi algorithm performs maximum-likelihood decoding by finding the most probable path through a trellis diagram representing all encoder states
  • Rate R = k/n specifies bits in versus bits out; common rates like 1/2 and 1/3 are used in satellite and mobile communications (an encoder sketch follows this list)
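
Here is a minimal rate-1/2 encoder sketch, assuming constraint length K = 3 with the common generator pair 111 and 101 in binary (7, 5 in octal); the variable names are illustrative. A Viterbi decoder would search the 4-state trellis that this shift register defines.

```python
# Rate-1/2 convolutional encoder, constraint length K = 3.
# Each input bit enters a shift register; two output bits are parity sums
# over the register taps selected by the generator polynomials.

K = 3
G1, G2 = 0b111, 0b101          # generator polynomials (7, 5 in octal)

def conv_encode(bits):
    """Encode a bit list, flushing with K-1 zeros so the trellis ends in state 0."""
    reg = 0                    # holds the current bit plus the K-1 previous bits
    out = []
    for b in list(bits) + [0] * (K - 1):
        reg = ((reg << 1) | b) & ((1 << K) - 1)
        out.append(bin(reg & G1).count("1") % 2)   # first parity stream
        out.append(bin(reg & G2).count("1") % 2)   # second parity stream
    return out

data = [1, 0, 1, 1]
print(conv_encode(data))       # 2 output bits per input bit (plus tail bits)
```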

Trellis-Coded Modulation (TCM)

  • Joint coding and modulation: the signal constellation is expanded while coding is added, achieving coding gain without bandwidth expansion (see the set-partitioning sketch below)
  • A trellis diagram represents the state transitions, and the decoder searches for the path with minimum Euclidean distance to the received signal
  • Bandwidth-limited channels benefit most, making TCM standard in dial-up modems and early wireless systems
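
The sketch below illustrates the set-partitioning idea behind TCM, assuming an 8-PSK constellation on the unit circle; it is not a full TCM encoder. Each partition level increases the minimum Euclidean distance within a subset, which is the distance the trellis code then protects.

```python
# Ungerboeck-style set partitioning of 8-PSK: splitting the constellation
# into subsets raises the minimum distance between points in the same subset.
import math

points = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8))
          for i in range(8)]

def min_distance(indices):
    """Smallest Euclidean distance between any two of the listed points."""
    return min(math.dist(points[a], points[b])
               for a in indices for b in indices if a < b)

full = list(range(8))
by_lsb = [[i for i in full if i % 2 == bit] for bit in (0, 1)]       # 2 subsets of 4
by_two_bits = [[i for i in full if i % 4 == r] for r in range(4)]    # 4 subsets of 2

print(round(min_distance(full), 3))             # 0.765  (all 8 points)
print(round(min_distance(by_lsb[0]), 3))        # 1.414  (QPSK-like subset)
print(round(min_distance(by_two_bits[0]), 3))   # 2.0    (antipodal pair)
```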

Compare: Convolutional codes vs. linear block codes. Block codes process fixed chunks independently, while convolutional codes maintain state across the entire transmission. For real-time streaming applications, convolutional codes' continuous operation is often preferred.


Capacity-Approaching Codes: Near-Optimal Performance

Shannon's channel capacity theorem sets the theoretical limit on reliable communication. Modern codes approach this limit through iterative decoding, where soft information is exchanged between decoders to progressively refine estimates.

Turbo Codes

  • Parallel concatenation: two convolutional encoders process the same data (one interleaved), creating diverse redundancy that iterative decoding exploits
  • Near-Shannon-limit performance was revolutionary in 1993, achieving rates within 0.5 dB of capacity on AWGN channels
  • Iterative decoding exchanges soft information (log-likelihood ratios) between component decoders, improving estimates with each iteration (see the LLR sketch below)
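
A minimal numeric sketch of the soft information involved, assuming BPSK (bit 0 mapped to +1, bit 1 to -1) over an AWGN channel, where the channel LLR for a received sample y is 2y/σ²; the extrinsic values are made-up numbers used only to show that independent LLR contributions add.

```python
# Soft information in iterative decoding: log-likelihood ratios (LLRs).
# Positive LLR favors bit 0, negative favors bit 1; magnitude is confidence.

sigma2 = 0.5                            # assumed noise variance of the AWGN channel

def channel_llr(y, noise_var=sigma2):
    """LLR of a BPSK sample (bit 0 -> +1, bit 1 -> -1) over AWGN."""
    return 2.0 * y / noise_var

received = [0.9, -0.2, 1.1, -1.3]       # noisy samples for 4 code bits
llrs = [channel_llr(y) for y in received]
print([round(l, 2) for l in llrs])      # the -0.2 sample gets a weak, low-confidence LLR

# Each component decoder contributes its own extrinsic LLR (illustrative numbers);
# treated as independent evidence, the contributions simply sum, and the sign of
# the total is the current hard decision for each bit.
extrinsic = [1.5, -2.0, 0.8, -0.5]
total = [c + e for c, e in zip(llrs, extrinsic)]
decisions = [0 if t >= 0 else 1 for t in total]
print(decisions)
```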

Low-Density Parity-Check (LDPC) Codes

  • Sparse parity-check matrix H: few 1s per row and column enable efficient belief propagation decoding on a Tanner graph (a simplified bit-flipping sketch follows this list)
  • Capacity-approaching for large block lengths, LDPC codes are now standard in Wi-Fi (802.11n/ac/ax), 5G NR, and satellite DVB-S2
  • Parallelizable decoding makes hardware implementation efficient, unlike the sequential nature of turbo decoding
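
Real LDPC decoders pass soft messages (belief propagation) over a genuinely sparse H with thousands of columns; the sketch below substitutes the simpler hard-decision bit-flipping idea on a tiny H (reusing the (7,4) parity-check matrix from earlier, which is not actually low-density) just to show the check-node/variable-node interaction.

```python
# Hard-decision bit-flipping on a tiny parity-check matrix: each round,
# evaluate every check, then flip the bit involved in the most failed checks.

H = [
    [1,1,0,1, 1,0,0],
    [1,0,1,1, 0,1,0],
    [0,1,1,1, 0,0,1],
]

def failed_checks(word):
    """Indices of parity checks that the current word violates."""
    return [i for i, row in enumerate(H)
            if sum(h * w for h, w in zip(row, word)) % 2 == 1]

def bit_flip_decode(word, max_iters=10):
    word = list(word)
    for _ in range(max_iters):
        bad = failed_checks(word)
        if not bad:
            return word                                 # all checks satisfied
        # count, for each bit, how many failed checks it participates in
        involvement = [sum(H[i][j] for i in bad) for j in range(len(word))]
        word[involvement.index(max(involvement))] ^= 1  # flip the most suspect bit
    return word

codeword = [1, 0, 1, 1, 0, 1, 0]            # valid codeword from the earlier sketch
noisy = list(codeword); noisy[2] ^= 1       # one bit error
print(bit_flip_decode(noisy) == codeword)   # True
```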

Polar Codes

  • Channel polarization: a recursive construction transforms N identical channels into synthetic channels that are either nearly perfect or nearly useless (see the sketch below)
  • Provably capacity-achieving for binary-input symmetric channels as the block length approaches infinity; polar codes were the first explicit construction with this property
  • Successive cancellation decoding makes decisions bit-by-bit in a specific order, with list decoding variants improving finite-length performance
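
Polarization is easiest to see on the binary erasure channel, where the recursion is exact: combining two channels with erasure probability z yields a worse channel with 2z - z² and a better one with z². The sketch below (the erasure probability and thresholds are illustrative choices) applies that recursion and counts how many synthetic channels end up nearly perfect or nearly useless.

```python
# Channel polarization on a binary erasure channel (BEC).
# One polarization step turns two copies of a channel with erasure
# probability z into a worse channel (2z - z^2) and a better one (z^2).

def polarize(eps, steps):
    """Apply the BEC polarization recursion `steps` times (N = 2**steps channels)."""
    channels = [eps]
    for _ in range(steps):
        channels = [z for c in channels for z in (2 * c - c * c, c * c)]
    return channels

eps = 0.4                        # erasure probability of the raw channel
chs = polarize(eps, 10)          # N = 1024 synthetic channels

good = sum(z < 0.01 for z in chs)      # nearly perfect: carry information bits
bad = sum(z > 0.99 for z in chs)       # nearly useless: frozen to known values
print(len(chs), good, bad)
# As N grows, the fraction of good channels approaches the capacity 1 - eps = 0.6.
```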

Compare: Turbo codes vs. LDPC codes. Both approach Shannon capacity through iterative decoding, but LDPC codes offer better parallelization and dominate modern standards. Turbo codes remain important in legacy systems (3G/4G) and scenarios requiring lower latency. Know both for any question about capacity-approaching performance.


Rateless Codes: Adaptive Redundancy

Traditional codes have fixed rates: you decide the redundancy before transmission. Rateless codes generate encoded symbols on the fly, stopping only when the receiver confirms successful decoding.

Fountain Codes

  • Potentially unlimited encoding: the sender can generate as many encoded symbols as needed from k source symbols, adapting to unknown channel conditions (see the sketch after this list)
  • Any k(1 + ε) symbols suffice for decoding with high probability, regardless of which specific symbols are received
  • Broadcast and multicast scenarios benefit enormously: different receivers with different loss rates all eventually decode without feedback coordination
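
Below is a sketch of the simplest rateless scheme, a random linear fountain over GF(2); practical fountain codes such as LT or Raptor codes use carefully designed degree distributions and faster decoders, so treat this only as a conceptual model. Each encoded symbol is the XOR of a random subset of the k source symbols, and the receiver decodes by Gaussian elimination once it has collected enough independent symbols.

```python
# Random linear fountain code over GF(2): keep emitting random XOR
# combinations of the k source symbols until the receiver can solve for them.
import random

def encode_symbol(source, rng):
    """One encoded symbol: a random nonzero subset mask and the XOR of that subset."""
    k = len(source)
    mask = 0
    while mask == 0:
        mask = rng.getrandbits(k)
    value = 0
    for i in range(k):
        if (mask >> i) & 1:
            value ^= source[i]
    return mask, value

def try_decode(received, k):
    """Online Gaussian elimination over GF(2); returns the source symbols or None."""
    pivots = [None] * k                    # pivots[i]: row whose lowest set bit is i
    for mask, value in received:
        while mask:                        # reduce against existing pivot rows
            i = (mask & -mask).bit_length() - 1
            if pivots[i] is None:
                pivots[i] = (mask, value)
                break
            pmask, pvalue = pivots[i]
            mask ^= pmask                  # mask reaching 0 means the symbol was redundant
            value ^= pvalue
    if any(p is None for p in pivots):
        return None                        # rank < k: need more encoded symbols
    for i in range(k - 1, -1, -1):         # back-substitute to isolate each source symbol
        mask, value = pivots[i]
        for j in range(i + 1, k):
            if (mask >> j) & 1:
                mask ^= pivots[j][0]
                value ^= pivots[j][1]
        pivots[i] = (mask, value)
    return [value for _, value in pivots]

source = [0x3A, 0x07, 0xC1, 0x55, 0x9E, 0x12]   # k = 6 byte-sized source symbols
rng = random.Random(42)
received, decoded = [], None
while decoded is None:                     # receiver listens until it can decode
    received.append(encode_symbol(source, rng))
    decoded = try_decode(received, len(source))
print(len(received), decoded == source)    # typically only slightly more than k symbols
```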

Compare: Fountain codes vs. fixed-rate block codes. Block codes require retransmission protocols when errors exceed their correction capability, while fountain codes simply send more symbols until decoding succeeds. For streaming to heterogeneous receivers, fountain codes eliminate the need for complex acknowledgment schemes.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Algebraic block structure | Linear block codes, Hamming codes, Cyclic codes |
| Burst error correction | Reed-Solomon codes |
| Memory-based encoding | Convolutional codes, TCM |
| Iterative capacity-approaching | Turbo codes, LDPC codes |
| Provable capacity achievement | Polar codes |
| Rateless/adaptive redundancy | Fountain codes |
| Bandwidth-efficient modulation | Trellis-coded modulation |
| Syndrome-based error location | Hamming codes, Linear block codes |

Self-Check Questions

  1. Both turbo codes and LDPC codes approach Shannon capacity. What structural feature do they share that enables this, and how do their decoding implementations differ?

  2. You're designing an error correction system for a satellite downlink with unpredictable atmospheric fading. Would you choose Reed-Solomon codes or fountain codes? Justify your answer based on each code's properties.

  3. Explain why Reed-Solomon codes outperform Hamming codes for CD/DVD storage, even though both are block codes with well-defined minimum distances.

  4. Compare convolutional codes and linear block codes in terms of how they process input data. Which would you recommend for a real-time voice communication system, and why?

  5. An FRQ asks you to describe a code that is provably capacity-achieving with an explicit construction. Which code family should you discuss, and what mechanism makes this guarantee possible?