
🔢 Coding Theory

Key Concepts of Channel Capacity


Why This Matters

Channel capacity sits at the heart of modern communication theory—it's the fundamental limit that tells us exactly how much information we can push through a noisy channel without errors piling up. You're being tested on your ability to connect entropy, mutual information, and channel models to understand why some communication systems work brilliantly while others fail. These concepts don't just live in textbooks; they're the mathematical backbone of everything from WiFi to deep-space communication.

When exam questions hit, they won't just ask you to recite Shannon's theorem. You'll need to explain why mutual information determines capacity, how different channel models change the capacity formula, and when to apply optimization techniques like water-filling. Don't just memorize formulas—know what concept each one illustrates and how they connect to build a complete picture of reliable communication.


Foundational Information Measures

Before calculating capacity, you need to understand how we quantify information itself. These measures form the mathematical language that makes channel capacity meaningful.

Entropy and Conditional Entropy

  • Entropy $H(X)$—measures the average uncertainty or "surprise" in a random variable; the more unpredictable the source, the higher the entropy
  • Conditional entropy $H(X|Y)$ quantifies remaining uncertainty about $X$ after observing $Y$—essential for understanding how much information actually gets through a channel
  • Both concepts are additive and form the building blocks for mutual information; without them, you can't express what a channel accomplishes (see the numerical sketch after this list)
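
To make these measures concrete, here is a minimal Python sketch that computes $H(X)$ and $H(X|Y)$ via the chain rule $H(X|Y) = H(X,Y) - H(Y)$; the small joint distribution $P(X,Y)$ is a made-up example, not taken from the text.

```python
import numpy as np

# Assumed joint distribution P(X, Y) for a binary source sent
# through a noisy channel (rows index X, columns index Y).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)          # marginal P(X)
p_y = p_xy.sum(axis=0)          # marginal P(Y)

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Chain rule: H(X|Y) = H(X, Y) - H(Y)
h_x = entropy(p_x)
h_x_given_y = entropy(p_xy.flatten()) - entropy(p_y)

print(f"H(X)   = {h_x:.3f} bits")          # uncertainty before observing Y
print(f"H(X|Y) = {h_x_given_y:.3f} bits")  # uncertainty remaining after Y
```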

Mutual Information

  • $I(X;Y) = H(X) - H(X|Y)$—measures how much knowing $Y$ reduces uncertainty about $X$; this is the information successfully transmitted
  • Symmetric property: $I(X;Y) = I(Y;X)$, meaning it captures shared information regardless of which variable you condition on
  • Directly determines channel capacity—the maximum mutual information over all possible input distributions is the capacity itself (a short computation follows this list)
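
Building on the same assumed joint distribution used in the entropy sketch above, mutual information falls out directly; computing it from either conditioning order gives the same number, which is the symmetry noted in the list.

```python
import numpy as np

p_xy = np.array([[0.4, 0.1],    # assumed joint distribution P(X, Y)
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

h_joint = entropy(p_xy.flatten())

# Two equivalent forms: H(X) - H(X|Y) and H(Y) - H(Y|X)
i_xy = entropy(p_x) - (h_joint - entropy(p_y))   # H(X) - H(X|Y)
i_yx = entropy(p_y) - (h_joint - entropy(p_x))   # H(Y) - H(Y|X)

print(f"I(X;Y) = {i_xy:.3f} bits per channel use")
assert np.isclose(i_xy, i_yx)   # symmetry holds numerically
```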

Compare: Entropy vs. Mutual Information—entropy measures uncertainty in a single variable, while mutual information measures the relationship between two variables. If an FRQ asks what quantity you maximize to find capacity, the answer is always mutual information, not entropy.


The Theoretical Foundation

Shannon's work established the rules of the game. These concepts define what's achievable and set the benchmark every practical system aims for.

Shannon's Noisy Channel Coding Theorem

  • Establishes channel capacity $C$ as the maximum rate for reliable transmission—any rate below $C$ is achievable with vanishing error probability
  • Existence proof, not construction: Shannon proved good codes exist but didn't tell us how to build them; that challenge drove decades of research
  • The converse is equally important—rates above capacity guarantee errors, no matter how clever your coding scheme

Capacity Formula for Discrete Memoryless Channels

  • $C = \max_{P(X)} I(X;Y)$—capacity equals the maximum mutual information, optimized over all possible input distributions $P(X)$
  • "Memoryless" means independence: each channel use is unaffected by previous transmissions, simplifying analysis dramatically
  • Finding the optimal $P(X)$ is the key computational challenge; for symmetric channels, uniform input often achieves capacity (see the Blahut-Arimoto sketch after this list)
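
The maximization over $P(X)$ is usually done numerically. Below is a minimal sketch of the Blahut-Arimoto algorithm, the classic iterative method for this optimization, applied to a BSC; the function name and the example channel are illustrative choices, not from the text.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=10_000):
    """Capacity (bits/use) of a DMC with transition matrix W[x, y] = P(y|x).

    A standard Blahut-Arimoto sketch: alternate between the induced output
    distribution q(y) and a multiplicative re-weighting of the input P(X).
    """
    n_in = W.shape[0]
    p = np.full(n_in, 1.0 / n_in)           # start from a uniform input
    for _ in range(max_iter):
        q = p @ W                           # induced output distribution
        # D(W(.|x) || q) for every input symbol x, in nats
        with np.errstate(divide="ignore", invalid="ignore"):
            d = np.where(W > 0, W * np.log(W / q), 0.0).sum(axis=1)
        c = np.exp(d)
        lower = np.log(p @ c)               # lower bound on C (nats)
        upper = np.log(c.max())             # upper bound on C (nats)
        if upper - lower < tol:
            break
        p = p * c / (p @ c)                 # multiplicative update of P(X)
    return lower / np.log(2), p

# Example: BSC with crossover probability 0.1
W_bsc = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
C, p_opt = blahut_arimoto(W_bsc)
print(f"C = {C:.4f} bits/use, optimal P(X) = {p_opt}")   # ~0.5310, uniform
```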

Compare: Shannon's Theorem vs. Capacity Formula—the theorem tells you capacity exists and is achievable; the formula tells you how to calculate it. Exam questions often test whether you understand this distinction.


Channel Models

Different physical scenarios require different mathematical models. Each model has its own capacity formula and optimal coding strategy.

Binary Symmetric Channel (BSC)

  • Crossover probability $p$ defines the chance each bit flips independently—the simplest model for digital transmission errors
  • Capacity formula: $C = 1 - H(p)$ where $H(p) = -p\log_2(p) - (1-p)\log_2(1-p)$ is the binary entropy function
  • Symmetric structure means uniform input distribution achieves capacity; errors are "fair" in both directions (a quick capacity calculation follows this list)
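
A direct evaluation of $C = 1 - H(p)$ for a few crossover probabilities (a minimal sketch; the sample values of $p$ are illustrative):

```python
import numpy as np

def binary_entropy(p):
    """H(p) in bits; defined as 0 at p = 0 or p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.01, 0.1, 0.5):
    print(f"p = {p:>4}:  C = {bsc_capacity(p):.4f} bits/use")
# p = 0.5 gives C = 0: the output is statistically independent of the input.
```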

Binary Erasure Channel (BEC)

  • Erasure probability $\epsilon$ gives the chance a bit disappears entirely—you know when information is lost, unlike the BSC
  • Capacity: $C = 1 - \epsilon$—remarkably simple because erasures don't corrupt; they just remove information
  • Easier to correct than BSC at the same loss rate because the receiver knows which bits are missing (the comparison below makes this concrete)
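
To see the gap concretely, compare the two capacities at the same parameter value (an illustrative comparison, assuming crossover probability and erasure probability are set equal): since $H(p) > p$ for $0 < p < 0.5$, the BSC always has the lower capacity.

```python
import numpy as np

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

print(" p     BSC: 1-H(p)   BEC: 1-p")
for p in (0.05, 0.1, 0.2, 0.3):
    print(f"{p:.2f}   {1 - binary_entropy(p):10.4f}   {1 - p:8.4f}")
# The BEC column is always larger: knowing *where* the loss occurred
# preserves more capacity than an equally likely silent bit flip.
```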

Additive White Gaussian Noise Channel (AWGN)

  • Models continuous signals corrupted by Gaussian noise with power spectral density $N_0/2$—the standard model for wireless and satellite communication
  • Shannon-Hartley theorem: $C = W \log_2(1 + \frac{S}{N})$ where $W$ is bandwidth and $S/N$ is signal-to-noise ratio
  • Continuous input alphabet and power constraints make this fundamentally different from discrete channels (a worked Shannon-Hartley example follows this list)
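
A quick numeric check of the Shannon-Hartley formula; the bandwidth and SNR values below are assumed for illustration, not from the text.

```python
import numpy as np

def awgn_capacity(bandwidth_hz, snr_db):
    """Shannon-Hartley capacity C = W * log2(1 + S/N) in bits per second."""
    snr_linear = 10 ** (snr_db / 10)        # convert dB to a power ratio
    return bandwidth_hz * np.log2(1 + snr_linear)

W = 1e6          # 1 MHz of bandwidth (illustrative value)
for snr_db in (0, 10, 20, 30):
    print(f"SNR = {snr_db:>2} dB:  C = {awgn_capacity(W, snr_db)/1e6:.2f} Mbit/s")
# Diminishing returns: at high SNR each extra 10 dB adds roughly the same
# ~3.3 Mbit/s, even though the signal power grows tenfold each step.
```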

Compare: BSC vs. BEC—both are binary channels, but BSC flips bits (errors look like valid data) while BEC erases them (receiver knows something's missing). BEC is generally easier to decode because uncertainty is localized. This distinction frequently appears in exam questions about decoder design.


Practical Implementation

Theory means nothing without codes that approach capacity in real systems. These concepts bridge the gap between Shannon's limits and working communication systems.

Capacity-Achieving Codes

  • Turbo codes and LDPC codes can approach capacity within fractions of a dB—the breakthrough that made Shannon's theorem practical
  • Block length matters: capacity is only achieved asymptotically as codeword length approaches infinity; short codes pay a penalty
  • Iterative decoding enables these codes to work efficiently; without it, optimal decoding would be computationally impossible

Channel Capacity vs. Data Rate

  • Capacity is theoretical maximum; data rate is what your system actually achieves—always less than or equal to capacity
  • Operating above capacity guarantees failure—error probability cannot be driven to zero regardless of code complexity
  • The gap between rate and capacity represents inefficiency in your code design; modern codes shrink this gap dramatically

Compare: Capacity-Achieving Codes vs. Data Rate—Turbo and LDPC codes achieve rates approaching capacity, but practical systems always operate slightly below. If asked why we can't transmit exactly at capacity, the answer involves finite block lengths and computational constraints.


Optimization and Advanced Scenarios

Real systems face constraints and complexity beyond single-user channels. These concepts handle resource allocation and multi-user scenarios.

Water-Filling Algorithm for Parallel Gaussian Channels

  • Allocates power across subchannels to maximize total capacity—pour more power into better channels, less into noisy ones
  • Threshold behavior: channels below a noise threshold receive zero power; don't waste resources on hopeless subchannels
  • Essential for OFDM systems in WiFi and LTE where frequency-selective fading creates parallel channels with varying quality (see the sketch after this list)
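
A minimal water-filling sketch under assumed noise levels and power budget: find the water level $\mu$ such that $\sum_i \max(\mu - N_i, 0)$ equals the total power, then give each subchannel only the power above its noise floor.

```python
import numpy as np

def water_filling(noise, total_power, iters=100):
    """Allocate total_power across subchannels with the given noise levels.

    Returns per-subchannel powers P_i = max(mu - N_i, 0), where the water
    level mu is found by bisection so that sum(P_i) = total_power.
    """
    noise = np.asarray(noise, dtype=float)
    lo, hi = noise.min(), noise.min() + total_power   # mu must lie in here
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - noise, 0.0).sum() > total_power:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - noise, 0.0)

noise = np.array([0.5, 1.0, 4.0, 10.0])    # assumed subchannel noise powers
P = water_filling(noise, total_power=4.0)
rates = 0.5 * np.log2(1 + P / noise)       # per-subchannel capacity (bits/use)
print("power :", P.round(3))               # the worst channels get zero power
print("rates :", rates.round(3))
print("total :", rates.sum().round(3))
```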

Capacity-Cost Function

  • $C(P)$ relates achievable capacity to resource expenditure—typically power or bandwidth constraints
  • Diminishing returns: doubling power doesn't double capacity; the logarithmic relationship in AWGN capacity shows this clearly
  • Enables system design trade-offs between performance, cost, and efficiency in practical deployments (the short calculation after this list illustrates the diminishing returns)
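
A short calculation of the diminishing returns, using assumed S/N values and capacity normalized per unit bandwidth: doubling the signal power never buys more than one extra bit per channel use, and far less at low SNR.

```python
import numpy as np

snr = np.array([0.5, 1.0, 4.0, 16.0, 64.0])    # assumed S/N power ratios
c        = np.log2(1 + snr)                     # bits/s/Hz at power P
c_double = np.log2(1 + 2 * snr)                 # bits/s/Hz at power 2P

for s, a, b in zip(snr, c, c_double):
    print(f"S/N = {s:5.1f}:  C(P) = {a:5.3f}, C(2P) = {b:5.3f}, gain = {b - a:.3f}")
# The gain approaches 1 bit/s/Hz as S/N grows; capacity never doubles.
```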

Capacity Region for Multiple-Access Channels

  • Defines achievable rate pairs $(R_1, R_2)$ when multiple users share a channel—not a single number but a region in rate space
  • Corner points are achieved by successive decoding (decode one user, subtract it, then decode the other); time-sharing between corner points traces out the rest of the boundary
  • Sum-rate capacity gives the maximum total throughput; individual rates can be traded off within the region (see the two-user Gaussian example after this list)
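
A small sketch of the two-user Gaussian multiple-access region, with assumed power and noise values: three constraints define the pentagon, and the corner points are the successive-decoding rate pairs.

```python
import numpy as np

def c(snr):
    """Gaussian capacity 0.5 * log2(1 + SNR), bits per real channel use."""
    return 0.5 * np.log2(1 + snr)

P1, P2, N = 4.0, 2.0, 1.0        # assumed user powers and noise power

# Pentagon constraints: R1 <= c(P1/N), R2 <= c(P2/N), R1+R2 <= c((P1+P2)/N)
r1_max, r2_max = c(P1 / N), c(P2 / N)
sum_max = c((P1 + P2) / N)

# Corner A: decode user 2 first (user 1 treated as noise), then user 1 cleanly
corner_a = (c(P1 / N), c(P2 / (N + P1)))
# Corner B: decode user 1 first, then user 2 cleanly
corner_b = (c(P1 / (N + P2)), c(P2 / N))

print(f"single-user limits: R1 <= {r1_max:.3f}, R2 <= {r2_max:.3f}")
print(f"sum-rate limit    : R1 + R2 <= {sum_max:.3f}")
print(f"corner A (R1, R2) : {corner_a[0]:.3f}, {corner_a[1]:.3f}")
print(f"corner B (R1, R2) : {corner_b[0]:.3f}, {corner_b[1]:.3f}")
# Both corners meet the sum-rate limit with equality; time-sharing between
# them traces out the rest of the dominant face of the region.
```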

Compare: Water-Filling vs. Capacity Region—water-filling optimizes a single user across parallel channels, while capacity regions handle multiple users on a shared channel. Both involve optimization, but the constraints and trade-offs differ fundamentally.


Quick Reference Table

Concept | Best Examples
Information measures | Entropy, Conditional Entropy, Mutual Information
Fundamental theorems | Shannon's Noisy Channel Coding Theorem, Capacity Formula
Discrete channel models | BSC, BEC
Continuous channel models | AWGN, Parallel Gaussian Channels
Capacity-achieving techniques | Turbo Codes, LDPC Codes
Optimization methods | Water-Filling Algorithm, Capacity-Cost Function
Multi-user scenarios | Capacity Region, Multiple-Access Channels
Key trade-offs | Capacity vs. Data Rate, Power vs. Capacity

Self-Check Questions

  1. Both BSC and BEC model binary transmission errors—what fundamental difference makes BEC easier to decode, and how does this affect their capacity formulas?

  2. If you're given a channel's transition probabilities, what quantity do you maximize to find capacity, and over what set do you optimize?

  3. Compare Turbo codes and the Shannon limit: why can't practical codes achieve capacity exactly, and what parameter must increase to get arbitrarily close?

  4. The water-filling algorithm assigns zero power to some subchannels. Under what condition does a subchannel receive no power, and why is this optimal?

  5. An FRQ asks you to explain why transmitting above channel capacity is fundamentally impossible. What theorem establishes this, and what does its converse guarantee about rates below capacity?