Why This Matters
Channel capacity sits at the heart of modern communication theory—it's the fundamental limit that tells us exactly how much information we can push through a noisy channel without errors piling up. You're being tested on your ability to connect entropy, mutual information, and channel models to understand why some communication systems work brilliantly while others fail. These concepts don't just live in textbooks; they're the mathematical backbone of everything from WiFi to deep-space communication.
When exam questions hit, they won't just ask you to recite Shannon's theorem. You'll need to explain why mutual information determines capacity, how different channel models change the capacity formula, and when to apply optimization techniques like water-filling. Don't just memorize formulas—know what concept each one illustrates and how they connect to build a complete picture of reliable communication.
Information Measures
Before calculating capacity, you need to understand how we quantify information itself. These measures form the mathematical language that makes channel capacity meaningful.
Entropy and Conditional Entropy
- Entropy H(X) measures the average uncertainty, or "surprise," in a random variable; the more unpredictable the source, the higher the entropy
- Conditional entropy H(X∣Y) quantifies remaining uncertainty about X after observing Y—essential for understanding how much information actually gets through a channel
- Both concepts are additive and form the building blocks for mutual information; without them, you can't express what a channel accomplishes
Mutual Information
- Mutual information I(X;Y) = H(X) − H(X∣Y) measures how much knowing Y reduces uncertainty about X; this is the information successfully transmitted
- Symmetric property: I(X;Y)=I(Y;X), meaning it captures shared information regardless of which variable you condition on
- Directly determines channel capacity—the maximum mutual information over all possible input distributions is the capacity itself
Compare: Entropy vs. Mutual Information—entropy measures uncertainty in a single variable, while mutual information measures the relationship between two variables. If an FRQ asks what quantity you maximize to find capacity, the answer is always mutual information, not entropy.
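To make these quantities concrete, here is a minimal Python sketch (the crossover probability and uniform input are illustrative choices, not values from this section) that computes H(X), H(X∣Y), and I(X;Y) for a binary input sent through a noisy binary channel:

```python
import math

def entropy(probs):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative setup: uniform binary input through a channel that flips each
# bit with probability 0.1 (a BSC, covered in the channel-model section below).
flip = 0.1
p_x = [0.5, 0.5]                                   # input distribution P(X)
p_xy = {(x, y): p_x[x] * (flip if x != y else 1 - flip)
        for x in (0, 1) for y in (0, 1)}           # joint distribution P(X, Y)
p_y = [sum(p_xy[(x, y)] for x in (0, 1)) for y in (0, 1)]   # marginal P(Y)

H_X = entropy(p_x)
# Conditional entropy H(X|Y) = sum over y of P(y) * H(X | Y = y)
H_X_given_Y = sum(p_y[y] * entropy([p_xy[(x, y)] / p_y[y] for x in (0, 1)])
                  for y in (0, 1))
I_XY = H_X - H_X_given_Y                           # mutual information

print(f"H(X) = {H_X:.3f} bits, H(X|Y) = {H_X_given_Y:.3f} bits, I(X;Y) = {I_XY:.3f} bits")
```

With these made-up numbers, roughly 0.53 of the 1 bit of input uncertainty is resolved by observing Y; the remaining ~0.47 bits, which is H(X∣Y), are lost to noise.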
The Theoretical Foundation
Shannon's work established the rules of the game. These concepts define what's achievable and set the benchmark every practical system aims for.
Shannon's Noisy Channel Coding Theorem
- Establishes channel capacity C as the maximum rate for reliable transmission—any rate below C is achievable with vanishing error probability
- Existence proof, not construction: Shannon proved good codes exist but didn't tell us how to build them; that challenge drove decades of research
- The converse is equally important—rates above capacity guarantee errors, no matter how clever your coding scheme
- C = max_{P(X)} I(X;Y): capacity equals the maximum mutual information, optimized over all possible input distributions P(X)
- "Memoryless" means independence: each channel use is unaffected by previous transmissions, simplifying analysis dramatically
- Finding the optimal P(X) is the key computational challenge; for symmetric channels, uniform input often achieves capacity
Compare: Shannon's Theorem vs. Capacity Formula—the theorem tells you capacity exists and is achievable; the formula tells you how to calculate it. Exam questions often test whether you understand this distinction.
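To see what the maximization over P(X) looks like in practice, the sketch below brute-forces it for a binary symmetric channel by sweeping the input probability P(X=1). This is a toy approach for a two-symbol input; the Blahut-Arimoto algorithm is the usual tool for arbitrary discrete memoryless channels.

```python
import math

def binary_entropy(q):
    """H(q) in bits, with the convention 0 * log 0 = 0."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def bsc_mutual_information(a, p):
    """I(X;Y) for a BSC with crossover p when P(X=1) = a.

    Uses the symmetric form I(X;Y) = H(Y) - H(Y|X); for a BSC, H(Y|X) = H(p)
    no matter which symbol is sent.
    """
    p_y1 = a * (1 - p) + (1 - a) * p
    return binary_entropy(p_y1) - binary_entropy(p)

p = 0.1   # illustrative crossover probability
best_a, capacity = max(((a / 1000, bsc_mutual_information(a / 1000, p))
                        for a in range(1001)), key=lambda pair: pair[1])
print(f"C ≈ {capacity:.4f} bits/use, achieved at P(X=1) ≈ {best_a:.2f}")
# The sweep lands on the uniform input (P(X=1) = 0.5), matching the closed
# form C = 1 - H(p) for this symmetric channel.
```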
Channel Models
Different physical scenarios require different mathematical models. Each model has its own capacity formula and optimal coding strategy.
Binary Symmetric Channel (BSC)
- Crossover probability p defines the chance each bit flips independently—the simplest model for digital transmission errors
- Capacity formula: C = 1 − H(p), where H(p) = −p log₂(p) − (1−p) log₂(1−p) is the binary entropy function
- Symmetric structure means uniform input distribution achieves capacity; errors are "fair" in both directions
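A quick sanity check of the closed form, evaluated at a few illustrative crossover probabilities:

```python
import math

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """C = 1 - H(p) for a binary symmetric channel with crossover probability p."""
    return 1 - binary_entropy(p)

for p in (0.0, 0.01, 0.1, 0.5):
    print(f"p = {p:4.2f}  ->  C = {bsc_capacity(p):.3f} bits/use")
# p = 0 gives C = 1 (a perfect channel); p = 0.5 gives C = 0, because the
# output is then statistically independent of the input.
```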
Binary Erasure Channel (BEC)
- Erasure probability ϵ gives the chance a bit disappears entirely—you know when information is lost, unlike the BSC
- Capacity: C=1−ϵ—remarkably simple because erasures don't corrupt; they just remove information
- Easier to correct than BSC at the same loss rate because the receiver knows which bits are missing
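The "easier at the same loss rate" point shows up directly in the capacity formulas; this short comparison (loss rates chosen purely for illustration) puts C = 1 − ε next to C = 1 − H(p) with ε = p:

```python
import math

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for loss in (0.01, 0.1, 0.3):
    c_bec = 1 - loss                     # erasures: the receiver knows which bits vanished
    c_bsc = 1 - binary_entropy(loss)     # flips: the receiver must also locate the errors
    print(f"loss rate {loss:4.2f}:  C_BEC = {c_bec:.3f}  vs  C_BSC = {c_bsc:.3f} bits/use")
# At every loss rate the BEC capacity is higher, reflecting the extra cost of
# not knowing where the BSC's errors occurred.
```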
Additive White Gaussian Noise Channel (AWGN)
- Models continuous signals corrupted by Gaussian noise with power spectral density N0/2—the standard model for wireless and satellite communication
- Shannon-Hartley theorem: C = W log₂(1 + S/N), where W is the bandwidth and S/N is the signal-to-noise ratio
- Continuous input alphabet and power constraints make this fundamentally different from discrete channels
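A worked Shannon-Hartley example with made-up numbers (1 MHz of bandwidth, SNR swept in dB):

```python
import math

def awgn_capacity(bandwidth_hz, snr_db):
    """Shannon-Hartley: C = W * log2(1 + S/N), with the SNR supplied in dB."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

W = 1e6   # illustrative bandwidth: 1 MHz
for snr_db in (0, 10, 20, 30):
    c_mbps = awgn_capacity(W, snr_db) / 1e6
    print(f"SNR = {snr_db:2d} dB  ->  C = {c_mbps:.2f} Mbit/s")
# At high SNR each additional 10 dB buys only about 3.3 Mbit/s more, because
# capacity grows logarithmically in S/N.
```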
Compare: BSC vs. BEC—both are binary channels, but BSC flips bits (errors look like valid data) while BEC erases them (receiver knows something's missing). BEC is generally easier to decode because uncertainty is localized. This distinction frequently appears in exam questions about decoder design.
Practical Implementation
Theory means nothing without codes that approach capacity in real systems. These concepts bridge the gap between Shannon's limits and working communication systems.
Capacity-Achieving Codes
- Turbo codes and LDPC codes can approach capacity within fractions of a dB—the breakthrough that made Shannon's theorem practical
- Block length matters: capacity is only achieved asymptotically as codeword length approaches infinity; short codes pay a penalty
- Iterative decoding enables these codes to work efficiently; without it, optimal decoding would be computationally impossible
Channel Capacity vs. Data Rate
- Capacity is the theoretical maximum; the data rate is what your system actually achieves, and it is always less than or equal to capacity
- Operating above capacity guarantees failure—error probability cannot be driven to zero regardless of code complexity
- The gap between rate and capacity represents inefficiency in your code design; modern codes shrink this gap dramatically
Compare: Capacity-Achieving Codes vs. Data Rate—Turbo and LDPC codes achieve rates approaching capacity, but practical systems always operate slightly below. If asked why we can't transmit exactly at capacity, the answer involves finite block lengths and computational constraints.
Optimization and Advanced Scenarios
Real systems face constraints and complexity beyond single-user channels. These concepts handle resource allocation and multi-user scenarios.
Water-Filling Algorithm for Parallel Gaussian Channels
- Allocates power across subchannels to maximize total capacity—pour more power into better channels, less into noisy ones
- Threshold behavior: channels below a noise threshold receive zero power; don't waste resources on hopeless subchannels
- Essential for OFDM systems in WiFi and LTE where frequency-selective fading creates parallel channels with varying quality
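A minimal water-filling sketch under assumed conditions (four subchannels with made-up noise powers and a made-up power budget): solve Σ max(μ − Nᵢ, 0) = P for the water level μ, then give each subchannel Pᵢ = max(μ − Nᵢ, 0).

```python
import math

def water_filling(noise_levels, total_power, iters=100):
    """Split a power budget across parallel Gaussian subchannels by water-filling.

    Finds the water level mu with sum_i max(mu - N_i, 0) = total_power by
    bisection, then returns the per-subchannel powers P_i = max(mu - N_i, 0).
    """
    lo, hi = min(noise_levels), max(noise_levels) + total_power
    for _ in range(iters):
        mu = (lo + hi) / 2
        if sum(max(mu - n, 0.0) for n in noise_levels) > total_power:
            hi = mu
        else:
            lo = mu
    return [max(mu - n, 0.0) for n in noise_levels]

noise = [0.5, 1.0, 2.0, 8.0]   # illustrative per-subchannel noise powers
P_total = 4.0                  # illustrative total power budget
powers = water_filling(noise, P_total)
# Capacity per real channel use: sum of (1/2) * log2(1 + P_i / N_i).
capacity = sum(0.5 * math.log2(1 + p / n) for p, n in zip(powers, noise))

print("power allocation:", [round(p, 2) for p in powers])
print(f"total capacity ≈ {capacity:.2f} bits per channel use")
# The noisiest subchannel (N = 8.0) sits above the water level and gets zero
# power, which is exactly the threshold behavior described above.
```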
Capacity-Cost Function
- C(P) relates achievable capacity to resource expenditure—typically power or bandwidth constraints
- Diminishing returns: doubling power doesn't double capacity; the logarithmic relationship in AWGN capacity shows this clearly
- Enables system design trade-offs between performance, cost, and efficiency in practical deployments
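The diminishing-returns behavior is easy to see numerically; this toy loop (illustrative noise and power values) doubles the transmit power repeatedly and prints the AWGN capacity per unit bandwidth:

```python
import math

noise_power = 1.0
signal_power = 1.0   # illustrative starting point
for _ in range(5):
    c = math.log2(1 + signal_power / noise_power)   # bits/s/Hz at this power level
    print(f"P = {signal_power:5.1f}  ->  C = {c:.2f} bits/s/Hz")
    signal_power *= 2
# Each doubling of power adds less than 1 bit/s/Hz (approaching exactly 1 only
# in the high-SNR limit), so capacity grows logarithmically, not linearly.
```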
Capacity Region for Multiple-Access Channels
- Defines achievable rate pairs (R1,R2) when multiple users share a channel—not a single number but a region in rate space
- Corner points are achieved by successive decoding: decode one user while treating the other as noise, subtract its signal, then decode the remaining user interference-free; time-sharing between corner points fills in the rest of the boundary
- Sum-rate capacity gives the maximum total throughput; individual rates can be traded off within the region
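A sketch of the two-user Gaussian multiple-access region under assumed parameters (real-valued channel, so rates use (1/2)·log₂(1 + SNR); the powers and noise level are made up). The three constraints define a pentagon, and its corner points follow from successive decoding:

```python
import math

def c(snr):
    """(1/2) * log2(1 + SNR): capacity of a real-valued Gaussian channel."""
    return 0.5 * math.log2(1 + snr)

# Illustrative two-user Gaussian MAC: user powers P1, P2 and noise power N.
P1, P2, N = 4.0, 2.0, 1.0

# Region: R1 <= c(P1/N), R2 <= c(P2/N), and R1 + R2 <= c((P1 + P2)/N).
sum_rate = c((P1 + P2) / N)

# Corner points: decode one user first while treating the other as noise,
# subtract it, then decode the remaining user over a clean channel.
corner_a = (c(P1 / N), c(P2 / (N + P1)))   # user 2 decoded first, user 1 last
corner_b = (c(P1 / (N + P2)), c(P2 / N))   # user 1 decoded first, user 2 last

print(f"sum-rate capacity  = {sum_rate:.3f} bits/use")
print(f"corner A (R1, R2)  = ({corner_a[0]:.3f}, {corner_a[1]:.3f})")
print(f"corner B (R1, R2)  = ({corner_b[0]:.3f}, {corner_b[1]:.3f})")
# Both corners hit the same sum rate; time-sharing between them sweeps out the
# sum-rate face of the region, trading R1 against R2.
```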
Compare: Water-Filling vs. Capacity Region—water-filling optimizes a single user across parallel channels, while capacity regions handle multiple users on a shared channel. Both involve optimization, but the constraints and trade-offs differ fundamentally.
Quick Reference Table
| Category | Key Concepts |
|---|---|
| Information measures | Entropy, Conditional Entropy, Mutual Information |
| Fundamental theorems | Shannon's Noisy Channel Coding Theorem, Capacity Formula |
| Discrete channel models | BSC, BEC |
| Continuous channel models | AWGN, Parallel Gaussian Channels |
| Capacity-achieving techniques | Turbo Codes, LDPC Codes |
| Optimization methods | Water-Filling Algorithm, Capacity-Cost Function |
| Multi-user scenarios | Capacity Region, Multiple-Access Channels |
| Key trade-offs | Capacity vs. Data Rate, Power vs. Capacity |
Self-Check Questions
- Both BSC and BEC model binary transmission errors—what fundamental difference makes BEC easier to decode, and how does this affect their capacity formulas?
- If you're given a channel's transition probabilities, what quantity do you maximize to find capacity, and over what set do you optimize?
- Compare Turbo codes and the Shannon limit: why can't practical codes achieve capacity exactly, and what parameter must increase to get arbitrarily close?
- The water-filling algorithm assigns zero power to some subchannels. Under what condition does a subchannel receive no power, and why is this optimal?
- An FRQ asks you to explain why transmitting above channel capacity is fundamentally impossible. What theorem establishes this, and what does its converse guarantee about rates below capacity?