Multimedia Data Types
Multimedia data like audio and video places unique demands on computer networks. The combination of high data rates, strict timing requirements, and user expectations for quality makes multimedia one of the hardest types of traffic for networks to handle well. This section covers what makes multimedia data different, why it's hard to transmit, and the key techniques networks use to cope.
Characteristics of Multimedia Data
Audio data represents sound waves as digital samples. Three properties determine its quality and size:
- Sampling rate controls how many times per second the sound wave is measured. CD-quality audio uses 44.1 kHz (44,100 samples per second). Higher rates capture more detail but produce more data.
- Bit depth determines the dynamic range (the gap between the quietest and loudest sounds the format can represent). CD audio uses 16-bit; professional audio often uses 24-bit.
- Channels refer to how many independent audio signals are included. Mono has one channel, stereo has two, and surround sound can have six or more.
Video data is a sequence of images (frames) displayed at a specific rate, typically 24, 30, or 60 frames per second. Each frame is a grid of pixels, and each pixel stores color and brightness information. Two key properties define a video frame's size:
- Resolution is the number of pixels per frame. Full HD is 1920×1080; 4K is 3840×2160.
- Color depth specifies how many bits represent each pixel's color. 24-bit color (8 bits per channel for red, green, blue) is standard; 30-bit or higher is used in HDR content.
Uncompressed data rates are enormous. This is the core reason multimedia networking is hard:
- Uncompressed CD-quality stereo audio (44.1 kHz, 16-bit, 2 channels) requires about 1.4 Mbps.
- Uncompressed Full HD video (1920×1080, 30 fps, 24-bit color) requires roughly 1.5 Gbps.
You can verify these yourself. For the audio example: 44,100 samples/s × 16 bits × 2 channels = 1,411,200 bits/s ≈ 1.4 Mbps. For video: 1920 × 1080 pixels × 24 bits/pixel × 30 frames/s = 1,492,992,000 bits/s ≈ 1.5 Gbps. These numbers make it clear that sending raw multimedia over most networks is simply not feasible.
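The arithmetic behind these data rates is easy to check in code. This short Python sketch (the function names are ours, chosen for illustration) multiplies out the same parameters used above:

```python
def raw_audio_bitrate(sample_rate_hz, bit_depth, channels):
    """Raw (uncompressed) audio bitrate in bits per second."""
    return sample_rate_hz * bit_depth * channels

def raw_video_bitrate(width, height, bits_per_pixel, fps):
    """Raw (uncompressed) video bitrate in bits per second."""
    return width * height * bits_per_pixel * fps

# CD-quality stereo audio: 44.1 kHz, 16-bit, 2 channels
audio_bps = raw_audio_bitrate(44_100, 16, 2)
print(f"Audio: {audio_bps / 1e6:.2f} Mbps")   # ~1.41 Mbps

# Full HD video: 1920x1080, 24-bit color, 30 fps
video_bps = raw_video_bitrate(1920, 1080, 24, 30)
print(f"Video: {video_bps / 1e9:.2f} Gbps")   # ~1.49 Gbps
```

Plugging in other parameters (4K resolution, 60 fps, 24-bit audio) shows how quickly the raw rates grow.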

Multimedia Networking Challenges and Techniques

Challenges in Multimedia Transmission
Four main problems arise when you try to send multimedia over a packet-switched network:
- Bandwidth limitations. Most network links can't carry uncompressed high-quality multimedia in real time. Even compressed HD video typically needs 5–20 Mbps, which can exceed available capacity on congested or low-bandwidth links.
- Packet loss. When the network is congested or errors occur, packets get dropped. In multimedia, lost packets mean missing audio samples or video frames, which directly degrades what the user sees or hears.
- Latency (delay). The time between when data is sent and when it arrives matters a lot for real-time applications. A video call with 500 ms of one-way delay feels unnatural and frustrating. For stored streaming, latency is less critical but still affects startup time.
- Jitter (variation in delay). Even if average delay is acceptable, inconsistent arrival times cause problems. If packets arrive at irregular intervals, audio can stutter and video can freeze unless something smooths out the variation.
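Jitter can be made concrete with a small calculation. The sketch below (illustrative numbers; real implementations such as RTP receivers use a smoothed running estimate) measures how far packet inter-arrival gaps stray from the sender's steady pacing:

```python
# Hypothetical arrival timestamps (in seconds) for packets the sender
# transmitted at a steady 20 ms interval.
arrivals = [0.000, 0.021, 0.039, 0.062, 0.080, 0.105]

gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
expected_gap = 0.020  # sender pacing: one packet every 20 ms

# A simple jitter measure: mean absolute deviation from expected spacing
jitter = sum(abs(g - expected_gap) for g in gaps) / len(gaps)
print(f"Mean jitter: {jitter * 1000:.1f} ms")
```

Even though the average gap here is close to 20 ms, individual packets arrive up to 5 ms early or late, which is exactly the irregularity a receive-side buffer has to smooth out.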
Techniques for Multimedia Networking
Networks use three main techniques to handle these challenges:
Compression reduces the amount of data that needs to be transmitted. There are several approaches:
- Lossless compression (e.g., FLAC for audio, PNG for images) removes statistical redundancy without discarding any information. The original data can be perfectly reconstructed. Compression ratios are modest, typically 2:1 to 3:1.
- Lossy compression (e.g., MP3 for audio, JPEG for images) removes both redundancy and perceptually less important detail. This achieves much higher compression ratios (MP3 can compress audio roughly 10:1) at the cost of some quality loss.
- Video codecs like H.264 and VP9 combine two strategies: intra-frame compression reduces redundancy within a single frame (similar to JPEG), while inter-frame compression encodes only the differences between consecutive frames. Since adjacent frames in a video are usually very similar, inter-frame compression is extremely effective.
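The inter-frame idea can be shown with a toy example. This sketch is not a real codec (actual codecs work on motion-compensated blocks, not individual pixels), but it captures the core insight: when consecutive frames are nearly identical, storing only the differences is far cheaper than storing the whole frame.

```python
def delta_encode(prev_frame, frame):
    """Return {pixel_index: new_value} for pixels that changed."""
    return {i: v for i, (p, v) in enumerate(zip(prev_frame, frame)) if p != v}

def delta_decode(prev_frame, delta):
    """Rebuild a frame by applying the stored differences."""
    return [delta.get(i, p) for i, p in enumerate(prev_frame)]

frame1 = [10, 10, 10, 10, 10, 10]
frame2 = [10, 10, 99, 10, 10, 10]   # only one pixel changed

delta = delta_encode(frame1, frame2)
print(delta)                         # {2: 99} -- one entry, not six pixels
assert delta_decode(frame1, delta) == frame2
```

A frame encoded only as differences from its predecessor corresponds to what codecs call a P-frame; periodic fully self-contained frames (I-frames) are still needed so playback can start or recover mid-stream.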
Buffering temporarily stores received data before playing it back. Here's why it helps:
- The client begins receiving data but delays playback for a short period (often a few seconds).
- During this buffer-fill period, a reserve of data accumulates.
- When playback starts, short-term jitter or brief network disruptions can be absorbed because the player draws from the buffer rather than needing each packet to arrive at exactly the right moment.
The tradeoff is that buffering introduces a startup delay. For stored content this is usually acceptable; for real-time interactive applications, large buffers are not an option.
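The steps above can be simulated in a few lines. This sketch (all numbers are illustrative assumptions) models a player that pre-fills its buffer before starting playback, then consumes one unit of media per tick while the network delivers data in uneven bursts:

```python
def playback_ok(arrival_per_tick, playback_rate, prefill):
    """Return True if playback never stalls (buffer never runs dry)."""
    buffer = 0.0
    started = False
    for received in arrival_per_tick:
        buffer += received
        if not started and buffer >= prefill:
            started = True           # buffer filled: begin playback
        if started:
            if buffer < playback_rate:
                return False         # stall: nothing left to play
            buffer -= playback_rate  # consume one tick of media
    return True

# Bursty delivery: average rate 1.0 unit/tick, but very uneven timing.
bursty = [2.0, 0.0, 1.5, 0.5, 0.0, 2.0, 1.0, 1.0]

print(playback_ok(bursty, playback_rate=1.0, prefill=0.0))  # False: stalls
print(playback_ok(bursty, playback_rate=1.0, prefill=3.0))  # True: smooth
```

With no pre-fill, the first delivery gap empties the buffer and playback stalls; waiting for a three-tick reserve absorbs the same bursts completely, at the cost of a longer startup delay.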
Adaptive bitrate streaming (ABR) dynamically adjusts video quality based on current network conditions:
- The server encodes the content into multiple versions at different quality levels (bitrates).
- Each version is split into short segments, typically 2–10 seconds long.
- The client monitors its available bandwidth and buffer level, then requests the appropriate quality segment for each interval.
- If bandwidth drops, the client switches to a lower-quality segment. If bandwidth improves, it switches back up.
This approach avoids rebuffering (playback stalls) by gracefully degrading quality when the network can't keep up. Protocols like HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP) implement this pattern.
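The client's per-segment decision can be sketched as a simple rate-selection rule. The bitrate ladder and safety margin below are hypothetical; production HLS/DASH players combine throughput estimates with buffer occupancy and switch more conservatively.

```python
LADDER_KBPS = [500, 1500, 3000, 6000]  # available encodings, low to high

def choose_bitrate(measured_kbps, safety=0.8):
    """Pick the highest encoding that fits within a safety margin of
    measured throughput; fall back to the lowest rung otherwise."""
    budget = measured_kbps * safety
    candidates = [r for r in LADDER_KBPS if r <= budget]
    return max(candidates) if candidates else LADDER_KBPS[0]

print(choose_bitrate(8000))  # 6000: plenty of headroom
print(choose_bitrate(2500))  # 1500: 3000 would exceed 2500 * 0.8
print(choose_bitrate(300))   # 500: below the ladder, take the lowest rung
```

The safety margin is what makes the degradation graceful: by never requesting a segment that consumes all measured bandwidth, the client leaves slack for measurement error and short dips, reducing the chance of a rebuffering stall.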
Streaming vs. Interactive Multimedia Requirements
These two categories of multimedia have fundamentally different network requirements:
Stored streaming (e.g., YouTube, Netflix) delivers pre-recorded content from a server. Because the full content already exists, the system can buffer aggressively and adapt quality over time. Some latency and jitter are tolerable since the user doesn't need instant feedback. The main goal is smooth, uninterrupted playback.
Real-time interactive multimedia (e.g., Zoom video calls, online gaming, Twitch live streams) generates content as it's consumed. Latency must be low (under ~150 ms one-way for comfortable conversation), and jitter must be minimal. There's very little room for buffering because the content doesn't exist yet. This demands higher and more consistent network performance than stored streaming.
The distinction matters for system design: stored streaming can rely heavily on buffering and ABR to mask network problems, while interactive multimedia needs the network itself to provide low, stable delay.