Fiveable

📡Systems Approach to Computer Networks Unit 19 Review

19.2 Quality of Service Architectures

Written by the Fiveable Content Team • Last updated August 2025

Quality of Service (QoS) Architectures

Quality of Service (QoS) is how networks guarantee that different types of traffic get the treatment they need. Without QoS, a router treats a surgeon's telemedicine video feed the same as a background file download, and that's a problem. QoS architectures give network administrators the tools to prioritize, reserve, and allocate resources so that delay-sensitive and mission-critical applications perform reliably even under congestion.

Quality of Service Fundamentals

Networks carry many types of traffic simultaneously, and each type has different requirements. A VoIP call needs low latency and minimal jitter, while an email can tolerate seconds of delay without anyone noticing. QoS enables networks to provide differentiated services to these different traffic classes.

At its core, QoS does two things:

  • Prioritizes traffic based on sensitivity. High-priority traffic (voice, video conferencing) receives preferential treatment over lower-priority traffic (file transfers, email). This prioritization accounts for each application's tolerance for delay, jitter, and packet loss.
  • Allocates resources to match demand. Rather than letting all traffic compete equally for bandwidth, QoS mechanisms reserve or weight resources so that critical applications get what they need. A hospital's telemedicine session, for example, can be guaranteed bandwidth that a background web-browsing session cannot claim.

The end goal is maintaining quality of experience for users, particularly for real-time applications where even small degradations (a few hundred milliseconds of added latency, or a spike in packet loss) are immediately noticeable.
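The delay and jitter sensitivities described above can be made concrete with a little arithmetic. The sketch below, using made-up arrival timestamps, shows one simple way to quantify jitter as the average deviation of packet inter-arrival gaps from their nominal spacing:

```python
# Quick illustration of jitter, one of the delay metrics QoS targets.
# Arrival timestamps (in ms) are invented for the example.

arrivals = [0.0, 20.1, 40.0, 61.5, 80.2]  # packets nominally sent every 20 ms
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]

# Jitter here: average deviation of each gap from the nominal 20 ms spacing.
jitter = sum(abs(g - 20.0) for g in gaps) / len(gaps)
print(round(jitter, 2))  # 0.75
```

A VoIP codec with a small playout buffer can absorb sub-millisecond jitter like this; spikes of tens of milliseconds are what QoS mechanisms try to prevent.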

IntServ vs. DiffServ Architectures

Two major QoS architectures exist, and they represent fundamentally different design philosophies.

Integrated Services (IntServ) provides QoS on a per-flow basis. Each flow is identified by its source/destination addresses, port numbers, and protocol. Before data is sent, IntServ uses the Resource Reservation Protocol (RSVP) to reserve resources (bandwidth, buffer space) at every router along the path.

  • Every router along the path must maintain state information for each individual flow.
  • This gives very fine-grained control: you can guarantee exact bandwidth and delay bounds for a specific video call between two endpoints.
  • The tradeoff is scalability. A backbone router handling millions of flows would need to track state for each one, which becomes impractical in large networks.
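The per-flow state and admission decisions described above can be sketched in a few lines. This is a simplified illustration, not a real RSVP implementation; the flow tuples and capacity figures are invented:

```python
# Minimal sketch of IntServ-style per-flow admission control at one router.
# Each flow is identified by its 5-tuple; the router tracks a reservation per flow.

class IntServRouter:
    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.reservations = {}  # 5-tuple -> reserved Mbps (per-flow state)

    def reserve(self, flow, mbps):
        """RSVP-like admission: accept only if spare capacity remains."""
        used = sum(self.reservations.values())
        if used + mbps > self.capacity:
            return False  # reservation rejected
        self.reservations[flow] = mbps
        return True

router = IntServRouter(capacity_mbps=100)
voip = ("10.0.0.1", "10.0.0.2", 5004, 5004, "UDP")   # src, dst, sport, dport, proto
video = ("10.0.0.3", "10.0.0.4", 6000, 6000, "UDP")
assert router.reserve(voip, 2)        # accepted
assert router.reserve(video, 90)      # accepted
assert not router.reserve(("10.0.0.5", "10.0.0.6", 7000, 7000, "UDP"), 20)  # exceeds 100 Mbps
```

The scalability problem is visible in the `reservations` dictionary: a backbone router would need one entry (plus signaling refreshes) for every active flow crossing it.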

Differentiated Services (DiffServ) provides QoS on a per-class basis instead of per-flow. Rather than reserving resources for each individual session, DiffServ groups traffic into a manageable number of classes (e.g., voice, video, best-effort) and treats all packets in a class the same way.

  • Edge routers classify incoming packets and mark them by setting the Differentiated Services Code Point (DSCP) field in the IP header (a 6-bit field allowing up to 64 classes).
  • Core routers don't need to inspect flows individually. They simply read the DSCP value and apply the corresponding forwarding behavior (called a Per-Hop Behavior, or PHB).
  • Because core routers only maintain per-class state rather than per-flow state, DiffServ scales far better to large networks.
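The core router's job reduces to a table lookup. The sketch below shows how a router might extract the 6-bit DSCP from the IP ToS/Traffic Class byte and map it to a forwarding behavior; the PHB table is a simplified illustration, though the DSCP values (EF = 46, AF41 = 34) are standard:

```python
# Sketch of a DiffServ core router's decision: read the DSCP and pick a PHB.
# DSCP occupies the top 6 bits of the ToS byte; the low 2 bits carry ECN.

EF = 46      # Expedited Forwarding (low-latency, e.g. voice)
AF41 = 34    # Assured Forwarding class 4 (e.g. interactive video)
BE = 0       # default / best effort

PHB_TABLE = {EF: "priority-queue", AF41: "video-queue", BE: "best-effort-queue"}

def dscp_from_tos(tos_byte):
    return tos_byte >> 2

def select_queue(tos_byte):
    dscp = dscp_from_tos(tos_byte)
    return PHB_TABLE.get(dscp, "best-effort-queue")

# An EF-marked packet carries ToS byte 0xB8, since 46 << 2 == 184.
assert select_queue(0xB8) == "priority-queue"
assert select_queue(0x00) == "best-effort-queue"
```

With at most 64 possible DSCP values, this lookup is constant-time regardless of how many flows the router carries, which is exactly why DiffServ scales where IntServ does not.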

Key comparison: IntServ gives precise, per-flow guarantees but doesn't scale. DiffServ scales well across large networks but offers coarser, class-level treatment rather than per-flow guarantees. Most real-world large networks use DiffServ, sometimes with IntServ-like signaling at the edges.

Service Level Agreements in QoS

A Service Level Agreement (SLA) is a contract between a service provider and a customer that specifies the expected level of service. SLAs translate QoS from a technical mechanism into a business commitment.

An SLA typically defines:

  • Measurable metrics such as availability (e.g., 99.99% uptime), throughput (e.g., guaranteed 100 Mbps), latency (e.g., ≤ 50 ms one-way), and packet loss (e.g., < 0.1%).
  • Consequences for violations, which can include financial penalties, service credits, or contract termination rights.

SLAs matter for QoS because they establish a shared understanding between provider and customer about what "good enough" means. They also provide the framework for monitoring and enforcement: if the SLA promises ≤ 50 ms latency for voice traffic, the provider's QoS mechanisms must be configured to deliver that, and both parties can measure whether it's being met.
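Availability percentages in an SLA translate directly into an allowed downtime budget. A quick back-of-envelope calculation, using the "four nines" figure from the example above:

```python
# SLA arithmetic: convert an availability percentage into the maximum
# downtime it permits per year.

def allowed_downtime_minutes_per_year(availability_pct):
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return (1 - availability_pct / 100) * minutes_per_year

# "Four nines" (99.99%) allows roughly 52.6 minutes of downtime per year;
# "three nines" (99.9%) allows about ten times as much.
print(round(allowed_downtime_minutes_per_year(99.99), 1))  # 52.6
print(round(allowed_downtime_minutes_per_year(99.9), 1))   # 525.6
```

Numbers like these are why SLA tiers are priced so differently: each added nine shrinks the provider's maintenance and failure budget by a factor of ten.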

Mechanisms of QoS Implementation

QoS is implemented through a pipeline of mechanisms, each handling a different stage of traffic treatment.

1. Classification identifies and categorizes packets based on predefined criteria (source/destination addresses, port numbers, protocol type, or even application-layer signatures). This is the first step: you can't treat traffic differently until you know what kind of traffic it is.

2. Marking sets the DSCP field in the IP header to indicate each packet's class of service. Once a packet is marked at the network edge, downstream routers can quickly determine the appropriate QoS treatment without re-inspecting the packet's contents. This is what makes DiffServ efficient in the core.
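Steps 1 and 2 can be sketched together as a single classify-and-mark stage at the edge. The match rules below are purely illustrative (real devices use ACLs, NBAR-style signatures, or DPI), and the port-to-class pairings are assumptions for the example:

```python
# Sketch of an edge router's classify-and-mark stage (steps 1 and 2).
# A packet is modeled as a dict; rules pair a match predicate with a DSCP.

RULES = [
    (lambda pkt: pkt["proto"] == "UDP" and pkt["dport"] == 5060, 46),  # assumed voice rule -> EF
    (lambda pkt: pkt["proto"] == "UDP" and pkt["dport"] == 6000, 34),  # assumed video rule -> AF41
]

def classify_and_mark(pkt):
    for match, dscp in RULES:
        if match(pkt):
            pkt["dscp"] = dscp
            return pkt
    pkt["dscp"] = 0  # unmatched traffic defaults to best effort
    return pkt

pkt = classify_and_mark({"proto": "UDP", "dport": 5060})
print(pkt["dscp"])  # 46
```

After this stage, downstream routers never need to re-run the (potentially expensive) match logic; they only read the DSCP.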

3. Queuing stores packets in buffers before forwarding them to the next hop. The queuing algorithm determines which packets get sent first and how bandwidth is shared. The main algorithms are:

  1. First-In-First-Out (FIFO): Forwards packets strictly in arrival order. Simple but offers no differentiation between traffic types.
  2. Priority Queuing (PQ): Serves all packets in the highest-priority queue before moving to lower-priority queues. Great for voice traffic, but low-priority traffic can starve if high-priority traffic is heavy.
  3. Weighted Fair Queuing (WFQ): Allocates bandwidth proportionally to each flow based on assigned weights. Prevents starvation while still giving more bandwidth to higher-weight flows.
  4. Class-Based Weighted Fair Queuing (CBWFQ): Similar to WFQ but operates on traffic classes rather than individual flows. Administrators configure bandwidth guarantees per class (e.g., 30% for voice, 40% for video, 30% for best-effort).
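The behavioral difference between priority queuing and weighted scheduling is easy to see in a toy simulation. This sketch uses a simple weighted round-robin as a stand-in for the weighted schemes (real WFQ/CBWFQ implementations are more sophisticated), with invented packet labels and weights:

```python
# Toy comparison of priority queuing (PQ) with a weighted round-robin scheme.
from collections import deque

def priority_schedule(queues):
    """PQ: drain higher-priority queues completely before lower ones."""
    order = []
    for q in queues:                  # queues listed highest priority first
        while q:
            order.append(q.popleft())
    return order

def weighted_round_robin(queues, weights):
    """CBWFQ-like sharing: each pass, a queue may send up to `weight` packets."""
    order = []
    while any(queues):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if q:
                    order.append(q.popleft())
    return order

voice = ["v1", "v2"]
data = ["d1", "d2", "d3"]

# PQ sends all voice first; data waits (and could starve under constant voice load).
print(priority_schedule([deque(voice), deque(data)]))        # ['v1', 'v2', 'd1', 'd2', 'd3']

# With weights 1:2, voice and data interleave and neither class starves.
print(weighted_round_robin([deque(voice), deque(data)], [1, 2]))  # ['v1', 'd1', 'd2', 'v2', 'd3']
```

The weighted variant trades a little voice latency for a guarantee that every class makes progress, which is the core idea behind WFQ and CBWFQ.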

Classification and marking happen primarily at the network edge. Queuing and scheduling happen at every hop where congestion can occur. This edge-core division of labor is central to how DiffServ keeps core routers simple and fast.