🔀Stochastic Processes Unit 8 Review

8.1 Basic queueing models

Written by the Fiveable Content Team • Last updated August 2025

Queueing models provide the mathematical framework for analyzing systems where customers arrive, wait for service, and depart. They let you predict performance metrics and make informed decisions about capacity planning and resource allocation.

This section covers the building blocks: arrival processes, service time distributions, standard notation, birth-death process analysis, and the core single-server and multi-server models.

Arrival processes in queueing models

Arrival processes describe the pattern and rate at which customers (or jobs, packets, calls, etc.) enter a queueing system. Choosing the right arrival model is the first step in any queueing analysis, because everything downstream depends on it.

Poisson process for arrivals

The Poisson process is the default model for arrivals in most basic queueing systems. It assumes:

  • Arrivals occur independently of one another
  • Arrivals happen at a constant average rate λ (arrivals per unit time)
  • Inter-arrival times follow an exponential distribution with parameter λ

The exponential distribution has the memoryless property: the time until the next arrival doesn't depend on how long it's been since the last one. Mathematically, P(T > t + s | T > s) = P(T > t).

Poisson arrivals work well when customers arrive randomly and independently, such as at call centers, web servers, or emergency rooms.
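Because inter-arrival times are exponential, a Poisson arrival stream is easy to simulate by summing exponential gaps. A minimal sketch (the function name and parameters are illustrative, not from any particular library):

```python
import random

def simulate_poisson_arrivals(lam, horizon, rng=None):
    """Arrival times of a rate-lam Poisson process on [0, horizon],
    built by summing i.i.d. exponential inter-arrival times."""
    rng = rng or random.Random()
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)  # memoryless gap with mean 1/lam
        if t > horizon:
            return times
        times.append(t)

# At lam = 2 arrivals per hour over 1000 hours, expect about 2000 arrivals.
arrivals = simulate_poisson_arrivals(2.0, 1000.0, random.Random(42))
print(len(arrivals))
```

The count over a long horizon concentrates around λ times the horizon, which is a quick sanity check that the simulation matches the model.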

Batch arrivals

Sometimes customers arrive in groups rather than one at a time. Think of families arriving at a restaurant or a batch of jobs submitted to a computing cluster.

  • The batch sizes can be fixed or random (following a geometric, Poisson, or other distribution)
  • Batch arrival processes are typically modeled as compound Poisson processes: batches arrive according to a Poisson process, and each batch has a random size
  • Analysis requires tracking both the batch arrival rate and the batch size distribution, since both affect congestion
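A compound Poisson arrival stream can be sketched in a few lines; the numbers and the batch-size distribution (uniform on {1, 2, 3}) are illustrative choices, not from the text:

```python
import random

def compound_poisson_total(batch_rate, horizon, batch_size, rng):
    """Total customers arriving in [0, horizon] when batches arrive as a
    Poisson process of rate batch_rate and batch_size(rng) draws one
    batch's (random) size."""
    total, t = 0, 0.0
    while True:
        t += rng.expovariate(batch_rate)  # exponential gap between batches
        if t > horizon:
            return total
        total += batch_size(rng)

# Illustrative numbers: 1.5 batches/min, batch size uniform on {1, 2, 3}
# (mean 2), so customers arrive at 1.5 * 2 = 3 per minute in the long run.
total = compound_poisson_total(1.5, 1000.0, lambda r: r.randint(1, 3),
                               random.Random(0))
print(total / 1000.0)  # close to 3 customers per minute
```

Note how the long-run customer arrival rate is the batch rate times the mean batch size, which is why both distributions matter for congestion.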

Time-dependent arrival rates

In many real systems, the arrival rate changes over time. Rush-hour traffic, peak call center hours, and seasonal demand all exhibit this behavior.

  • Modeled using a non-homogeneous Poisson process, where the arrival rate λ(t) is a function of time
  • The expected number of arrivals in an interval [t_1, t_2] is ∫_{t_1}^{t_2} λ(t) dt
  • These systems are harder to analyze because steady-state results may not apply directly; time-varying simulations or piecewise-stationary approximations are common approaches
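One standard way to simulate a non-homogeneous Poisson process is Lewis-Shedler thinning. A minimal sketch with a hypothetical sinusoidal "rush hour" rate (the rate function is our own example):

```python
import math
import random

def nhpp_arrivals(rate_fn, rate_max, horizon, rng):
    """Simulate a non-homogeneous Poisson process on [0, horizon] by
    Lewis-Shedler thinning: generate candidates at constant rate
    rate_max (which must dominate rate_fn everywhere), then keep a
    candidate at time t with probability rate_fn(t) / rate_max."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)
        if t > horizon:
            return times
        if rng.random() < rate_fn(t) / rate_max:
            times.append(t)

# Hypothetical daily pattern: the rate oscillates between 0 and 10 with a
# 24-hour period, so the expected count per day is the integral of the
# rate over one period, which works out to 120.
rate = lambda t: 5.0 + 5.0 * math.sin(math.pi * t / 12.0)
n = len(nhpp_arrivals(rate, 10.0, 24.0 * 100, random.Random(3)))
print(n / 100.0)  # close to 120 arrivals per day
```

The average daily count matching the integral of λ(t) over a day is exactly the expected-number-of-arrivals formula above in action.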

Service time distributions

Service time distributions describe how long it takes to serve each customer. The choice of distribution shapes the entire queueing model and determines which analytical tools you can use.

Exponential service times

Exponential service times (parameter μ, so the mean service time is 1/μ) are the most common assumption in basic models. The key reason: the memoryless property makes the math tractable.

  • The remaining service time doesn't depend on how long service has already been going on
  • This fits situations where service durations are highly variable and unpredictable
  • Models with exponential service times (M/M/1, M/M/c) yield clean, closed-form solutions

General service time distributions

When exponential service times aren't realistic, you can use a general distribution, denoted G in Kendall's notation.

  • The service time can follow any distribution: Erlang, hyperexponential, lognormal, Weibull, etc.
  • General distributions can capture features like low variability (Erlang), high variability (hyperexponential), or heavy tails (lognormal)
  • The M/G/1 model handles Poisson arrivals with general service times, analyzed via the Pollaczek-Khinchine formula
  • Beyond M/G/1, approximations and numerical methods are often necessary

Deterministic service times

Deterministic service times mean every customer takes exactly the same amount of time: D = 1/μ.

  • Applicable to automated systems, assembly lines, or any process with negligible variation
  • The M/D/1 queue has lower average waiting times than M/M/1 at the same traffic intensity, because removing service time variability always helps
  • This is a special case of M/G/1 where the variance of service time is zero

Notation and terminology

Kendall's notation

Kendall's notation is a compact way to specify a queueing model. The full form is A/S/c/K/N/D:

| Symbol | Meaning | Common values |
|---|---|---|
| A | Arrival process | M (Poisson), G (general), D (deterministic) |
| S | Service time distribution | M (exponential), G (general), D (deterministic) |
| c | Number of servers | 1, 2, ..., c |
| K | System capacity | Finite integer, or omitted if infinite |
| N | Calling population size | Omitted if infinite |
| D | Queue discipline | FCFS, LCFS, priority; omitted if FCFS |

When the last three parameters are omitted, the defaults are K = ∞, N = ∞, and FCFS discipline. So "M/M/1" means Poisson arrivals, exponential service, one server, infinite capacity, infinite population, FCFS.

Traffic intensity and stability

Traffic intensity ρ measures how heavily loaded the system is:

ρ = λ / (cμ)

where λ is the arrival rate, μ is the per-server service rate, and c is the number of servers.

  • If ρ < 1, the system is stable: the queue won't grow without bound
  • If ρ ≥ 1, the system is unstable: arrivals come in faster than they can be served, and the queue grows indefinitely

For finite-capacity queues (like M/M/1/K), the system doesn't blow up even when ρ ≥ 1, because excess arrivals are simply blocked. But for infinite-capacity queues, ρ < 1 is a hard requirement for steady-state analysis.

Little's law

Little's law is one of the most powerful results in queueing theory:

L = λW

  • L = average number of customers in the system
  • λ = average arrival rate (for finite-capacity systems, use the effective arrival rate)
  • W = average time a customer spends in the system

This holds for any stable queueing system, regardless of the arrival process, service distribution, number of servers, or queue discipline. It also applies to subsystems: L_q = λW_q relates the average queue length to the average waiting time in the queue.

If you know any two of the three quantities, you can find the third. This makes Little's law extremely useful for quick calculations and sanity checks.
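A quick worked example (the numbers are made up for illustration):

```python
# Hypothetical numbers: customers arrive at lam = 0.5 per minute and
# spend W = 6 minutes in the system on average. Little's law gives the
# third quantity, the average number present, with no assumptions about
# the arrival pattern, service distribution, or queue discipline.
lam = 0.5     # average arrival rate (customers per minute)
W = 6.0       # average time in system (minutes)
L = lam * W   # average number in system
print(L)      # 3.0
```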

Birth-death processes

Birth-death processes are continuous-time Markov chains where the state represents the number of customers in the system. Transitions only happen between adjacent states: the state increases by 1 (a "birth," i.e., an arrival) or decreases by 1 (a "death," i.e., a service completion).

Balance equations

To find the steady-state distribution, you set up balance equations that equate the rate of flow into each state with the rate of flow out.

For a birth-death process with states {0, 1, 2, ...}, birth rates λ_n, and death rates μ_n:

  1. State 0: λ_0 P_0 = μ_1 P_1
  2. State n (for n ≥ 1): (λ_n + μ_n) P_n = λ_{n-1} P_{n-1} + μ_{n+1} P_{n+1}

A cleaner approach uses the detailed balance (level-crossing) equations. Equating flow across the boundary between states n and n+1:

λ_n P_n = μ_{n+1} P_{n+1},  n = 0, 1, 2, ...

This gives a recursive formula: P_{n+1} = (λ_n / μ_{n+1}) P_n, which you solve iteratively starting from P_0.


Steady-state probabilities

Iterating the recursion yields:

P_n = P_0 ∏_{k=0}^{n-1} (λ_k / μ_{k+1})

You then determine P_0 from the normalization condition:

Σ_{n=0}^{∞} P_n = 1

These probabilities tell you the long-run fraction of time the system has exactly n customers. They exist only if the sum converges, which ties back to the stability condition.

Performance measures

Once you have the steady-state probabilities, you can compute all the standard performance measures:

  • L (avg. customers in system): L = Σ_{n=0}^{∞} n P_n
  • L_q (avg. customers in queue): L_q = Σ_{n=c}^{∞} (n - c) P_n for a c-server system
  • W (avg. time in system): from Little's law, W = L / λ
  • W_q (avg. waiting time in queue): W_q = L_q / λ
  • P_0 (probability the system is empty)
  • P_K (blocking probability, for finite-capacity systems)

These measures drive practical decisions about staffing, buffer sizing, and service level agreements.
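The recursion-plus-normalization recipe translates directly into code. A minimal sketch that truncates the infinite state space at a finite N (an approximation, accurate when tail probabilities are negligible), sanity-checked with constant rates:

```python
def birth_death_steady_state(birth, death, N):
    """Steady-state probabilities of a birth-death chain truncated at
    state N. birth(n) is the rate n -> n+1; death(n) is the rate
    n -> n-1. Applies the level-crossing recursion
    P_{n+1} = (birth(n) / death(n+1)) * P_n, then normalizes."""
    p = [1.0]
    for n in range(N):
        p.append(p[-1] * birth(n) / death(n + 1))
    total = sum(p)
    return [x / total for x in p]

# Constant rates lam = 1, mu = 2 give the M/M/1 chain with rho = 0.5,
# so the average number in system should be rho / (1 - rho) = 1.
probs = birth_death_steady_state(lambda n: 1.0, lambda n: 2.0, 200)
L = sum(n * pn for n, pn in enumerate(probs))
print(round(L, 6))  # 1.0
```

The same function handles state-dependent rates (e.g., multiple servers or finite capacity) just by changing the `birth` and `death` callables.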

Single-server models

Single-server models are the simplest queueing systems and the foundation for everything more complex. One server processes customers one at a time, with a queue forming when the server is busy.

M/M/1 queue

The M/M/1 queue has Poisson arrivals (rate λ), exponential service (rate μ), and one server. It's the most studied queueing model.

Stability condition: ρ = λ/μ < 1

Steady-state probabilities: P_n = (1 - ρ) ρ^n,  n = 0, 1, 2, ...

This is a geometric distribution. The probability of finding n customers in the system decays exponentially with n.

Performance measures:

| Measure | Formula |
|---|---|
| Avg. customers in system (L) | ρ / (1 - ρ) |
| Avg. customers in queue (L_q) | ρ^2 / (1 - ρ) |
| Avg. time in system (W) | 1 / (μ - λ) |
| Avg. waiting time in queue (W_q) | ρ / (μ - λ) |

Notice how all these measures blow up as ρ → 1. Even at ρ = 0.9, the average queue length is 0.81/0.1 = 8.1 customers. At ρ = 0.5, it's only 0.25/0.5 = 0.5. This nonlinear sensitivity to utilization is one of the most important takeaways from queueing theory.
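These closed forms are easy to tabulate. A small sketch (the helper name and return structure are our own) that reproduces the blow-up near ρ = 1:

```python
def mm1_metrics(lam, mu):
    """Closed-form M/M/1 measures; valid only when rho = lam/mu < 1."""
    rho = lam / mu
    assert rho < 1, "unstable: requires rho < 1"
    return {
        "rho": rho,
        "L": rho / (1 - rho),
        "Lq": rho**2 / (1 - rho),
        "W": 1 / (mu - lam),
        "Wq": rho / (mu - lam),
    }

# The queue explodes nonlinearly as utilization approaches 1:
# rho = 0.5 -> Lq = 0.5, rho = 0.9 -> Lq = 8.1, rho = 0.99 -> Lq ~ 98.
for lam in (5.0, 9.0, 9.9):
    m = mm1_metrics(lam, 10.0)
    print(m["rho"], round(m["Lq"], 2))
```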

M/G/1 queue

The M/G/1 queue generalizes M/M/1 by allowing any service time distribution. Arrivals are still Poisson (rate λ), but the service time S can have any distribution with mean E[S] = 1/μ and second moment E[S^2].

The key result is the Pollaczek-Khinchine (P-K) mean value formula for the average number in the system:

L = ρ + λ^2 E[S^2] / (2(1 - ρ))

The first term (ρ) is the average number in service. The second term is the average number waiting in the queue, which depends on E[S^2]. Higher variability in service times (larger E[S^2]) means longer queues, even if the mean service time stays the same.

You can also express the queue length using the squared coefficient of variation C_s^2 = Var(S) / (E[S])^2:

L_q = ρ^2 (1 + C_s^2) / (2(1 - ρ))

Other performance measures follow from Little's law.

M/D/1 queue

The M/D/1 queue is a special case of M/G/1 where every service time equals exactly 1/μ. Since the variance is zero (C_s^2 = 0), the P-K formula simplifies:

L = ρ + ρ^2 / (2(1 - ρ))

L_q = ρ^2 / (2(1 - ρ))

Compare this to M/M/1, where L_q = ρ^2 / (1 - ρ). The M/D/1 queue length is exactly half that of M/M/1 at the same ρ. This illustrates a general principle: reducing service time variability always improves performance.
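The C_s^2 form of the P-K formula makes the M/M/1 vs. M/D/1 comparison a one-liner. A minimal sketch (the function name is our own):

```python
def mg1_Lq(rho, cs2):
    """Pollaczek-Khinchine mean queue length in terms of the squared
    coefficient of variation of service time:
    Lq = rho^2 * (1 + Cs^2) / (2 * (1 - rho))."""
    return rho**2 * (1 + cs2) / (2 * (1 - rho))

rho = 0.9
print(round(mg1_Lq(rho, 1.0), 2))  # exponential service (M/M/1): 8.1
print(round(mg1_Lq(rho, 0.0), 2))  # deterministic service (M/D/1): 4.05
```

Exponential service has C_s^2 = 1, so plugging that in recovers the M/M/1 formula; C_s^2 = 0 recovers M/D/1 at exactly half the queue length.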

Multi-server models

Multi-server models have c ≥ 2 servers working in parallel. A single shared queue feeds all servers, and an arriving customer goes to any idle server (or waits if all are busy).

M/M/c queue

The M/M/c queue has Poisson arrivals (rate λ), exponential service (rate μ per server), and c identical servers.

Stability condition: ρ = λ / (cμ) < 1

The steady-state probabilities are:

P_n = ((cρ)^n / n!) P_0             for 0 ≤ n ≤ c
P_n = ((cρ)^n / (c! c^{n-c})) P_0   for n > c

where P_0 is determined by normalization.

Performance measures:

  • Avg. customers in queue: L_q = P_0 (cρ)^c ρ / (c! (1 - ρ)^2)
  • Avg. customers in system: L = L_q + cρ
  • Avg. time in queue: W_q = L_q / λ
  • Avg. time in system: W = W_q + 1/μ

Note that L = L_q + cρ because cρ = λ/μ is the average number of busy servers. When c = 1, all formulas reduce to the M/M/1 case.
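Putting the M/M/c formulas together (the helper name and return structure are our own; for large c you'd want a factorial-free formulation, but this sketch keeps the textbook form):

```python
import math

def mmc_metrics(lam, mu, c):
    """M/M/c measures: P0 by normalization, the closed-form Lq, then
    Little's law for the waiting times."""
    a = lam / mu                     # offered load = avg. busy servers
    rho = a / c
    assert rho < 1, "unstable: requires lam < c * mu"
    p0 = 1.0 / (sum(a**n / math.factorial(n) for n in range(c))
                + a**c / (math.factorial(c) * (1 - rho)))
    Lq = p0 * a**c * rho / (math.factorial(c) * (1 - rho)**2)
    Wq = Lq / lam
    return {"P0": p0, "Lq": Lq, "L": Lq + a, "Wq": Wq, "W": Wq + 1 / mu}

# With c = 1 the formulas must collapse to M/M/1: for lam = 1, mu = 2,
# Lq = rho^2 / (1 - rho) = 0.5.
print(round(mmc_metrics(1.0, 2.0, 1)["Lq"], 6))  # 0.5
```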

M/M/∞ queue

The M/M/∞ queue has infinitely many servers, so every arriving customer immediately begins service with no waiting.

Steady-state probabilities follow a Poisson distribution:

P_n = e^{-λ/μ} (λ/μ)^n / n!

Performance measures:

  • L = λ/μ
  • W = 1/μ
  • L_q = 0 and W_q = 0 (no waiting ever occurs)

This model is always stable regardless of ρ. It's useful for modeling systems with ample capacity (large call centers, self-service systems) or as a building block in queueing network analysis.
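Since the number in system is Poisson(λ/μ), the performance measures are easy to sanity-check numerically. A small sketch (the helper name is our own):

```python
import math

def mminf_pn(lam, mu, n):
    """M/M/inf steady state: the number in system is Poisson(lam/mu),
    so P_n is just the Poisson pmf and no customer ever waits."""
    a = lam / mu
    return math.exp(-a) * a**n / math.factorial(n)

# Summing n * P_n recovers L = lam/mu (here 4/2 = 2); the tail beyond
# n = 100 is negligible at this load.
L = sum(n * mminf_pn(4.0, 2.0, n) for n in range(100))
print(round(L, 6))  # 2.0
```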

Erlang loss system (M/M/c/c)

The Erlang loss system has c servers and no waiting room. If all servers are busy, an arriving customer is turned away ("blocked" or "lost").

The key performance measure is the blocking probability P_c, given by the Erlang B formula:

P_c = [(λ/μ)^c / c!] / [Σ_{n=0}^{c} (λ/μ)^n / n!]

This formula has a remarkable property called insensitivity: the blocking probability depends on the service time distribution only through its mean 1/μ, not its shape. So the Erlang B formula works for general service times too, not just exponential.

Erlang loss systems are the classic model for telephone trunk lines, hospital beds, and any system where blocked customers simply go elsewhere.
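Evaluating the Erlang B formula directly can overflow for large c because of the factorials; the standard recursion B(0) = 1, B(k) = a·B(k-1) / (k + a·B(k-1)), with offered load a = λ/μ, avoids this. A minimal sketch:

```python
def erlang_b(c, a):
    """Erlang B blocking probability for c servers and offered load
    a = lam/mu, via the numerically stable recursion
    B(0) = 1,  B(k) = a*B(k-1) / (k + a*B(k-1))."""
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    return b

# One erlang offered to two trunks: one call in five is blocked.
print(erlang_b(2, 1.0))  # 0.2
```

Iterating over c this way also shows the classic dimensioning question directly: how many trunks are needed to push blocking below a target level.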

Finite capacity queues

Finite capacity queues cap the total number of customers allowed in the system at K (including those in service). When the system is full, new arrivals are blocked and lost.

M/M/1/K queue

The M/M/1/K queue is like M/M/1 but with a maximum of K customers.

Steady-state probabilities:

P_n = (1 - ρ) ρ^n / (1 - ρ^{K+1}),  n = 0, 1, ..., K

When ρ = 1, this simplifies to P_n = 1/(K+1) (uniform distribution).

Unlike the standard M/M/1 queue, the M/M/1/K queue has a valid steady state even when ρ ≥ 1, because the finite buffer prevents unbounded growth.

Blocking probability and throughput

Blocking probability is the probability that an arriving customer finds the system full:

P_K = (1 - ρ) ρ^K / (1 - ρ^{K+1})

Effective arrival rate (throughput) accounts for blocked customers:

λ_eff = λ(1 - P_K)

Only customers who actually enter the system count toward throughput. When using Little's law for finite-capacity systems, you must use λ_eff rather than λ:

L = λ_eff W

This distinction between the offered arrival rate λ and the effective rate λ_eff is critical for getting correct performance measures in any system with blocking.
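A minimal sketch tying the M/M/1/K pieces together (the function name and example numbers are illustrative):

```python
def mm1k_metrics(lam, mu, K):
    """M/M/1/K: steady-state distribution, blocking probability, and
    throughput. Uses exact float equality for the rho = 1 special case,
    which is acceptable for a sketch like this."""
    rho = lam / mu
    if rho == 1.0:
        p = [1.0 / (K + 1)] * (K + 1)                    # uniform case
    else:
        p = [(1 - rho) * rho**n / (1 - rho**(K + 1)) for n in range(K + 1)]
    L = sum(n * pn for n, pn in enumerate(p))
    PK = p[K]                     # blocking probability
    lam_eff = lam * (1 - PK)      # effective arrival rate (throughput)
    return {"PK": PK, "lam_eff": lam_eff, "L": L, "W": L / lam_eff}

# An overloaded finite buffer (rho = 1.5, K = 5) still has a steady
# state; throughput stays below mu = 2 because excess load is blocked.
m = mm1k_metrics(3.0, 2.0, 5)
print(round(m["PK"], 3), round(m["lam_eff"], 3))
```

Note that W is computed from Little's law with λ_eff, exactly as the text prescribes for systems with blocking.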