🔀Stochastic Processes Unit 8 Review

8.1 Basic queueing models

Written by the Fiveable Content Team • Last updated August 2025

Queueing models provide the mathematical framework for analyzing systems where customers arrive, wait for service, and depart. They let you predict performance metrics and make informed decisions about capacity planning and resource allocation.

This section covers the building blocks: arrival processes, service time distributions, standard notation, birth-death process analysis, and the core single-server and multi-server models.

Arrival processes in queueing models

Arrival processes describe the pattern and rate at which customers (or jobs, packets, calls, etc.) enter a queueing system. Choosing the right arrival model is the first step in any queueing analysis, because everything downstream depends on it.

Poisson process for arrivals

The Poisson process is the default model for arrivals in most basic queueing systems. It assumes:

  • Arrivals occur independently of one another
  • Arrivals happen at a constant average rate λ (arrivals per unit time)
  • Inter-arrival times follow an exponential distribution with parameter λ

The exponential distribution has the memoryless property: the time until the next arrival doesn't depend on how long it's been since the last one. Mathematically, P(T > t + s | T > s) = P(T > t).

Poisson arrivals work well when customers arrive randomly and independently, such as at call centers, web servers, or emergency rooms.
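Because inter-arrival times are exponential, a Poisson arrival stream is easy to simulate by summing exponential gaps. A minimal sketch (the function name and parameters are illustrative, not from any particular library):

```python
import random

def simulate_poisson_arrivals(lam, horizon, rng=None):
    """Arrival times of a rate-lam Poisson process on [0, horizon],
    built by summing i.i.d. exponential inter-arrival times."""
    rng = rng or random.Random()
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)  # memoryless gap with mean 1/lam
        if t > horizon:
            return times
        times.append(t)

# At lam = 2 arrivals per hour over 1000 hours, expect about 2000 arrivals.
arrivals = simulate_poisson_arrivals(2.0, 1000.0, random.Random(42))
print(len(arrivals))
```

The count over a long horizon concentrates around λ times the horizon, which is a quick sanity check that the simulation matches the model.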

Batch arrivals

Sometimes customers arrive in groups rather than one at a time. Think of families arriving at a restaurant or a batch of jobs submitted to a computing cluster.

  • The batch sizes can be fixed or random (following a geometric, Poisson, or other distribution)
  • Batch arrival processes are typically modeled as compound Poisson processes: batches arrive according to a Poisson process, and each batch has a random size
  • Analysis requires tracking both the batch arrival rate and the batch size distribution, since both affect congestion
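A compound Poisson arrival stream can be sketched in a few lines; the numbers and the batch-size distribution (uniform on {1, 2, 3}) are illustrative choices, not from the text:

```python
import random

def compound_poisson_total(batch_rate, horizon, batch_size, rng):
    """Total customers arriving in [0, horizon] when batches arrive as a
    Poisson process of rate batch_rate and batch_size(rng) draws one
    batch's (random) size."""
    total, t = 0, 0.0
    while True:
        t += rng.expovariate(batch_rate)  # exponential gap between batches
        if t > horizon:
            return total
        total += batch_size(rng)

# Illustrative numbers: 1.5 batches/min, batch size uniform on {1, 2, 3}
# (mean 2), so customers arrive at 1.5 * 2 = 3 per minute in the long run.
total = compound_poisson_total(1.5, 1000.0, lambda r: r.randint(1, 3),
                               random.Random(0))
print(total / 1000.0)  # close to 3 customers per minute
```

Note how the long-run customer arrival rate is the batch rate times the mean batch size, which is why both distributions matter for congestion.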

Time-dependent arrival rates

In many real systems, the arrival rate changes over time. Rush-hour traffic, peak call center hours, and seasonal demand all exhibit this behavior.

  • Modeled using a non-homogeneous Poisson process, where the arrival rate λ(t) is a function of time
  • The expected number of arrivals in an interval [t_1, t_2] is ∫_{t_1}^{t_2} λ(t) dt
  • These systems are harder to analyze because steady-state results may not apply directly; time-varying simulations or piecewise-stationary approximations are common approaches
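One standard way to simulate a non-homogeneous Poisson process is Lewis-Shedler thinning. A minimal sketch with a hypothetical sinusoidal "rush hour" rate (the rate function is our own example):

```python
import math
import random

def nhpp_arrivals(rate_fn, rate_max, horizon, rng):
    """Simulate a non-homogeneous Poisson process on [0, horizon] by
    Lewis-Shedler thinning: generate candidates at constant rate
    rate_max (which must dominate rate_fn everywhere), then keep a
    candidate at time t with probability rate_fn(t) / rate_max."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)
        if t > horizon:
            return times
        if rng.random() < rate_fn(t) / rate_max:
            times.append(t)

# Hypothetical daily pattern: the rate oscillates between 0 and 10 with a
# 24-hour period, so the expected count per day is the integral of the
# rate over one period, which works out to 120.
rate = lambda t: 5.0 + 5.0 * math.sin(math.pi * t / 12.0)
n = len(nhpp_arrivals(rate, 10.0, 24.0 * 100, random.Random(3)))
print(n / 100.0)  # close to 120 arrivals per day
```

The average daily count matching the integral of λ(t) over a day is exactly the expected-number-of-arrivals formula above in action.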

Service time distributions

Service time distributions describe how long it takes to serve each customer. The choice of distribution shapes the entire queueing model and determines which analytical tools you can use.

Exponential service times

Exponential service times (parameter μ, so the mean service time is 1/μ) are the most common assumption in basic models. The key reason: the memoryless property makes the math tractable.

  • The remaining service time doesn't depend on how long service has already been going on
  • This fits situations where service durations are highly variable and unpredictable
  • Models with exponential service times (M/M/1, M/M/c) yield clean, closed-form solutions

General service time distributions

When exponential service times aren't realistic, you can use a general distribution, denoted G in Kendall's notation.

  • The service time can follow any distribution: Erlang, hyperexponential, lognormal, Weibull, etc.
  • General distributions can capture features like low variability (Erlang), high variability (hyperexponential), or heavy tails (lognormal)
  • The M/G/1 model handles Poisson arrivals with general service times, analyzed via the Pollaczek-Khinchine formula
  • Beyond M/G/1, approximations and numerical methods are often necessary

Deterministic service times

Deterministic service times mean every customer takes exactly the same amount of time: D = 1/μ.

  • Applicable to automated systems, assembly lines, or any process with negligible variation
  • The M/D/1 queue has lower average waiting times than M/M/1 at the same traffic intensity, because removing service time variability always helps
  • This is a special case of M/G/1 where the variance of service time is zero

Notation and terminology

Kendall's notation

Kendall's notation is a compact way to specify a queueing model. The full form is A/S/c/K/N/D:

| Symbol | Meaning | Common values |
|---|---|---|
| A | Arrival process | M (Poisson), G (general), D (deterministic) |
| S | Service time distribution | M (exponential), G (general), D (deterministic) |
| c | Number of servers | 1, 2, ..., c |
| K | System capacity | Finite integer, or omitted if infinite |
| N | Calling population size | Omitted if infinite |
| D | Queue discipline | FCFS, LCFS, priority; omitted if FCFS |

When the last three parameters are omitted, the defaults are K = ∞, N = ∞, and FCFS discipline. So "M/M/1" means Poisson arrivals, exponential service, one server, infinite capacity, infinite population, FCFS.

Traffic intensity and stability

Traffic intensity ρ measures how heavily loaded the system is:

ρ = λ / (cμ)

where λ is the arrival rate, μ is the per-server service rate, and c is the number of servers.

  • If ρ < 1, the system is stable: the queue won't grow without bound
  • If ρ ≥ 1, the system is unstable: arrivals come in faster than they can be served, and the queue grows indefinitely

For finite-capacity queues (like M/M/1/K), the system doesn't blow up even when ρ ≥ 1, because excess arrivals are simply blocked. But for infinite-capacity queues, ρ < 1 is a hard requirement for steady-state analysis.

Little's law

Little's law is one of the most powerful results in queueing theory:

L = λW

  • L = average number of customers in the system
  • λ = average arrival rate (for finite-capacity systems, use the effective arrival rate)
  • W = average time a customer spends in the system

This holds for any stable queueing system, regardless of the arrival process, service distribution, number of servers, or queue discipline. It also applies to subsystems: L_q = λW_q relates the average queue length to the average waiting time in the queue.

If you know any two of the three quantities, you can find the third. This makes Little's law extremely useful for quick calculations and sanity checks.
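A quick worked example (the numbers are made up for illustration):

```python
# Hypothetical numbers: customers arrive at lam = 0.5 per minute and
# spend W = 6 minutes in the system on average. Little's law gives the
# third quantity, the average number present, with no assumptions about
# the arrival pattern, service distribution, or queue discipline.
lam = 0.5     # average arrival rate (customers per minute)
W = 6.0       # average time in system (minutes)
L = lam * W   # average number in system
print(L)      # 3.0
```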

Birth-death processes

Birth-death processes are continuous-time Markov chains where the state represents the number of customers in the system. Transitions only happen between adjacent states: the state increases by 1 (a "birth," i.e., an arrival) or decreases by 1 (a "death," i.e., a service completion).

Balance equations

To find the steady-state distribution, you set up balance equations that equate the rate of flow into each state with the rate of flow out.

For a birth-death process with states {0, 1, 2, ...}, birth rates λ_n, and death rates μ_n:

  1. State 0: λ_0 P_0 = μ_1 P_1
  2. State n (for n ≥ 1): (λ_n + μ_n) P_n = λ_{n-1} P_{n-1} + μ_{n+1} P_{n+1}

A cleaner approach uses the detailed balance (level-crossing) equations. Equating flow across the boundary between states n and n+1:

λ_n P_n = μ_{n+1} P_{n+1},  n = 0, 1, 2, ...

This gives a recursive formula: P_{n+1} = (λ_n / μ_{n+1}) P_n, which you solve iteratively starting from P_0.


Steady-state probabilities

Iterating the recursion yields:

P_n = P_0 ∏_{k=0}^{n-1} (λ_k / μ_{k+1})

You then determine P_0 from the normalization condition:

Σ_{n=0}^{∞} P_n = 1

These probabilities tell you the long-run fraction of time the system has exactly n customers. They exist only if the sum converges, which ties back to the stability condition.

Performance measures

Once you have the steady-state probabilities, you can compute all the standard performance measures:

  • L (avg. customers in system): L = Σ_{n=0}^{∞} n P_n
  • L_q (avg. customers in queue): L_q = Σ_{n=c}^{∞} (n - c) P_n for a c-server system
  • W (avg. time in system): from Little's law, W = L / λ
  • W_q (avg. waiting time in queue): W_q = L_q / λ
  • P_0 (probability the system is empty)
  • P_K (blocking probability, for finite-capacity systems)

These measures drive practical decisions about staffing, buffer sizing, and service level agreements.
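The recursion-plus-normalization recipe translates directly into code. A minimal sketch that truncates the infinite state space at a finite N (an approximation, accurate when tail probabilities are negligible), sanity-checked with constant rates:

```python
def birth_death_steady_state(birth, death, N):
    """Steady-state probabilities of a birth-death chain truncated at
    state N. birth(n) is the rate n -> n+1; death(n) is the rate
    n -> n-1. Applies the level-crossing recursion
    P_{n+1} = (birth(n) / death(n+1)) * P_n, then normalizes."""
    p = [1.0]
    for n in range(N):
        p.append(p[-1] * birth(n) / death(n + 1))
    total = sum(p)
    return [x / total for x in p]

# Constant rates lam = 1, mu = 2 give the M/M/1 chain with rho = 0.5,
# so the average number in system should be rho / (1 - rho) = 1.
probs = birth_death_steady_state(lambda n: 1.0, lambda n: 2.0, 200)
L = sum(n * pn for n, pn in enumerate(probs))
print(round(L, 6))  # 1.0
```

The same function handles state-dependent rates (e.g., multiple servers or finite capacity) just by changing the `birth` and `death` callables.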

Single-server models

Single-server models are the simplest queueing systems and the foundation for everything more complex. One server processes customers one at a time, with a queue forming when the server is busy.

M/M/1 queue

The M/M/1 queue has Poisson arrivals (rate λ), exponential service (rate μ), and one server. It's the most studied queueing model.

Stability condition: ρ = λ/μ < 1

Steady-state probabilities: P_n = (1 - ρ) ρ^n,  n = 0, 1, 2, ...

This is a geometric distribution. The probability of finding n customers in the system decays exponentially with n.

Performance measures:

| Measure | Formula |
|---|---|
| Avg. customers in system (L) | ρ / (1 - ρ) |
| Avg. customers in queue (L_q) | ρ^2 / (1 - ρ) |
| Avg. time in system (W) | 1 / (μ - λ) |
| Avg. waiting time in queue (W_q) | ρ / (μ - λ) |

Notice how all these measures blow up as ρ → 1. Even at ρ = 0.9, the average queue length is 0.81/0.1 = 8.1 customers. At ρ = 0.5, it's only 0.25/0.5 = 0.5. This nonlinear sensitivity to utilization is one of the most important takeaways from queueing theory.
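These closed forms are easy to tabulate. A small sketch (the helper name and return structure are our own) that reproduces the blow-up near ρ = 1:

```python
def mm1_metrics(lam, mu):
    """Closed-form M/M/1 measures; valid only when rho = lam/mu < 1."""
    rho = lam / mu
    assert rho < 1, "unstable: requires rho < 1"
    return {
        "rho": rho,
        "L": rho / (1 - rho),
        "Lq": rho**2 / (1 - rho),
        "W": 1 / (mu - lam),
        "Wq": rho / (mu - lam),
    }

# The queue explodes nonlinearly as utilization approaches 1:
# rho = 0.5 -> Lq = 0.5, rho = 0.9 -> Lq = 8.1, rho = 0.99 -> Lq ~ 98.
for lam in (5.0, 9.0, 9.9):
    m = mm1_metrics(lam, 10.0)
    print(m["rho"], round(m["Lq"], 2))
```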

M/G/1 queue

The M/G/1 queue generalizes M/M/1 by allowing any service time distribution. Arrivals are still Poisson (rate λ), but the service time S can have any distribution with mean E[S] = 1/μ and second moment E[S^2].

The key result is the Pollaczek-Khinchine (P-K) mean value formula for the average number in the system:

L = ρ + λ^2 E[S^2] / (2(1 - ρ))

The first term (ρ) is the average number in service. The second term is the average number waiting in the queue, which depends on E[S^2]. Higher variability in service times (larger E[S^2]) means longer queues, even if the mean service time stays the same.

You can also express the queue length using the squared coefficient of variation C_s^2 = Var(S) / (E[S])^2:

L_q = ρ^2 (1 + C_s^2) / (2(1 - ρ))

Other performance measures follow from Little's law.

M/D/1 queue

The M/D/1 queue is a special case of M/G/1 where every service time equals exactly 1/μ. Since the variance is zero (C_s^2 = 0), the P-K formula simplifies:

L = ρ + ρ^2 / (2(1 - ρ))

L_q = ρ^2 / (2(1 - ρ))

Compare this to M/M/1, where L_q = ρ^2 / (1 - ρ). The M/D/1 queue length is exactly half that of M/M/1 at the same ρ. This illustrates a general principle: reducing service time variability always improves performance.
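The C_s^2 form of the P-K formula makes the M/M/1 vs. M/D/1 comparison a one-liner. A minimal sketch (the function name is our own):

```python
def mg1_Lq(rho, cs2):
    """Pollaczek-Khinchine mean queue length in terms of the squared
    coefficient of variation of service time:
    Lq = rho^2 * (1 + Cs^2) / (2 * (1 - rho))."""
    return rho**2 * (1 + cs2) / (2 * (1 - rho))

rho = 0.9
print(round(mg1_Lq(rho, 1.0), 2))  # exponential service (M/M/1): 8.1
print(round(mg1_Lq(rho, 0.0), 2))  # deterministic service (M/D/1): 4.05
```

Exponential service has C_s^2 = 1, so plugging that in recovers the M/M/1 formula; C_s^2 = 0 recovers M/D/1 at exactly half the queue length.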

Multi-server models

Multi-server models have c ≥ 2 servers working in parallel. A single shared queue feeds all servers, and an arriving customer goes to any idle server (or waits if all are busy).

M/M/c queue

The M/M/c queue has Poisson arrivals (rate λ), exponential service (rate μ per server), and c identical servers.

Stability condition: ρ = λ / (cμ) < 1

The steady-state probabilities are:

P_n = ((cρ)^n / n!) P_0             for 0 ≤ n ≤ c
P_n = ((cρ)^n / (c! c^{n-c})) P_0   for n > c

where P_0 is determined by normalization.

Performance measures:

  • Avg. customers in queue: L_q = P_0 (cρ)^c ρ / (c! (1 - ρ)^2)
  • Avg. customers in system: L = L_q + cρ
  • Avg. time in queue: W_q = L_q / λ
  • Avg. time in system: W = W_q + 1/μ

Note that L = L_q + cρ because cρ = λ/μ is the average number of busy servers. When c = 1, all formulas reduce to the M/M/1 case.
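Putting the M/M/c formulas together (the helper name and return structure are our own; for large c you'd want a factorial-free formulation, but this sketch keeps the textbook form):

```python
import math

def mmc_metrics(lam, mu, c):
    """M/M/c measures: P0 by normalization, the closed-form Lq, then
    Little's law for the waiting times."""
    a = lam / mu                     # offered load = avg. busy servers
    rho = a / c
    assert rho < 1, "unstable: requires lam < c * mu"
    p0 = 1.0 / (sum(a**n / math.factorial(n) for n in range(c))
                + a**c / (math.factorial(c) * (1 - rho)))
    Lq = p0 * a**c * rho / (math.factorial(c) * (1 - rho)**2)
    Wq = Lq / lam
    return {"P0": p0, "Lq": Lq, "L": Lq + a, "Wq": Wq, "W": Wq + 1 / mu}

# With c = 1 the formulas must collapse to M/M/1: for lam = 1, mu = 2,
# Lq = rho^2 / (1 - rho) = 0.5.
print(round(mmc_metrics(1.0, 2.0, 1)["Lq"], 6))  # 0.5
```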

M/M/∞ queue

The M/M/∞ queue has infinitely many servers, so every arriving customer immediately begins service with no waiting.

Steady-state probabilities follow a Poisson distribution:

P_n = e^{-λ/μ} (λ/μ)^n / n!

Performance measures:

  • L = λ/μ
  • W = 1/μ
  • L_q = 0 and W_q = 0 (no waiting ever occurs)

This model is always stable regardless of ρ. It's useful for modeling systems with ample capacity (large call centers, self-service systems) or as a building block in queueing network analysis.
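Since the number in system is Poisson(λ/μ), the performance measures are easy to sanity-check numerically. A small sketch (the helper name is our own):

```python
import math

def mminf_pn(lam, mu, n):
    """M/M/inf steady state: the number in system is Poisson(lam/mu),
    so P_n is just the Poisson pmf and no customer ever waits."""
    a = lam / mu
    return math.exp(-a) * a**n / math.factorial(n)

# Summing n * P_n recovers L = lam/mu (here 4/2 = 2); the tail beyond
# n = 100 is negligible at this load.
L = sum(n * mminf_pn(4.0, 2.0, n) for n in range(100))
print(round(L, 6))  # 2.0
```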

Erlang loss system (M/M/c/c)

The Erlang loss system has c servers and no waiting room. If all servers are busy, an arriving customer is turned away ("blocked" or "lost").

The key performance measure is the blocking probability P_c, given by the Erlang B formula:

P_c = [(λ/μ)^c / c!] / [Σ_{n=0}^{c} (λ/μ)^n / n!]

This formula has a remarkable property called insensitivity: the blocking probability depends on the service time distribution only through its mean 1/μ, not its shape. So the Erlang B formula works for general service times too, not just exponential.

Erlang loss systems are the classic model for telephone trunk lines, hospital beds, and any system where blocked customers simply go elsewhere.
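Evaluating the Erlang B formula directly can overflow for large c because of the factorials; the standard recursion B(0) = 1, B(k) = a·B(k-1) / (k + a·B(k-1)), with offered load a = λ/μ, avoids this. A minimal sketch:

```python
def erlang_b(c, a):
    """Erlang B blocking probability for c servers and offered load
    a = lam/mu, via the numerically stable recursion
    B(0) = 1,  B(k) = a*B(k-1) / (k + a*B(k-1))."""
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    return b

# One erlang offered to two trunks: one call in five is blocked.
print(erlang_b(2, 1.0))  # 0.2
```

Iterating over c this way also shows the classic dimensioning question directly: how many trunks are needed to push blocking below a target level.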

Finite capacity queues

Finite capacity queues cap the total number of customers allowed in the system at K (including those in service). When the system is full, new arrivals are blocked and lost.

M/M/1/K queue

The M/M/1/K queue is like M/M/1 but with a maximum of K customers.

Steady-state probabilities:

P_n = (1 - ρ) ρ^n / (1 - ρ^{K+1}),  n = 0, 1, ..., K

When ρ = 1, this simplifies to P_n = 1/(K+1) (uniform distribution).

Unlike the standard M/M/1 queue, the M/M/1/K queue has a valid steady state even when ρ ≥ 1, because the finite buffer prevents unbounded growth.

Blocking probability and throughput

Blocking probability is the probability that an arriving customer finds the system full:

P_K = (1 - ρ) ρ^K / (1 - ρ^{K+1})

Effective arrival rate (throughput) accounts for blocked customers:

λ_eff = λ(1 - P_K)

Only customers who actually enter the system count toward throughput. When using Little's law for finite-capacity systems, you must use λ_eff rather than λ:

L = λ_eff W

This distinction between the offered arrival rate λ and the effective rate λ_eff is critical for getting correct performance measures in any system with blocking.
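A minimal sketch tying the M/M/1/K pieces together (the function name and example numbers are illustrative):

```python
def mm1k_metrics(lam, mu, K):
    """M/M/1/K: steady-state distribution, blocking probability, and
    throughput. Uses exact float equality for the rho = 1 special case,
    which is acceptable for a sketch like this."""
    rho = lam / mu
    if rho == 1.0:
        p = [1.0 / (K + 1)] * (K + 1)                    # uniform case
    else:
        p = [(1 - rho) * rho**n / (1 - rho**(K + 1)) for n in range(K + 1)]
    L = sum(n * pn for n, pn in enumerate(p))
    PK = p[K]                     # blocking probability
    lam_eff = lam * (1 - PK)      # effective arrival rate (throughput)
    return {"PK": PK, "lam_eff": lam_eff, "L": L, "W": L / lam_eff}

# An overloaded finite buffer (rho = 1.5, K = 5) still has a steady
# state; throughput stays below mu = 2 because excess load is blocked.
m = mm1k_metrics(3.0, 2.0, 5)
print(round(m["PK"], 3), round(m["lam_eff"], 3))
```

Note that W is computed from Little's law with λ_eff, exactly as the text prescribes for systems with blocking.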