🔀Stochastic Processes Unit 8 Review

8.3 M/M/1 and M/M/c queues

Written by the Fiveable Content Team • Last updated August 2025

Basics of M/M/1 queues

An M/M/1 queue models a system with a single server where customers arrive randomly and service takes a random amount of time. The name itself encodes the assumptions: the first "M" means Markovian (memoryless) arrivals, the second "M" means Markovian service times, and the "1" means one server.

Arrival process

Customers arrive according to a Poisson process with rate λ (customers per unit time). This means:

  • Inter-arrival times are independent and exponentially distributed with mean 1/λ.
  • Arrivals are independent of each other and of the current state of the system.
  • In any small interval dt, the probability of exactly one arrival is approximately λ dt.
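
These properties are easy to see in simulation: summing independent exponential gaps with mean 1/λ produces a Poisson arrival stream. A minimal sketch (the function name and parameters are illustrative):

```python
import random

def simulate_arrivals(lam, horizon, seed=0):
    """Generate Poisson arrival times on [0, horizon] by summing
    independent exponential inter-arrival gaps with mean 1/lam."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam)   # exponential gap, rate lam
        if t > horizon:
            return times
        times.append(t)

arrivals = simulate_arrivals(lam=2.0, horizon=1000.0)
print(len(arrivals) / 1000.0)       # empirical rate, close to lam = 2.0
```

Over a long horizon, the empirical arrival rate concentrates around λ, as the Poisson assumption predicts.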

Service time distribution

Each customer's service time is exponentially distributed with rate μ, so the mean service time is 1/μ. The memoryless property of the exponential distribution is what makes the analysis tractable: no matter how long a service has been in progress, the remaining time has the same distribution. This is precisely why the system's future behavior depends only on the current number of customers, making it a continuous-time Markov chain.
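
The memoryless property can be checked empirically: the conditional survival probability P(T > s + t | T > s) should match the unconditional P(T > t). A small Monte Carlo sketch (the sample size, seed, and the values of s and t are arbitrary choices):

```python
import random

# Empirical check of memorylessness for Exponential(mu):
# P(T > s + t | T > s) should equal P(T > t) = exp(-mu * t).
rng = random.Random(1)
mu = 2.0
samples = [rng.expovariate(mu) for _ in range(200_000)]

s, t = 0.5, 0.4
survived = [x for x in samples if x > s]
cond = sum(x > s + t for x in survived) / len(survived)
uncond = sum(x > t for x in samples) / len(samples)
print(round(cond, 3), round(uncond, 3))  # both near exp(-0.8) ≈ 0.449
```

The two estimates agree to within sampling noise, which is exactly what "the remaining time has the same distribution" means.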

Traffic intensity

Traffic intensity is defined as:

\rho = \frac{\lambda}{\mu}

This ratio captures how fast work arrives relative to how fast the server can handle it. For the queue to reach a steady state, you need ρ < 1. If ρ ≥ 1, the server can't keep up with arrivals and the queue grows without bound.

Think of ρ as the long-run fraction of time the server is busy. At ρ = 0.9, the server is idle only 10% of the time, and congestion becomes severe.

Steady-state probabilities

Once the system reaches steady state, the probability of finding exactly n customers in the system is:

P_n = (1 - \rho)\,\rho^n, \quad n = 0, 1, 2, \ldots

This is a geometric distribution. A few things to note:

  • P_0 = 1 − ρ is the probability the system is empty (and the server is idle).
  • The probabilities decay exponentially in n, so very long queues are possible but increasingly unlikely when ρ is well below 1.
  • All performance measures can be derived from this distribution.
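
To make this concrete, the distribution is easy to tabulate and cross-check numerically (λ = 8 and μ = 10 here are just example values):

```python
lam, mu = 8.0, 10.0
rho = lam / mu  # 0.8

def p_n(n):
    """Steady-state probability of n customers in an M/M/1 queue."""
    return (1 - rho) * rho**n

print(round(p_n(0), 3))                 # 0.2 = 1 - rho (empty system)
total = sum(p_n(n) for n in range(500))
mean = sum(n * p_n(n) for n in range(500))
print(round(total, 6), round(mean, 6))  # 1.0 and rho/(1-rho) = 4.0
```

Truncating the sum at 500 terms is harmless here because the geometric tail is vanishingly small; the computed mean matches the L = ρ/(1 − ρ) formula derived below.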

Performance measures of M/M/1 queues

These formulas let you quantify how congested a system is and how long customers wait. They all follow from the steady-state distribution above.

Server utilization

U = \rho = \frac{\lambda}{\mu}

This is the fraction of time the server is busy. For example, if λ = 8 customers/hour and μ = 10 customers/hour, then ρ = 0.8, meaning the server is busy 80% of the time.

Expected number in system

L = \frac{\rho}{1 - \rho} = \frac{\lambda}{\mu - \lambda}

This counts everyone present: the customer in service plus those waiting. With ρ = 0.8, you get L = 0.8/0.2 = 4 customers on average. Notice how L grows rapidly as ρ → 1.

Expected number in queue

L_q = \frac{\rho^2}{1 - \rho} = \frac{\lambda^2}{\mu(\mu - \lambda)}

This counts only the customers waiting (not the one being served). The relationship L_q = L − ρ is worth remembering.

Expected waiting time in system

W = \frac{1}{\mu - \lambda}

This is the total time from arrival to departure (waiting plus service). With λ = 8 and μ = 10, W = 1/2 hour = 30 minutes.

Expected waiting time in queue

W_q = \frac{\lambda}{\mu(\mu - \lambda)}

This is the time spent waiting before service begins. Note that W = W_q + 1/μ, which makes sense: total time in system equals time waiting plus time being served.

Little's Law connects these measures: L = λW and L_q = λW_q. These relationships hold for any stable queue, not just M/M/1. They're extremely useful for converting between "number" measures and "time" measures.
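
All four measures, plus the Little's Law cross-checks, fit in a few lines (λ = 8, μ = 10 as in the running example):

```python
def mm1_metrics(lam, mu):
    """Closed-form M/M/1 performance measures (requires lam < mu)."""
    assert lam < mu, "unstable: need rho < 1"
    rho = lam / mu
    L = rho / (1 - rho)           # expected number in system
    Lq = rho**2 / (1 - rho)       # expected number in queue
    W = 1 / (mu - lam)            # expected time in system
    Wq = lam / (mu * (mu - lam))  # expected time in queue
    return L, Lq, W, Wq

L, Lq, W, Wq = mm1_metrics(8, 10)
print(round(L, 4), round(Lq, 4), round(W, 4), round(Wq, 4))  # 4.0 3.2 0.5 0.4

# Little's Law cross-checks: L = lam*W and Lq = lam*Wq
assert abs(L - 8 * W) < 1e-9 and abs(Lq - 8 * Wq) < 1e-9
```

The assertions at the end verify Little's Law numerically for this parameter choice.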

Variations of M/M/1 queues

The basic M/M/1 model assumes infinite capacity and an infinite customer population. Several variations relax these assumptions.


M/M/1/K queues with limited capacity

The system can hold at most K customers (including the one in service). When the system is full, new arrivals are turned away and lost. Because arrivals are blocked at capacity, this system can be stable even when ρ ≥ 1. The steady-state probabilities become:

P_n = \frac{(1-\rho)\,\rho^n}{1 - \rho^{K+1}}, \quad n = 0, 1, \ldots, K

(When ρ = 1, each state is equally likely: P_n = 1/(K+1).)
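
A short sketch of these probabilities, including the ρ = 1 special case (the example rates are illustrative; note that ρ = 1.2 > 1, yet the distribution is well defined):

```python
def mm1k_probs(lam, mu, K):
    """Steady-state distribution of an M/M/1/K queue (capacity K)."""
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        return [1.0 / (K + 1)] * (K + 1)   # rho = 1: uniform over states
    norm = 1 - rho ** (K + 1)
    return [(1 - rho) * rho**n / norm for n in range(K + 1)]

probs = mm1k_probs(lam=12, mu=10, K=5)   # stable despite rho = 1.2 > 1
print(round(sum(probs), 10))             # 1.0 (normalization)
print(round(probs[-1], 4))               # P_K: fraction of arrivals blocked
```

Since arrivals are Poisson, P_K is also the long-run fraction of customers turned away (by the PASTA property).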

M/M/1/K/K queues with limited population

Here the total customer population is finite (K customers). A customer who is already in the system can't generate another arrival. This is useful for modeling machine repair problems: you have K machines, and only machines that are running can break down. The effective arrival rate depends on how many customers are already in the system, which changes the balance equations.

M/M/1 queues with balking

Customers who arrive and see a long queue may decide not to join at all. Typically, the probability of joining decreases as the number of customers already present increases. This reduces the effective arrival rate and lowers congestion compared to the standard model.

M/M/1 queues with reneging

Customers join the queue but leave if they wait too long. Each customer's patience is usually modeled as an exponential random variable. Reneging also reduces the effective load on the system but requires modified balance equations to compute steady-state probabilities.

Basics of M/M/c queues

An M/M/c queue extends the M/M/1 model to c identical servers working in parallel, all fed by a single queue. Customers are served first-come, first-served by whichever server becomes free next.

Arrival and service time distributions

The assumptions are the same as M/M/1:

  • Arrivals follow a Poisson process with rate λ.
  • Each server has independent, exponentially distributed service times with rate μ.

The combined maximum service rate of the system is cμ.

Traffic intensity for M/M/c

\rho = \frac{\lambda}{c\mu}

This is the utilization per server. Stability again requires ρ < 1, meaning λ < cμ. It's common to also define a = λ/μ (the offered load in Erlangs), so ρ = a/c.

Steady-state probabilities for M/M/c

The probability of an empty system is:

P_0 = \left[\sum_{n=0}^{c-1}\frac{a^n}{n!} + \frac{a^c}{c!(1-\rho)}\right]^{-1}

where a = λ/μ = cρ. The state probabilities are then:

  • For 0 ≤ n < c: P_n = \frac{a^n}{n!}\,P_0
  • For n ≥ c: P_n = \frac{a^n}{c!\,c^{n-c}}\,P_0 = \frac{a^c}{c!}\,\rho^{n-c}\,P_0

The key difference from M/M/1: when n < c, some servers are idle and the system's total service rate is nμ, growing with each busy server. Once n ≥ c, all servers are busy and the service rate saturates at cμ.
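
These formulas translate directly into code; a quick normalization check confirms the probabilities sum to 1 (λ = 20, μ = 8, c = 3 are example values):

```python
from math import factorial

def mmc_p0(lam, mu, c):
    """Probability of an empty M/M/c system (requires lam < c*mu)."""
    a = lam / mu    # offered load in Erlangs
    rho = a / c     # per-server utilization
    s = sum(a**n / factorial(n) for n in range(c))
    return 1.0 / (s + a**c / (factorial(c) * (1 - rho)))

def mmc_pn(n, lam, mu, c):
    """Steady-state probability of n customers in an M/M/c queue."""
    a = lam / mu
    if n < c:
        return a**n / factorial(n) * mmc_p0(lam, mu, c)
    return a**c / factorial(c) * (a / c) ** (n - c) * mmc_p0(lam, mu, c)

# normalization check for lam = 20, mu = 8, c = 3
total = sum(mmc_pn(n, 20, 8, 3) for n in range(400))
print(round(total, 8))   # 1.0
```

Truncating at 400 states is safe because the tail beyond n = c is geometric with ratio ρ < 1.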

Performance measures of M/M/c queues

Probability of waiting (Erlang C formula)

The probability that an arriving customer finds all servers busy and must wait is:

P_w = C(c, a) = \frac{a^c}{c!(1-\rho)} \cdot P_0

This is the Erlang C formula, one of the most widely used results in queueing theory. Call centers, for instance, use it daily to staff agents.
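
A compact, self-contained implementation of the Erlang C formula, written in terms of the offered load a = λ/μ:

```python
from math import factorial

def erlang_c(c, a):
    """Erlang C: probability an arriving customer must wait.
    a = lam/mu is the offered load; requires a < c for stability."""
    rho = a / c
    busy_term = a**c / (factorial(c) * (1 - rho))
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(c)) + busy_term)
    return busy_term * p0

print(round(erlang_c(3, 2.5), 4))   # ≈ 0.7022 for a = 2.5, c = 3
```

With 3 servers and an offered load of 2.5 Erlangs, about 70% of arrivals must wait, which foreshadows the numerical example below.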

Expected number in queue

L_q = \frac{\rho\,P_w}{1 - \rho}


Expected number in system

L = L_q + a = L_q + \frac{\lambda}{\mu}

The a = λ/μ term accounts for the average number of customers currently in service.

Expected waiting time in queue

By Little's Law:

W_q = \frac{L_q}{\lambda}

Expected waiting time in system

W = W_q + \frac{1}{\mu}

Numerical example: Suppose λ = 20 customers/hour, μ = 8 customers/hour, and c = 3 servers. Then a = 20/8 = 2.5 and ρ = 2.5/3 ≈ 0.833. You'd compute P_0, then P_w, and from there all the performance measures follow. Compare this to an M/M/1 queue with μ = 24 (the same total service capacity): the M/M/c system has a smaller L_q and W_q because pooling shares a single waiting line, though its total time in system W is somewhat longer, since each individual server works more slowly.
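
Carrying the example through numerically (the helper below just chains the formulas from this section):

```python
from math import factorial

def mmc_metrics(lam, mu, c):
    """M/M/c measures (Pw, Lq, L, Wq, W); requires lam < c*mu."""
    a, rho = lam / mu, lam / (c * mu)
    busy = a**c / (factorial(c) * (1 - rho))
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(c)) + busy)
    pw = busy * p0                 # Erlang C
    lq = rho * pw / (1 - rho)      # expected number in queue
    l = lq + a                     # expected number in system
    return pw, lq, l, lq / lam, l / lam

pw, lq, l, wq, w = mmc_metrics(20, 8, 3)
print(round(pw, 4), round(lq, 4), round(l, 4))   # 0.7022 3.5112 6.0112
```

So about 70% of customers wait, with roughly 3.5 customers in the queue and 6 in the system on average.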

Designing M/M/c queueing systems

Determining the number of servers needed

The typical approach:

  1. Estimate λ and μ from data.
  2. Pick a performance target (e.g., "average wait in queue under 2 minutes" or "at most 20% of customers wait").
  3. Compute L_q, W_q, or P_w for c = 1, 2, 3, … until the target is met.
  4. The smallest c satisfying the target is your answer.

Since the formulas involve factorials and sums, this is typically done computationally or with Erlang C tables.
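
The search in steps 3–4 is a simple loop over c. A self-contained sketch (the 2-minute target and the rates are illustrative):

```python
from math import factorial

def mmc_wq(lam, mu, c):
    """Mean wait in queue for an M/M/c system (requires lam < c*mu)."""
    a, rho = lam / mu, lam / (c * mu)
    busy = a**c / (factorial(c) * (1 - rho))
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(c)) + busy)
    return rho * busy * p0 / ((1 - rho) * lam)

def min_servers(lam, mu, wq_target):
    """Smallest c that is both stable and meets the Wq target."""
    c = 1
    while lam >= c * mu or mmc_wq(lam, mu, c) > wq_target:
        c += 1
    return c

# lam = 20/hr, mu = 8/hr, target: average queue wait under 2 minutes
print(min_servers(20, 8, wq_target=2 / 60))   # 4
```

Three servers are enough for stability (ρ ≈ 0.833) but give a queue wait of about 10.5 minutes; the fourth server brings it under 2 minutes.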

Cost analysis and optimization

A common formulation balances two costs:

  • Server cost: c·C_s per unit time (wages, equipment).
  • Waiting cost: L·C_w per unit time (customer dissatisfaction, lost productivity).

The total cost TC = c·C_s + L·C_w is evaluated for different values of c, and you pick the value that minimizes it. Because L decreases with c while server cost increases, there's usually a clear optimum.
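
This trade-off is a one-line minimization once L(c) is computable. A sketch with hypothetical cost figures ($15/hr per server, $25/hr per customer in the system):

```python
from math import factorial

def mmc_L(lam, mu, c):
    """Expected number in an M/M/c system (requires lam < c*mu)."""
    a, rho = lam / mu, lam / (c * mu)
    busy = a**c / (factorial(c) * (1 - rho))
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(c)) + busy)
    return rho * busy * p0 / (1 - rho) + a

def best_c(lam, mu, Cs, Cw, c_max=20):
    """c minimizing total cost c*Cs + L*Cw over stable configurations."""
    stable = [c for c in range(1, c_max + 1) if lam < c * mu]
    return min(stable, key=lambda c: c * Cs + mmc_L(lam, mu, c) * Cw)

print(best_c(lam=20, mu=8, Cs=15, Cw=25))   # 4
```

With these (made-up) costs, the fourth server pays for itself by cutting congestion, but a fifth does not.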

Quality of service requirements

In practice, systems are designed around service level agreements (SLAs). Common targets include:

  • A specified percentage of customers served within a target time (e.g., 80% of calls answered within 20 seconds).
  • Maximum acceptable probability of waiting.
  • Maximum acceptable average wait.

The M/M/c model provides the formulas to check whether a given configuration meets these targets.

Capacity planning considerations

Arrival rates fluctuate over time (think of a call center with morning peaks). Capacity planning involves:

  • Forecasting demand at different times of day or seasons.
  • Sizing the system for peak loads or using time-varying staffing.
  • Building in headroom so that small demand increases don't push the system past the stability limit ρ < 1.

M/M/1 vs M/M/c queues

Differences in system behavior

The fundamental difference is the number of servers. In M/M/1, a single server handles everything. In M/M/c, c servers share the load from one queue. This pooling effect is powerful: combining resources into a shared pool is more efficient than running separate single-server queues.

For example, two separate M/M/1 queues, each with λ = 4 and μ = 5 (ρ = 0.8 each), will have L = 4 per queue, so L_total = 8. A single M/M/2 queue with λ = 8 and μ = 5 has the same ρ = 0.8, but L will be significantly less than 8 due to the pooling effect.
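
Running the numbers for this comparison, using the M/M/1 and M/M/c formulas from this section specialized to c = 2:

```python
def mm1_L(lam, mu):
    """Expected number in an M/M/1 system."""
    return lam / (mu - lam)

def mm2_L(lam, mu):
    """Expected number in an M/M/2 system."""
    a, rho = lam / mu, lam / (2 * mu)
    p0 = 1.0 / (1 + a + a**2 / (2 * (1 - rho)))
    pw = a**2 / (2 * (1 - rho)) * p0   # Erlang C for c = 2
    return rho * pw / (1 - rho) + a

two_separate = 2 * mm1_L(4, 5)   # two independent M/M/1 queues
pooled = mm2_L(8, 5)             # one shared M/M/2 queue
print(two_separate, round(pooled, 3))   # 8.0 4.444
```

Pooling cuts the average number in the system from 8 to about 4.4 at the same per-server utilization.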

Performance comparison

For the same total service capacity (cμ equal in both cases):

  • M/M/c queues have shorter expected queue lengths (L_q).
  • M/M/c queues have shorter expected waiting times in queue (W_q).
  • The probability of waiting (P_w) is lower in M/M/c.

The advantage of pooling grows as utilization increases. At low ρ, the difference is small. At high ρ, pooling makes a dramatic difference.

Suitability for different applications

  • M/M/1 fits systems with a single resource: a single checkout lane, a single-processor system, a single communication link.
  • M/M/c fits systems with parallel resources: a bank with multiple tellers, a hospital emergency department, a multi-core server.

Transitioning from M/M/1 to M/M/c

When demand grows beyond what one server can handle (ρ approaching 1), adding servers is the natural response. The decision should be based on:

  • Current performance metrics and how close ρ is to 1.
  • Projected demand growth.
  • The cost of additional servers versus the cost of degraded service.

Adding even one server (going from M/M/1 to M/M/2) can produce a large improvement in waiting times when the system is heavily loaded.