🏭 Intro to Industrial Engineering Unit 3 Review

3.2 Single-Server and Multi-Server Models

Written by the Fiveable Content Team • Last updated August 2025
Single vs Multi-Server Queuing Models

Fundamental Differences and Applications

Queuing theory gives you the math to predict how lines form and how long people wait. The two foundational models are M/M/1 (single-server) and M/M/c (multi-server), and understanding both is essential for designing efficient service systems.

In the notation M/M/1 or M/M/c, each letter means something specific. The first M stands for Markovian (exponential) interarrival times. The second M means Markovian (exponential) service times. The number at the end tells you how many servers are working in parallel: 1 for a single server, c for multiple servers.

Single-server models apply to simple setups like a single checkout counter or a one-window permit office. Multi-server models represent systems where multiple workers handle the same queue in parallel, like a call center with several agents or a bank with multiple tellers.

The utilization factor is calculated differently for each:

  • Single-server: $\rho = \lambda / \mu$
  • Multi-server: $\rho = \lambda / (c\mu)$

where $\lambda$ is the arrival rate, $\mu$ is the service rate per server, and $c$ is the number of servers. In both cases, $\rho$ represents the fraction of available capacity being used.

Performance Measure Comparisons

The formulas for key performance measures are simpler in M/M/1 and more involved in M/M/c:

  • Average number in the system ($L$):
    • M/M/1: $L = \lambda / (\mu - \lambda)$
    • M/M/c: Requires the Erlang C function (covered below)
  • Average waiting time in queue ($W_q$):
    • M/M/1: $W_q = \rho / (\mu - \lambda)$
    • M/M/c: Also depends on the Erlang C function
  • Probability of zero customers ($P_0$):
    • M/M/1: $P_0 = 1 - \rho$
    • M/M/c: A more complex expression involving summation and factorial terms

Stability conditions also differ. An M/M/1 system is stable when $\rho < 1$, meaning arrivals must be slower than service. An M/M/c system is stable when $\lambda < c\mu$, which is equivalent to requiring $\rho < 1$ under the multi-server definition. If these conditions aren't met, the queue grows without bound.
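The stability check is a one-liner in code; a minimal sketch (the helper name is ours, not standard):

```python
def is_stable(lam: float, mu: float, c: int = 1) -> bool:
    """True when the arrival rate stays strictly below total service capacity."""
    return lam < c * mu

print(is_stable(0.2, 0.25))      # M/M/1 with lam < mu: True
print(is_stable(20, 8, c=2))     # 20 < 16 fails: False
print(is_stable(20, 8, c=3))     # 20 < 24 holds: True
```

Note the strict inequality: a system with $\lambda = c\mu$ exactly is still unstable, because random fluctuations have no slack to drain away.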

Applying M/M/1 and M/M/c Models

M/M/1 Model Application

To use the M/M/1 model, you only need two inputs: the arrival rate $\lambda$ and the service rate $\mu$. From those, you can derive every performance measure.

Core M/M/1 formulas:

  • Utilization factor: $\rho = \lambda / \mu$
  • Average number in the system: $L = \lambda / (\mu - \lambda)$
  • Average number in the queue: $L_q = \rho^2 / (1 - \rho)$
  • Average time in the system: $W = 1 / (\mu - \lambda)$
  • Average waiting time in the queue: $W_q = \rho / (\mu - \lambda)$
  • Probability of $n$ customers in the system: $P_n = (1 - \rho)\rho^n$
  • Probability of waiting (system is busy): $P_w = \rho$

Notice that $L$, $W$, $L_q$, and $W_q$ are all connected through Little's Law: $L = \lambda W$ and $L_q = \lambda W_q$. If you know any one of these measures plus $\lambda$, you can find the others.

Worked example: A single-server coffee shop has customers arriving every 5 minutes ($\lambda = 0.2$ per min) and an average service time of 4 minutes ($\mu = 0.25$ per min).

  1. Calculate utilization: $\rho = 0.2 / 0.25 = 0.8$

  2. Average number in system: $L = 0.2 / (0.25 - 0.2) = 4$ customers

  3. Average time in system: $W = 1 / (0.25 - 0.2) = 20$ minutes

  4. Average wait in queue: $W_q = 0.8 / (0.25 - 0.2) = 16$ minutes

  5. Average number in queue: $L_q = 0.8^2 / (1 - 0.8) = 0.64 / 0.2 = 3.2$ customers

That 80% utilization rate means the server is busy most of the time, and customers wait an average of 16 minutes just in line before being served. The remaining 4 minutes ($W - W_q = 20 - 16$) is the actual service time.
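The coffee-shop numbers are easy to verify with a few lines of Python; a sketch using the M/M/1 formulas above (variable names are ours):

```python
# M/M/1 coffee shop: lam = 0.2 arrivals/min, mu = 0.25 services/min
lam, mu = 0.2, 0.25

rho = lam / mu            # utilization
L = lam / (mu - lam)      # average number in system
W = 1 / (mu - lam)        # average time in system (min)
Wq = rho / (mu - lam)     # average wait in queue (min)
Lq = rho**2 / (1 - rho)   # average number in queue

# Little's Law cross-checks: L = lam*W and Lq = lam*Wq
assert abs(L - lam * W) < 1e-9
assert abs(Lq - lam * Wq) < 1e-9

print(f"rho={rho:.2f}  L={L:.1f}  W={W:.1f}  Wq={Wq:.1f}  Lq={Lq:.1f}")
# → rho=0.80  L=4.0  W=20.0  Wq=16.0  Lq=3.2
```

The two assertions confirm that the direct formulas and Little's Law agree, which is a handy sanity check whenever you compute these measures by hand.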

M/M/c Model Application

The M/M/c model adds a third input: the number of servers $c$. The formulas are more complex because they rely on the Erlang C formula, which gives the probability that an arriving customer has to wait (i.e., all servers are busy).

Erlang C formula:

$$C(c, \rho) = \frac{(c\rho)^c}{c!\,(1-\rho)} \bigg/ \left[\sum_{n=0}^{c-1} \frac{(c\rho)^n}{n!} + \frac{(c\rho)^c}{c!\,(1-\rho)}\right]$$

Here, $\rho = \lambda / (c\mu)$ is the per-server utilization. The numerator captures the probability of all $c$ servers being busy, and the denominator normalizes across all possible system states.

Key M/M/c formulas:

  • Probability of waiting: $P_w = C(c, \rho)$
  • Average number in queue: $L_q = \dfrac{C(c, \rho) \cdot \rho}{1 - \rho}$
  • Average waiting time in queue: $W_q = \dfrac{C(c, \rho)}{c\mu(1 - \rho)}$
  • Average time in system: $W = W_q + 1/\mu$
  • Average number in system: $L = L_q + \lambda / \mu$

Little's Law still applies here: $L = \lambda W$ and $L_q = \lambda W_q$.

Worked example: A call center has 5 agents ($c = 5$), calls arriving every 2 minutes ($\lambda = 0.5$ per min), and an average call duration of 8 minutes ($\mu = 0.125$ per min).

  1. Calculate utilization: $\rho = 0.5 / (5 \times 0.125) = 0.8$
  2. Compute $C(c, \rho)$ using the Erlang C formula (this is typically done with software or tables)
  3. Use $C(c, \rho)$ to find $W_q$, $L_q$, and other measures

The utilization is 0.8 in both this and the coffee shop example, but the multi-server system distributes that load across 5 agents. The Erlang C calculation is tedious by hand, so in practice you'll use a calculator, spreadsheet, or lookup table.
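One option for that software step is a short Python function; a minimal sketch of the Erlang C formula applied to this call center (`erlang_c` is our own helper, not a library call):

```python
from math import factorial

def erlang_c(c: int, rho: float) -> float:
    """Probability an arriving customer must wait in an M/M/c queue
    (all c servers busy), via the Erlang C formula."""
    a = c * rho  # offered load in erlangs
    tail = a**c / (factorial(c) * (1 - rho))
    head = sum(a**n / factorial(n) for n in range(c))
    return tail / (head + tail)

# Call center: c = 5 agents, lam = 0.5 calls/min, mu = 0.125 per agent
c, lam, mu = 5, 0.5, 0.125
rho = lam / (c * mu)              # per-server utilization (0.8)

Pw = erlang_c(c, rho)             # probability of waiting
Wq = Pw / (c * mu * (1 - rho))    # average queue wait (minutes)
Lq = Pw * rho / (1 - rho)         # average number in queue

print(f"Pw={Pw:.3f}  Wq={Wq:.2f} min  Lq={Lq:.2f}")
# → Pw=0.554  Wq=4.43 min  Lq=2.22
```

So even at 80% utilization, roughly 55% of callers wait, but the average wait is only about 4.4 minutes because five agents share the load; note that $L_q = \lambda W_q$ checks out ($0.5 \times 4.43 \approx 2.22$).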

Model Assumptions and Limitations

Both M/M/1 and M/M/c share the same core assumptions:

  • Poisson arrivals: Customers arrive randomly at a constant average rate
  • Exponential service times: Service durations follow an exponential distribution (the memoryless property means the probability of finishing service doesn't depend on how long service has already taken)
  • FCFS discipline: First-come, first-served
  • Unlimited queue capacity: The line can grow infinitely long
  • No balking or reneging: Customers never refuse to join the line (balking) or leave after joining (reneging)
  • Infinite calling population: The arrival rate doesn't change based on how many customers are already in the system

These assumptions rarely hold perfectly in practice. Real customers do leave long lines, queue capacity is often physically limited, and arrival rates can fluctuate throughout the day. When these assumptions are significantly violated, you'll need more advanced models (like M/M/c/K for finite queue capacity or M/G/1 for non-exponential service times). Sensitivity analysis helps you gauge how much your results might shift when real conditions deviate from these assumptions.

System Parameters Impact on Queuing Performance


Utilization Factor and System Stability

The utilization factor $\rho$ is the single most important parameter in queuing. As $\rho$ approaches 1, performance degrades rapidly and non-linearly.

Here's a concrete illustration for an M/M/1 system (using $L_q = \rho^2 / (1 - \rho)$):

| $\rho$ | $L_q$ (avg customers in queue) |
| --- | --- |
| 0.5 | 0.5 |
| 0.7 | 1.63 |
| 0.9 | 8.1 |
| 0.95 | 18.05 |

The jump from $\rho = 0.5$ to $\rho = 0.9$ increases the average queue length by a factor of about 16. This non-linear relationship is one of the most important takeaways from queuing theory: a system that's "only" 90% utilized can have extremely long waits.

As $\rho$ nears 1, the probability of waiting approaches 1 and average waiting time grows toward infinity. This is why real systems should never be designed to operate near 100% utilization. You need slack capacity to absorb the natural randomness in arrivals and service times.
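The blow-up near full utilization is easy to reproduce numerically; a quick sketch:

```python
# M/M/1 average queue length L_q = rho^2 / (1 - rho)
# grows without bound as utilization rho approaches 1
for rho in (0.5, 0.7, 0.9, 0.95, 0.99):
    lq = rho**2 / (1 - rho)
    print(f"rho = {rho:.2f}  ->  Lq = {lq:.2f}")
```

The last iteration shows $L_q \approx 98$ at 99% utilization: squeezing out the final few points of utilization multiplies the queue length many times over.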

Arrival and Service Rates

Arrival rate $\lambda$ and service rate $\mu$ have opposite effects on performance:

  • Increasing $\lambda$ (more customers per hour) raises $\rho$, leading to longer queues, longer waits, and higher utilization
  • Increasing $\mu$ (faster service) lowers $\rho$, leading to shorter queues, shorter waits, and lower utilization

There's an important trade-off here: pushing service speed higher may reduce service quality. A barista who rushes every order will have shorter queues but possibly unhappy customers.

Example: In an M/M/c system with 3 servers and $\mu = 8$ customers/hour/server, doubling $\lambda$ from 10 to 20 customers/hour increases $W_q$ from about 0.01 hours (under a minute) to about 0.18 hours (roughly 10.5 minutes). That's more than a 15x increase in wait time from a 2x increase in arrivals. The non-linearity strikes again.

Number of Servers and System Variability

Adding servers in an M/M/c model improves performance, but with diminishing returns. Each additional server helps less than the one before it.

Example: With $\lambda = 20$ customers/hour and $\mu = 8$ customers/hour/server:

| Servers ($c$) | $W_q$ (hours) |
| --- | --- |
| 3 | 0.18 |
| 4 | 0.027 |
| 5 | 0.0065 |

Going from 3 to 4 servers cuts wait time by about 85%. Going from 4 to 5 cuts it by roughly another 75%, but the absolute improvement is much smaller (about 0.15 hours saved vs. about 0.02 hours). At some point, adding more servers costs more than the wait time it saves.
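The diminishing-returns pattern can be scripted; a minimal sketch (`wq_mmc` is our own helper built from the Erlang C formula earlier in this section):

```python
from math import factorial

def wq_mmc(c: int, lam: float, mu: float) -> float:
    """Average queue wait in an M/M/c system (hours here),
    using the Erlang C probability of waiting."""
    a = lam / mu                       # offered load
    rho = a / c                        # per-server utilization
    tail = a**c / (factorial(c) * (1 - rho))
    head = sum(a**n / factorial(n) for n in range(c))
    pw = tail / (head + tail)          # Erlang C: P(wait)
    return pw / (c * mu - lam)

lam, mu = 20.0, 8.0                    # arrivals/hour, rate per server
waits = {c: wq_mmc(c, lam, mu) for c in (3, 4, 5)}
for c, wq in waits.items():
    print(f"c = {c}: Wq = {wq:.3f} hours")
```

Each added server cuts the wait by a similar *percentage*, so the *absolute* savings shrink rapidly; that is the shape of the diminishing-returns curve.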

Higher variability in arrival or service times also degrades performance beyond what M/M/1 and M/M/c predict. The exponential distribution already has a relatively high coefficient of variation (equal to 1). If your real system has even more variable service times (some customers take 2 minutes, others take 30), the standard models will underestimate actual wait times.

Optimal Server Number in Multi-Server Systems

Economic Analysis and Cost Functions

The goal is to find the number of servers that minimizes total cost, which combines two competing expenses:

  • Server costs (wages, equipment): increase as you add servers
  • Waiting costs (customer dissatisfaction, lost sales): decrease as you add servers

The total cost function is:

$$TC(c) = c \cdot C_s + \lambda \cdot W_q(c) \cdot C_w$$

where:

  • $c$ = number of servers
  • $C_s$ = cost per server per unit time
  • $C_w$ = waiting cost per customer per unit time
  • $\lambda$ = arrival rate
  • $W_q(c)$ = average waiting time in queue (which depends on $c$)

The first term ($c \cdot C_s$) is the total server cost. The second term ($\lambda \cdot W_q(c) \cdot C_w$) is the total waiting cost: $\lambda \cdot W_q(c)$ gives you $L_q$ (by Little's Law), and multiplying by $C_w$ converts that into a dollar cost. The optimal $c$ sits at the minimum of this total cost curve.

Optimization Techniques

Since $c$ must be a positive integer, you can find the optimum through marginal analysis:

  1. Start with the minimum feasible number of servers (the smallest $c$ where $\lambda < c\mu$, i.e., $\rho < 1$)
  2. Calculate $TC(c)$
  3. Calculate $TC(c+1)$
  4. If $TC(c+1) < TC(c)$, increase $c$ by 1 and repeat from step 2
  5. Stop when $TC(c+1) \geq TC(c)$. The current $c$ is your optimum

The total cost curve is typically convex (U-shaped), so once the cost starts increasing, you've passed the minimum.

You can also use a graphical method: plot $TC$ against $c$ and visually identify the lowest point. For more complex scenarios with additional constraints (budget limits, space restrictions), integer programming techniques may be needed.

Example: A retail store has $C_s = \$20$/hour, $C_w = \$15$/hour, $\lambda = 30$ customers/hour, and $\mu = 10$ customers/hour/server. Since $\lambda / \mu = 3$, you need at least 4 servers for stability. You'd calculate $TC$ for $c = 4, 5, 6, \ldots$ and pick the $c$ with the lowest total cost.
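The marginal-analysis loop translates directly into code; a sketch for the retail-store numbers (both helper functions are ours):

```python
from math import factorial

def wq_mmc(c, lam, mu):
    """Average M/M/c queue wait via the Erlang C formula."""
    a = lam / mu
    rho = a / c
    tail = a**c / (factorial(c) * (1 - rho))
    head = sum(a**n / factorial(n) for n in range(c))
    return (tail / (head + tail)) / (c * mu - lam)

def total_cost(c, lam, mu, Cs, Cw):
    """TC(c) = server cost + waiting cost."""
    return c * Cs + lam * wq_mmc(c, lam, mu) * Cw

# Retail store: Cs = $20/h, Cw = $15/h, lam = 30/h, mu = 10/h per server
lam, mu, Cs, Cw = 30.0, 10.0, 20.0, 15.0

c = int(lam // mu) + 1                 # smallest c with lam < c*mu
while total_cost(c + 1, lam, mu, Cs, Cw) < total_cost(c, lam, mu, Cs, Cw):
    c += 1

print(f"optimal c = {c}, TC = ${total_cost(c, lam, mu, Cs, Cw):.2f}/hour")
# → optimal c = 4, TC = $102.92/hour
```

Here the loop exits immediately: a fifth server would add $20/hour in wages but save only about $17/hour in waiting cost, so 4 servers is the cost minimum at these rates.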

Sensitivity Analysis and Practical Considerations

Your optimal solution depends on cost estimates that may be uncertain. Waiting cost $C_w$ in particular is hard to pin down precisely. Sensitivity analysis tests how robust your answer is by varying the inputs:

  • One-way analysis: Change one parameter (say $C_w$) while holding everything else constant. For instance, if $C_w$ increases from \$15/hour to \$25/hour, the optimal number of servers may jump from 4 to 5 because customer waiting becomes more expensive.
  • Two-way analysis: Vary two parameters simultaneously to see how they interact (e.g., what happens if both $\lambda$ and $C_w$ increase?).

If the optimal $c$ stays the same across a wide range of plausible input values, your solution is robust. If it changes with small input shifts, you should be cautious about your recommendation.
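A one-way sweep over $C_w$ for the retail-store numbers might look like this (a sketch; the helper functions are our own, not from a library):

```python
from math import factorial

def wq_mmc(c, lam, mu):
    """Average M/M/c queue wait via the Erlang C formula."""
    a = lam / mu
    rho = a / c
    tail = a**c / (factorial(c) * (1 - rho))
    head = sum(a**n / factorial(n) for n in range(c))
    return (tail / (head + tail)) / (c * mu - lam)

def best_c(lam, mu, Cs, Cw):
    """Cost-minimizing server count via marginal analysis."""
    cost = lambda k: k * Cs + lam * wq_mmc(k, lam, mu) * Cw
    c = int(lam // mu) + 1             # minimum for stability
    while cost(c + 1) < cost(c):
        c += 1
    return c

# One-way sensitivity: sweep the (uncertain) waiting cost Cw
# while holding lam, mu, and Cs fixed
lam, mu, Cs = 30.0, 10.0, 20.0
optima = {Cw: best_c(lam, mu, Cs, Cw) for Cw in (10, 15, 20, 25, 30)}
for Cw, c in optima.items():
    print(f"Cw = ${Cw}/h  ->  optimal c = {c}")
```

Under these assumed inputs the optimum sits at 4 servers for low waiting costs and shifts to 5 as $C_w$ grows, which is exactly the kind of threshold a one-way analysis is meant to expose.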

Beyond the math, practical factors also influence the final decision:

  • Space constraints: You may not have room for more service stations
  • Labor regulations: Shift scheduling, overtime rules, and minimum staffing requirements
  • Service level targets: Some organizations commit to standards like "90% of customers served within 5 minutes," which may override the pure cost optimum
  • Demand variability: Arrival rates often change by time of day, so the optimal $c$ might differ for peak vs. off-peak hours

The quantitative model gives you a starting point, but the final staffing decision should account for these real-world constraints.