Single vs Multi-Server Queuing Models
Fundamental Differences and Applications
Queuing theory gives you the math to predict how lines form and how long people wait. The two foundational models are M/M/1 (single-server) and M/M/c (multi-server), and understanding both is essential for designing efficient service systems.
In the notation M/M/1 or M/M/c, each letter means something specific. The first M stands for Markovian (exponential) interarrival times. The second M means Markovian (exponential) service times. The number at the end tells you how many servers are working in parallel: 1 for a single server, c for multiple servers.
Single-server models apply to simple setups like a single checkout counter or a one-window permit office. Multi-server models represent systems where multiple workers handle the same queue in parallel, like a call center with several agents or a bank with multiple tellers.
The utilization factor $\rho$ is calculated differently for each:
- Single-server: $\rho = \lambda / \mu$
- Multi-server: $\rho = \lambda / (c\mu)$
where $\lambda$ is the arrival rate, $\mu$ is the service rate per server, and $c$ is the number of servers. In both cases, $\rho$ represents the fraction of available capacity being used.
Performance Measure Comparisons
The formulas for key performance measures are simpler in M/M/1 and more involved in M/M/c:
- Average number in the system ($L$):
  - M/M/1: $L = \dfrac{\rho}{1-\rho} = \dfrac{\lambda}{\mu - \lambda}$
  - M/M/c: Requires the Erlang C function (covered below)
- Average waiting time in queue ($W_q$):
  - M/M/1: $W_q = \dfrac{\lambda}{\mu(\mu - \lambda)}$
  - M/M/c: Also depends on the Erlang C function
- Probability of zero customers ($P_0$):
  - M/M/1: $P_0 = 1 - \rho$
  - M/M/c: A more complex expression involving summation and factorial terms
Stability conditions also differ. An M/M/1 system is stable when $\lambda < \mu$, meaning arrivals must be slower than service. An M/M/c system is stable when $\lambda < c\mu$, which is equivalent to requiring $\rho < 1$ using the multi-server definition $\rho = \lambda/(c\mu)$. If these conditions aren't met, the queue grows without bound.
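These stability checks translate directly to code. A minimal Python sketch (the rates shown are illustrative, chosen to match the worked examples later in this section):

```python
def is_stable_mm1(lam, mu):
    """M/M/1 is stable only if arrivals are slower than service: lam < mu."""
    return lam < mu

def is_stable_mmc(lam, mu, c):
    """M/M/c is stable only if lam < c*mu, i.e., rho = lam/(c*mu) < 1."""
    return lam < c * mu

print(is_stable_mm1(0.2, 0.25))      # True:  0.2 < 0.25
print(is_stable_mmc(0.5, 0.125, 5))  # True:  0.5 < 0.625
print(is_stable_mmc(0.5, 0.125, 3))  # False: 0.5 >= 0.375
```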
Applying M/M/1 and M/M/c Models
M/M/1 Model Application
To use the M/M/1 model, you only need two inputs: the arrival rate $\lambda$ and the service rate $\mu$. From those, you can derive every performance measure.
Core M/M/1 formulas:
- Utilization factor: $\rho = \lambda / \mu$
- Average number in the system: $L = \dfrac{\rho}{1-\rho} = \dfrac{\lambda}{\mu - \lambda}$
- Average number in the queue: $L_q = \dfrac{\rho^2}{1-\rho}$
- Average time in the system: $W = \dfrac{1}{\mu - \lambda}$
- Average waiting time in the queue: $W_q = \dfrac{\lambda}{\mu(\mu - \lambda)}$
- Probability of $n$ customers in the system: $P_n = (1-\rho)\rho^n$
- Probability of waiting (system is busy): $P_{\text{wait}} = \rho$
Notice that $L$, $L_q$, $W$, and $W_q$ are all connected through Little's Law: $L = \lambda W$ and $L_q = \lambda W_q$. If you know any one of these measures plus $\lambda$, you can find the others.
Worked example: A single-server coffee shop has customers arriving every 5 minutes ($\lambda = 0.2$ per min) and an average service time of 4 minutes ($\mu = 0.25$ per min).
- Calculate utilization: $\rho = 0.2 / 0.25 = 0.8$
- Average number in system: $L = \dfrac{0.8}{1 - 0.8} = 4$ customers
- Average time in system: $W = \dfrac{1}{0.25 - 0.2} = 20$ minutes
- Average wait in queue: $W_q = W - \dfrac{1}{\mu} = 20 - 4 = 16$ minutes
- Average number in queue: $L_q = \lambda W_q = 0.2 \times 16 = 3.2$ customers
That 80% utilization rate means the server is busy most of the time, and customers wait an average of 16 minutes just in line before being served. The remaining 4 minutes ($1/\mu$) is the actual service time.
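The coffee-shop numbers can be reproduced with a few lines of Python, applying the M/M/1 formulas above:

```python
lam, mu = 0.2, 0.25          # arrivals and service completions per minute

rho = lam / mu               # utilization: 0.8
L = rho / (1 - rho)          # avg number in system: 4 customers
W = 1 / (mu - lam)           # avg time in system: 20 minutes
Wq = W - 1 / mu              # avg wait in queue: 16 minutes
Lq = lam * Wq                # avg number in queue (Little's Law): 3.2

print(f"rho={rho:.2f}  L={L:.1f}  W={W:.1f}  Wq={Wq:.1f}  Lq={Lq:.1f}")
```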
M/M/c Model Application
The M/M/c model adds a third input: the number of servers $c$. The formulas are more complex because they rely on the Erlang C formula, which gives the probability that an arriving customer has to wait (i.e., all servers are busy).
Erlang C formula:
$$C(c, \lambda/\mu) = \frac{\dfrac{(\lambda/\mu)^c}{c!\,(1-\rho)}}{\displaystyle\sum_{n=0}^{c-1} \frac{(\lambda/\mu)^n}{n!} + \frac{(\lambda/\mu)^c}{c!\,(1-\rho)}}$$
Here, $\rho = \lambda/(c\mu)$ is the per-server utilization. The numerator captures the probability of all servers being busy, and the denominator normalizes across all possible system states.
Key M/M/c formulas:
- Probability of waiting: $P_{\text{wait}} = C(c, \lambda/\mu)$
- Average number in queue: $L_q = C(c, \lambda/\mu) \cdot \dfrac{\rho}{1-\rho}$
- Average waiting time in queue: $W_q = \dfrac{C(c, \lambda/\mu)}{c\mu - \lambda}$
- Average time in system: $W = W_q + \dfrac{1}{\mu}$
- Average number in system: $L = \lambda W$
Little's Law still applies here: $L = \lambda W$ and $L_q = \lambda W_q$.
Worked example: A call center has 5 agents ($c = 5$), calls arriving every 2 minutes ($\lambda = 0.5$ per min), and an average call duration of 8 minutes ($\mu = 0.125$ per min).
- Calculate utilization: $\rho = \dfrac{0.5}{5 \times 0.125} = 0.8$
- Compute $C(c, \lambda/\mu)$ using the Erlang C formula (this is typically done with software or tables)
- Use $C(c, \lambda/\mu)$ to find $W_q$, $L_q$, and other measures
The utilization is 0.8 in both this and the coffee shop example, but the multi-server system distributes that load across 5 agents. The Erlang C calculation is tedious by hand, so in practice you'll use a calculator, spreadsheet, or lookup table.
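In lieu of tables, the Erlang C formula is short enough to code directly. A self-contained Python sketch applied to the call-center example above:

```python
import math

def erlang_c(c, lam, mu):
    """Probability that an arriving customer must wait (all c servers busy)."""
    a = lam / mu                      # offered load, lambda/mu
    rho = lam / (c * mu)              # per-server utilization, must be < 1
    busy = a ** c / (math.factorial(c) * (1 - rho))
    norm = sum(a ** n / math.factorial(n) for n in range(c)) + busy
    return busy / norm

c, lam, mu = 5, 0.5, 0.125            # call-center example above
Pw = erlang_c(c, lam, mu)             # probability of waiting, about 0.55
Wq = Pw / (c * mu - lam)              # avg wait in queue, about 4.4 minutes
Lq = lam * Wq                         # avg number in queue (Little's Law)
W = Wq + 1 / mu                       # avg time in system, about 12.4 minutes
```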
Model Assumptions and Limitations
Both M/M/1 and M/M/c share the same core assumptions:
- Poisson arrivals: Customers arrive randomly at a constant average rate
- Exponential service times: Service durations follow an exponential distribution (the memoryless property means the probability of finishing service doesn't depend on how long service has already taken)
- FCFS discipline: First-come, first-served
- Unlimited queue capacity: The line can grow infinitely long
- No balking or reneging: Customers never refuse to join the line (balking) or leave after joining (reneging)
- Infinite calling population: The arrival rate doesn't change based on how many customers are already in the system
These assumptions rarely hold perfectly in practice. Real customers do leave long lines, queue capacity is often physically limited, and arrival rates can fluctuate throughout the day. When these assumptions are significantly violated, you'll need more advanced models (like M/M/c/K for finite queue capacity or M/G/1 for non-exponential service times). Sensitivity analysis helps you gauge how much your results might shift when real conditions deviate from these assumptions.
Impact of System Parameters on Queuing Performance

Utilization Factor and System Stability
The utilization factor $\rho$ is the single most important parameter in queuing. As $\rho$ approaches 1, performance degrades rapidly and non-linearly.
Here's a concrete illustration for an M/M/1 system (using $L_q = \rho^2 / (1 - \rho)$):
| $\rho$ | $L_q$ (avg customers in queue) |
|---|---|
| 0.5 | 0.5 |
| 0.7 | 1.63 |
| 0.9 | 8.1 |
| 0.95 | 18.05 |
The jump from $\rho = 0.5$ to $\rho = 0.9$ increases the average queue length by a factor of about 16. This non-linear relationship is one of the most important takeaways from queuing theory: a system that's "only" 90% utilized can have extremely long waits.
As $\rho$ nears 1, the probability of waiting approaches 1 and average waiting time grows toward infinity. This is why real systems should never be designed to operate near 100% utilization. You need slack capacity to absorb the natural randomness in arrivals and service times.
Arrival and Service Rates
Arrival rate $\lambda$ and service rate $\mu$ have opposite effects on performance:
- Increasing $\lambda$ (more customers per hour) raises $\rho$, leading to longer queues, longer waits, and higher utilization
- Increasing $\mu$ (faster service) lowers $\rho$, leading to shorter queues, shorter waits, and lower utilization
There's an important trade-off here: pushing service speed higher may reduce service quality. A barista who rushes every order will have shorter queues but possibly unhappy customers.
Example: In an M/M/c system with 3 servers, doubling $\lambda$ from 10 to 20 customers/hour increases $W_q$ from about 0.05 hours (3 minutes) to 0.33 hours (20 minutes). That's roughly a 6x increase in wait time from a 2x increase in arrivals. The non-linearity strikes again.
Number of Servers and System Variability
Adding servers in an M/M/c model improves performance, but with diminishing returns. Each additional server helps less than the one before it.
Example: Continuing the previous example with $\lambda = 20$ customers/hour and the same per-server service rate:
| Servers ($c$) | $W_q$ (hours) |
|---|---|
| 3 | 0.33 |
| 4 | 0.08 |
| 5 | 0.02 |
Going from 3 to 4 servers cuts wait time by about 75%. Going from 4 to 5 cuts it by another 75%, but the absolute improvement is much smaller (0.25 hours vs. 0.06 hours). At some point, adding more servers costs more than the wait time it saves.
Higher variability in arrival or service times also degrades performance beyond what M/M/1 and M/M/c predict. The exponential distribution already has a relatively high coefficient of variation (equal to 1). If your real system has even more variable service times (some customers take 2 minutes, others take 30), the standard models will underestimate actual wait times.
Optimal Server Number in Multi-Server Systems
Economic Analysis and Cost Functions
The goal is to find the number of servers that minimizes total cost, which combines two competing expenses:
- Server costs (wages, equipment): increase as you add servers
- Waiting costs (customer dissatisfaction, lost sales): decrease as you add servers
The total cost function is:
$$TC(c) = C_s \cdot c + C_w \cdot \lambda \cdot W_q(c)$$
where:
- $c$ = number of servers
- $C_s$ = cost per server per unit time
- $C_w$ = waiting cost per customer per unit time
- $\lambda$ = arrival rate
- $W_q(c)$ = average waiting time in queue (which depends on $c$)
The first term ($C_s \cdot c$) is the total server cost. The second term ($C_w \cdot \lambda \cdot W_q(c)$) is the total waiting cost: $\lambda W_q$ gives you $L_q$ (by Little's Law), and multiplying by $C_w$ converts that into a dollar cost. The optimal $c^*$ sits at the minimum of this total cost curve.
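The cost function translates directly to code. In this sketch, $W_q(c)$ is assumed to come from a separate M/M/c calculation, and the numeric inputs in the usage line are purely illustrative:

```python
def total_cost(c, Cs, Cw, lam, Wq_of_c):
    """TC(c) = total server cost + total waiting cost."""
    server_cost = Cs * c
    waiting_cost = Cw * lam * Wq_of_c   # equals Cw * Lq, since Lq = lam * Wq
    return server_cost + waiting_cost

# Illustrative: 4 servers at $20/hr each, $15/hr waiting cost per customer,
# 30 arrivals/hr, and an assumed Wq of 0.05 hours
print(total_cost(4, 20, 15, 30, 0.05))  # 80 + 22.5 = 102.5
```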
Optimization Techniques
Since $c$ must be a positive integer, you can find the optimum through marginal analysis:
1. Start with the minimum feasible number of servers (the smallest $c$ where $\rho < 1$, i.e., $c > \lambda/\mu$)
2. Calculate $TC(c)$
3. Calculate $TC(c+1)$
4. If $TC(c+1) < TC(c)$, increase $c$ by 1 and repeat from step 2
5. Stop when $TC(c+1) \geq TC(c)$. The current $c$ is your optimum
The total cost curve is typically convex (U-shaped), so once the cost starts increasing, you've passed the minimum.
You can also use a graphical method: plot $TC(c)$ against $c$ and visually identify the lowest point. For more complex scenarios with additional constraints (budget limits, space restrictions), integer programming techniques may be needed.
Example: A retail store knows its server cost $C_s$, waiting cost $C_w$, arrival rate $\lambda$, and per-server service rate $\mu$, with $\lambda/\mu$ just above 3. Since stability requires $c > \lambda/\mu$, you need at least 4 servers. You'd calculate $TC(c)$ for $c = 4, 5, 6, \ldots$ and pick the $c$ with the lowest total cost.
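The marginal analysis can be sketched end to end by combining the Erlang C formula with the cost function. The rate and cost inputs below are hypothetical stand-ins (not values from the source), chosen so that at least 4 servers are needed:

```python
import math

def erlang_c(c, lam, mu):
    """Probability an arriving customer must wait in an M/M/c queue."""
    a, rho = lam / mu, lam / (c * mu)
    busy = a ** c / (math.factorial(c) * (1 - rho))
    return busy / (sum(a ** n / math.factorial(n) for n in range(c)) + busy)

def total_cost(c, lam, mu, Cs, Cw):
    """TC(c) = Cs*c + Cw*lam*Wq(c), with Wq from the Erlang C formula."""
    Wq = erlang_c(c, lam, mu) / (c * mu - lam)
    return Cs * c + Cw * lam * Wq

def optimal_servers(lam, mu, Cs, Cw):
    """Marginal analysis: add servers while the next one still lowers TC."""
    c = math.floor(lam / mu) + 1                 # smallest c with lam < c*mu
    while total_cost(c + 1, lam, mu, Cs, Cw) < total_cost(c, lam, mu, Cs, Cw):
        c += 1
    return c

# Hypothetical inputs: lam=30/hr, mu=10/hr per server, Cs=$20/hr, Cw=$15/hr
print(optimal_servers(30, 10, 20, 15))  # -> 4
```

With these made-up inputs, $TC(4) \approx 102.9$ and $TC(5) \approx 105.3$, so the search stops at 4 servers, which is also the minimum stable staffing level here.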
Sensitivity Analysis and Practical Considerations
Your optimal solution depends on cost estimates that may be uncertain. Waiting cost in particular is hard to pin down precisely. Sensitivity analysis tests how robust your answer is by varying the inputs:
- One-way analysis: Change one parameter (say $C_w$) while holding everything else constant. For instance, if $C_w$ increases from $15/hour to $25/hour, the optimal number of servers may jump from 4 to 5 because customer waiting becomes more expensive.
- Two-way analysis: Vary two parameters simultaneously to see how they interact (e.g., what happens if both $\lambda$ and $C_w$ increase?).
If the optimal stays the same across a wide range of plausible input values, your solution is robust. If it changes with small input shifts, you should be cautious about your recommendation.
Beyond the math, practical factors also influence the final decision:
- Space constraints: You may not have room for more service stations
- Labor regulations: Shift scheduling, overtime rules, and minimum staffing requirements
- Service level targets: Some organizations commit to standards like "90% of customers served within 5 minutes," which may override the pure cost optimum
- Demand variability: Arrival rates often change by time of day, so the optimal might differ for peak vs. off-peak hours
The quantitative model gives you a starting point, but the final staffing decision should account for these real-world constraints.