🏭 Intro to Industrial Engineering Unit 3 Review

3.2 Single-Server and Multi-Server Models

Written by the Fiveable Content Team • Last updated August 2025
Single vs Multi-Server Queuing Models

Fundamental Differences and Applications

Queuing theory gives you the math to predict how lines form and how long people wait. The two foundational models are M/M/1 (single-server) and M/M/c (multi-server), and understanding both is essential for designing efficient service systems.

In the notation M/M/1 or M/M/c, each letter means something specific. The first M stands for Markovian (exponential) interarrival times. The second M means Markovian (exponential) service times. The number at the end tells you how many servers are working in parallel: 1 for a single server, c for multiple servers.

Single-server models apply to simple setups like a single checkout counter or a one-window permit office. Multi-server models represent systems where multiple workers handle the same queue in parallel, like a call center with several agents or a bank with multiple tellers.

The utilization factor is calculated differently for each:

  • Single-server: $\rho = \lambda / \mu$
  • Multi-server: $\rho = \lambda / (c\mu)$

where $\lambda$ is the arrival rate, $\mu$ is the service rate per server, and $c$ is the number of servers. In both cases, $\rho$ represents the fraction of available capacity being used.

Performance Measure Comparisons

The formulas for key performance measures are simpler in M/M/1 and more involved in M/M/c:

  • Average number in the system ($L$):
    • M/M/1: $L = \lambda / (\mu - \lambda)$
    • M/M/c: Requires the Erlang C function (covered below)
  • Average waiting time in queue ($W_q$):
    • M/M/1: $W_q = \rho / (\mu - \lambda)$
    • M/M/c: Also depends on the Erlang C function
  • Probability of zero customers ($P_0$):
    • M/M/1: $P_0 = 1 - \rho$
    • M/M/c: A more complex expression involving summation and factorial terms

Stability conditions also differ. An M/M/1 system is stable when $\rho < 1$, meaning arrivals must be slower than service. An M/M/c system is stable when $\lambda < c\mu$, which is equivalent to requiring $\rho < 1$ under the multi-server definition. If these conditions aren't met, the queue grows without bound.
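The stability check is a one-liner in code; a minimal sketch (the helper name is ours, not standard):

```python
def is_stable(lam: float, mu: float, c: int = 1) -> bool:
    """True when the arrival rate stays strictly below total service capacity."""
    return lam < c * mu

print(is_stable(0.2, 0.25))      # M/M/1 with lam < mu: True
print(is_stable(20, 8, c=2))     # 20 < 16 fails: False
print(is_stable(20, 8, c=3))     # 20 < 24 holds: True
```

Note the strict inequality: a system with $\lambda = c\mu$ exactly is still unstable, because random fluctuations have no slack to drain away.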

Applying M/M/1 and M/M/c Models

M/M/1 Model Application

To use the M/M/1 model, you only need two inputs: the arrival rate $\lambda$ and the service rate $\mu$. From those, you can derive every performance measure.

Core M/M/1 formulas:

  • Utilization factor: $\rho = \lambda / \mu$
  • Average number in the system: $L = \lambda / (\mu - \lambda)$
  • Average number in the queue: $L_q = \rho^2 / (1 - \rho)$
  • Average time in the system: $W = 1 / (\mu - \lambda)$
  • Average waiting time in the queue: $W_q = \rho / (\mu - \lambda)$
  • Probability of $n$ customers in the system: $P_n = (1 - \rho)\rho^n$
  • Probability of waiting (system is busy): $P_w = \rho$

Notice that $L$, $W$, $L_q$, and $W_q$ are all connected through Little's Law: $L = \lambda W$ and $L_q = \lambda W_q$. If you know any one of these measures plus $\lambda$, you can find the others.

Worked example: A single-server coffee shop has customers arriving every 5 minutes ($\lambda = 0.2$ per min) and an average service time of 4 minutes ($\mu = 0.25$ per min).

  1. Calculate utilization: $\rho = 0.2 / 0.25 = 0.8$

  2. Average number in system: $L = 0.2 / (0.25 - 0.2) = 4$ customers

  3. Average time in system: $W = 1 / (0.25 - 0.2) = 20$ minutes

  4. Average wait in queue: $W_q = 0.8 / (0.25 - 0.2) = 16$ minutes

  5. Average number in queue: $L_q = 0.8^2 / (1 - 0.8) = 0.64 / 0.2 = 3.2$ customers

That 80% utilization rate means the server is busy most of the time, and customers wait an average of 16 minutes just in line before being served. The remaining 4 minutes ($W - W_q = 20 - 16$) is the actual service time.
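The coffee-shop numbers are easy to verify with a few lines of Python; a sketch using the M/M/1 formulas above (variable names are ours):

```python
# M/M/1 coffee shop: lam = 0.2 arrivals/min, mu = 0.25 services/min
lam, mu = 0.2, 0.25

rho = lam / mu            # utilization
L = lam / (mu - lam)      # average number in system
W = 1 / (mu - lam)        # average time in system (min)
Wq = rho / (mu - lam)     # average wait in queue (min)
Lq = rho**2 / (1 - rho)   # average number in queue

# Little's Law cross-checks: L = lam*W and Lq = lam*Wq
assert abs(L - lam * W) < 1e-9
assert abs(Lq - lam * Wq) < 1e-9

print(f"rho={rho:.2f}  L={L:.1f}  W={W:.1f}  Wq={Wq:.1f}  Lq={Lq:.1f}")
# → rho=0.80  L=4.0  W=20.0  Wq=16.0  Lq=3.2
```

The two assertions confirm that the direct formulas and Little's Law agree, which is a handy sanity check whenever you compute these measures by hand.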

M/M/c Model Application

The M/M/c model adds a third input: the number of servers $c$. The formulas are more complex because they rely on the Erlang C formula, which gives the probability that an arriving customer has to wait (i.e., all servers are busy).

Erlang C formula:

$$C(c, \rho) = \frac{(c\rho)^c}{c!\,(1-\rho)} \bigg/ \left[\sum_{n=0}^{c-1} \frac{(c\rho)^n}{n!} + \frac{(c\rho)^c}{c!\,(1-\rho)}\right]$$

Here, $\rho = \lambda / (c\mu)$ is the per-server utilization. The numerator captures the probability of all $c$ servers being busy, and the denominator normalizes across all possible system states.

Key M/M/c formulas:

  • Probability of waiting: $P_w = C(c, \rho)$
  • Average number in queue: $L_q = \dfrac{C(c, \rho) \cdot \rho}{1 - \rho}$
  • Average waiting time in queue: $W_q = \dfrac{C(c, \rho)}{c\mu(1 - \rho)}$
  • Average time in system: $W = W_q + 1/\mu$
  • Average number in system: $L = L_q + \lambda / \mu$

Little's Law still applies here: $L = \lambda W$ and $L_q = \lambda W_q$.

Worked example: A call center has 5 agents ($c = 5$), calls arriving every 2 minutes ($\lambda = 0.5$ per min), and an average call duration of 8 minutes ($\mu = 0.125$ per min).

  1. Calculate utilization: $\rho = 0.5 / (5 \times 0.125) = 0.8$
  2. Compute $C(c, \rho)$ using the Erlang C formula (this is typically done with software or tables)
  3. Use $C(c, \rho)$ to find $W_q$, $L_q$, and other measures

The utilization is 0.8 in both this and the coffee shop example, but the multi-server system distributes that load across 5 agents. The Erlang C calculation is tedious by hand, so in practice you'll use a calculator, spreadsheet, or lookup table.
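One option for that software step is a short Python function; a minimal sketch of the Erlang C formula applied to this call center (`erlang_c` is our own helper, not a library call):

```python
from math import factorial

def erlang_c(c: int, rho: float) -> float:
    """Probability an arriving customer must wait in an M/M/c queue
    (all c servers busy), via the Erlang C formula."""
    a = c * rho  # offered load in erlangs
    tail = a**c / (factorial(c) * (1 - rho))
    head = sum(a**n / factorial(n) for n in range(c))
    return tail / (head + tail)

# Call center: c = 5 agents, lam = 0.5 calls/min, mu = 0.125 per agent
c, lam, mu = 5, 0.5, 0.125
rho = lam / (c * mu)              # per-server utilization (0.8)

Pw = erlang_c(c, rho)             # probability of waiting
Wq = Pw / (c * mu * (1 - rho))    # average queue wait (minutes)
Lq = Pw * rho / (1 - rho)         # average number in queue

print(f"Pw={Pw:.3f}  Wq={Wq:.2f} min  Lq={Lq:.2f}")
# → Pw=0.554  Wq=4.43 min  Lq=2.22
```

So even at 80% utilization, roughly 55% of callers wait, but the average wait is only about 4.4 minutes because five agents share the load; note that $L_q = \lambda W_q$ checks out ($0.5 \times 4.43 \approx 2.22$).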

Model Assumptions and Limitations

Both M/M/1 and M/M/c share the same core assumptions:

  • Poisson arrivals: Customers arrive randomly at a constant average rate
  • Exponential service times: Service durations follow an exponential distribution (the memoryless property means the probability of finishing service doesn't depend on how long service has already taken)
  • FCFS discipline: First-come, first-served
  • Unlimited queue capacity: The line can grow infinitely long
  • No balking or reneging: Customers never refuse to join the line (balking) or leave after joining (reneging)
  • Infinite calling population: The arrival rate doesn't change based on how many customers are already in the system

These assumptions rarely hold perfectly in practice. Real customers do leave long lines, queue capacity is often physically limited, and arrival rates can fluctuate throughout the day. When these assumptions are significantly violated, you'll need more advanced models (like M/M/c/K for finite queue capacity or M/G/1 for non-exponential service times). Sensitivity analysis helps you gauge how much your results might shift when real conditions deviate from these assumptions.

System Parameters Impact on Queuing Performance


Utilization Factor and System Stability

The utilization factor $\rho$ is the single most important parameter in queuing. As $\rho$ approaches 1, performance degrades rapidly and non-linearly.

Here's a concrete illustration for an M/M/1 system (using $L_q = \rho^2 / (1 - \rho)$):

| $\rho$ | $L_q$ (avg customers in queue) |
| --- | --- |
| 0.5 | 0.5 |
| 0.7 | 1.63 |
| 0.9 | 8.1 |
| 0.95 | 18.05 |

The jump from $\rho = 0.5$ to $\rho = 0.9$ increases the average queue length by a factor of about 16. This non-linear relationship is one of the most important takeaways from queuing theory: a system that's "only" 90% utilized can have extremely long waits.

As $\rho$ nears 1, the probability of waiting approaches 1 and average waiting time grows toward infinity. This is why real systems should never be designed to operate near 100% utilization. You need slack capacity to absorb the natural randomness in arrivals and service times.
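The blow-up near full utilization is easy to reproduce numerically; a quick sketch:

```python
# M/M/1 average queue length L_q = rho^2 / (1 - rho)
# grows without bound as utilization rho approaches 1
for rho in (0.5, 0.7, 0.9, 0.95, 0.99):
    lq = rho**2 / (1 - rho)
    print(f"rho = {rho:.2f}  ->  Lq = {lq:.2f}")
```

The last iteration shows $L_q \approx 98$ at 99% utilization: squeezing out the final few points of utilization multiplies the queue length many times over.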

Arrival and Service Rates

Arrival rate $\lambda$ and service rate $\mu$ have opposite effects on performance:

  • Increasing $\lambda$ (more customers per hour) raises $\rho$, leading to longer queues, longer waits, and higher utilization
  • Increasing $\mu$ (faster service) lowers $\rho$, leading to shorter queues, shorter waits, and lower utilization

There's an important trade-off here: pushing service speed higher may reduce service quality. A barista who rushes every order will have shorter queues but possibly unhappy customers.

Example: In an M/M/c system with 3 servers and $\mu = 8$ customers/hour/server, doubling $\lambda$ from 10 to 20 customers/hour increases $W_q$ from about 0.01 hours (under a minute) to about 0.18 hours (roughly 10.5 minutes). That's more than a 15x increase in wait time from a 2x increase in arrivals. The non-linearity strikes again.

Number of Servers and System Variability

Adding servers in an M/M/c model improves performance, but with diminishing returns. Each additional server helps less than the one before it.

Example: With $\lambda = 20$ customers/hour and $\mu = 8$ customers/hour/server:

| Servers ($c$) | $W_q$ (hours) |
| --- | --- |
| 3 | 0.18 |
| 4 | 0.027 |
| 5 | 0.0065 |

Going from 3 to 4 servers cuts wait time by about 85%. Going from 4 to 5 cuts it by roughly another 75%, but the absolute improvement is much smaller (about 0.15 hours saved vs. about 0.02 hours). At some point, adding more servers costs more than the wait time it saves.
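The diminishing-returns pattern can be scripted; a minimal sketch (`wq_mmc` is our own helper built from the Erlang C formula earlier in this section):

```python
from math import factorial

def wq_mmc(c: int, lam: float, mu: float) -> float:
    """Average queue wait in an M/M/c system (hours here),
    using the Erlang C probability of waiting."""
    a = lam / mu                       # offered load
    rho = a / c                        # per-server utilization
    tail = a**c / (factorial(c) * (1 - rho))
    head = sum(a**n / factorial(n) for n in range(c))
    pw = tail / (head + tail)          # Erlang C: P(wait)
    return pw / (c * mu - lam)

lam, mu = 20.0, 8.0                    # arrivals/hour, rate per server
waits = {c: wq_mmc(c, lam, mu) for c in (3, 4, 5)}
for c, wq in waits.items():
    print(f"c = {c}: Wq = {wq:.3f} hours")
```

Each added server cuts the wait by a similar *percentage*, so the *absolute* savings shrink rapidly; that is the shape of the diminishing-returns curve.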

Higher variability in arrival or service times also degrades performance beyond what M/M/1 and M/M/c predict. The exponential distribution already has a relatively high coefficient of variation (equal to 1). If your real system has even more variable service times (some customers take 2 minutes, others take 30), the standard models will underestimate actual wait times.

Optimal Server Number in Multi-Server Systems

Economic Analysis and Cost Functions

The goal is to find the number of servers that minimizes total cost, which combines two competing expenses:

  • Server costs (wages, equipment): increase as you add servers
  • Waiting costs (customer dissatisfaction, lost sales): decrease as you add servers

The total cost function is:

$$TC(c) = c \cdot C_s + \lambda \cdot W_q(c) \cdot C_w$$

where:

  • $c$ = number of servers
  • $C_s$ = cost per server per unit time
  • $C_w$ = waiting cost per customer per unit time
  • $\lambda$ = arrival rate
  • $W_q(c)$ = average waiting time in queue (which depends on $c$)

The first term ($c \cdot C_s$) is the total server cost. The second term ($\lambda \cdot W_q(c) \cdot C_w$) is the total waiting cost: $\lambda \cdot W_q(c)$ gives you $L_q$ (by Little's Law), and multiplying by $C_w$ converts that into a dollar cost. The optimal $c$ sits at the minimum of this total cost curve.

Optimization Techniques

Since $c$ must be a positive integer, you can find the optimum through marginal analysis:

  1. Start with the minimum feasible number of servers (the smallest $c$ where $\lambda < c\mu$, i.e., $\rho < 1$)
  2. Calculate $TC(c)$
  3. Calculate $TC(c+1)$
  4. If $TC(c+1) < TC(c)$, increase $c$ by 1 and repeat from step 2
  5. Stop when $TC(c+1) \geq TC(c)$. The current $c$ is your optimum

The total cost curve is typically convex (U-shaped), so once the cost starts increasing, you've passed the minimum.

You can also use a graphical method: plot $TC$ against $c$ and visually identify the lowest point. For more complex scenarios with additional constraints (budget limits, space restrictions), integer programming techniques may be needed.

Example: A retail store has $C_s = \$20$/hour, $C_w = \$15$/hour, $\lambda = 30$ customers/hour, and $\mu = 10$ customers/hour/server. Since $\lambda / \mu = 3$, you need at least 4 servers for stability. You'd calculate $TC$ for $c = 4, 5, 6, \ldots$ and pick the $c$ with the lowest total cost.
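The marginal-analysis loop translates directly into code; a sketch for the retail-store numbers (both helper functions are ours):

```python
from math import factorial

def wq_mmc(c, lam, mu):
    """Average M/M/c queue wait via the Erlang C formula."""
    a = lam / mu
    rho = a / c
    tail = a**c / (factorial(c) * (1 - rho))
    head = sum(a**n / factorial(n) for n in range(c))
    return (tail / (head + tail)) / (c * mu - lam)

def total_cost(c, lam, mu, Cs, Cw):
    """TC(c) = server cost + waiting cost."""
    return c * Cs + lam * wq_mmc(c, lam, mu) * Cw

# Retail store: Cs = $20/h, Cw = $15/h, lam = 30/h, mu = 10/h per server
lam, mu, Cs, Cw = 30.0, 10.0, 20.0, 15.0

c = int(lam // mu) + 1                 # smallest c with lam < c*mu
while total_cost(c + 1, lam, mu, Cs, Cw) < total_cost(c, lam, mu, Cs, Cw):
    c += 1

print(f"optimal c = {c}, TC = ${total_cost(c, lam, mu, Cs, Cw):.2f}/hour")
# → optimal c = 4, TC = $102.92/hour
```

Here the loop exits immediately: a fifth server would add $20/hour in wages but save only about $17/hour in waiting cost, so 4 servers is the cost minimum at these rates.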

Sensitivity Analysis and Practical Considerations

Your optimal solution depends on cost estimates that may be uncertain. Waiting cost $C_w$ in particular is hard to pin down precisely. Sensitivity analysis tests how robust your answer is by varying the inputs:

  • One-way analysis: Change one parameter (say $C_w$) while holding everything else constant. For instance, if $C_w$ increases from \$15/hour to \$25/hour, the optimal number of servers may jump from 4 to 5 because customer waiting becomes more expensive.
  • Two-way analysis: Vary two parameters simultaneously to see how they interact (e.g., what happens if both $\lambda$ and $C_w$ increase?).

If the optimal $c$ stays the same across a wide range of plausible input values, your solution is robust. If it changes with small input shifts, you should be cautious about your recommendation.
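A one-way sweep over $C_w$ for the retail-store numbers might look like this (a sketch; the helper functions are our own, not from a library):

```python
from math import factorial

def wq_mmc(c, lam, mu):
    """Average M/M/c queue wait via the Erlang C formula."""
    a = lam / mu
    rho = a / c
    tail = a**c / (factorial(c) * (1 - rho))
    head = sum(a**n / factorial(n) for n in range(c))
    return (tail / (head + tail)) / (c * mu - lam)

def best_c(lam, mu, Cs, Cw):
    """Cost-minimizing server count via marginal analysis."""
    cost = lambda k: k * Cs + lam * wq_mmc(k, lam, mu) * Cw
    c = int(lam // mu) + 1             # minimum for stability
    while cost(c + 1) < cost(c):
        c += 1
    return c

# One-way sensitivity: sweep the (uncertain) waiting cost Cw
# while holding lam, mu, and Cs fixed
lam, mu, Cs = 30.0, 10.0, 20.0
optima = {Cw: best_c(lam, mu, Cs, Cw) for Cw in (10, 15, 20, 25, 30)}
for Cw, c in optima.items():
    print(f"Cw = ${Cw}/h  ->  optimal c = {c}")
```

Under these assumed inputs the optimum sits at 4 servers for low waiting costs and shifts to 5 as $C_w$ grows, which is exactly the kind of threshold a one-way analysis is meant to expose.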

Beyond the math, practical factors also influence the final decision:

  • Space constraints: You may not have room for more service stations
  • Labor regulations: Shift scheduling, overtime rules, and minimum staffing requirements
  • Service level targets: Some organizations commit to standards like "90% of customers served within 5 minutes," which may override the pure cost optimum
  • Demand variability: Arrival rates often change by time of day, so the optimal $c$ might differ for peak vs. off-peak hours

The quantitative model gives you a starting point, but the final staffing decision should account for these real-world constraints.