An M/M/1 queue models a system with a single server where customers arrive randomly and service takes a random amount of time. The name itself encodes the assumptions: the first "M" means Markovian (memoryless) arrivals, the second "M" means Markovian service times, and the "1" means one server.

Arrival process

Customers arrive according to a Poisson process with rate $\lambda$ (customers per unit time). This means:

Inter-arrival times are independent and exponentially distributed with mean $1/\lambda$ .
Arrivals are independent of each other and of the current state of the system.
In any small interval $dt$ , the probability of exactly one arrival is approximately $\lambda \, dt$ .

Service time distribution

Each customer's service time is exponentially distributed with rate $\mu$ , so the mean service time is $1/\mu$ . The memoryless property of the exponential distribution is what makes the analysis tractable: no matter how long a service has been in progress, the remaining time has the same distribution. This is precisely why the system's future behavior depends only on the current number of customers, making it a continuous-time Markov chain.

Traffic intensity

Traffic intensity is defined as:

$\rho = \frac{\lambda}{\mu}$

This ratio captures how fast work arrives relative to how fast the server can handle it. For the queue to reach a steady state, you need $\rho < 1$ . If $\rho \geq 1$ , the server can't keep up with arrivals and the queue grows without bound.

Think of $\rho$ as the long-run fraction of time the server is busy. At $\rho = 0.9$ , the server is idle only 10% of the time, and congestion becomes severe.

Steady-state probabilities

Once the system reaches steady state, the probability of finding exactly $n$ customers in the system is:

$P_n = (1 - \rho)\,\rho^n, \quad n = 0, 1, 2, \ldots$

This is a geometric distribution. A few things to note:

$P_0 = 1 - \rho$ is the probability the system is empty (and the server is idle).
The probabilities decay exponentially in $n$ , so very long queues are possible but increasingly unlikely when $\rho$ is well below 1.
All performance measures can be derived from this distribution.

Performance measures of M/M/1 queues

These formulas let you quantify how congested a system is and how long customers wait. They all follow from the steady-state distribution above.

Server utilization

$U = \rho = \frac{\lambda}{\mu}$

This is the fraction of time the server is busy. For example, if $\lambda = 8$ customers/hour and $\mu = 10$ customers/hour, then $\rho = 0.8$ , meaning the server is busy 80% of the time.

Expected number in system

$L = \frac{\rho}{1 - \rho} = \frac{\lambda}{\mu - \lambda}$

This counts everyone present: the customer in service plus those waiting. With $\rho = 0.8$ , you get $L = 0.8/0.2 = 4$ customers on average. Notice how $L$ grows rapidly as $\rho \to 1$ .

Expected number in queue

$L_q = \frac{\rho^2}{1 - \rho} = \frac{\lambda^2}{\mu(\mu - \lambda)}$

This counts only the customers waiting (not the one being served). The relationship $L_q = L - \rho$ is worth remembering.

Expected waiting time in system

$W = \frac{1}{\mu - \lambda}$

This is the total time from arrival to departure (waiting plus service). With $\lambda = 8$ and $\mu = 10$ , $W = 1/2$ hour = 30 minutes.

Expected waiting time in queue

$W_q = \frac{\lambda}{\mu(\mu - \lambda)}$

This is the time spent waiting before service begins. Note that $W = W_q + 1/\mu$ , which makes sense: total time in system equals time waiting plus time being served.

Little's Law connects these measures: $L = \lambda W$ and $L_q = \lambda W_q$ . These relationships hold for any stable queue, not just M/M/1. They're extremely useful for converting between "number" measures and "time" measures.

Variations of M/M/1 queues

The basic M/M/1 model assumes infinite capacity and an infinite customer population. Several variations relax these assumptions.

Arrival process, Poisson distribution - Wikipedia

M/M/1/K queues with limited capacity

The system can hold at most $K$ customers (including the one in service). When the system is full, new arrivals are turned away and lost. Because arrivals are blocked at capacity, this system can be stable even when $\rho \geq 1$ . The steady-state probabilities become:

$P_n = \frac{(1-\rho)\,\rho^n}{1 - \rho^{K+1}}, \quad n = 0, 1, \ldots, K$

(When $\rho = 1$ , each state is equally likely: $P_n = 1/(K+1)$ .)

M/M/1/K/K queues with limited population

Here the total customer population is finite ( $K$ customers). A customer who is already in the system can't generate another arrival. This is useful for modeling machine repair problems: you have $K$ machines, and only machines that are running can break down. The effective arrival rate depends on how many customers are already in the system, which changes the balance equations.

M/M/1 queues with balking

Customers who arrive and see a long queue may decide not to join at all. Typically, the probability of joining decreases as the number of customers already present increases. This reduces the effective arrival rate and lowers congestion compared to the standard model.

M/M/1 queues with reneging

Customers join the queue but leave if they wait too long. Each customer's patience is usually modeled as an exponential random variable. Reneging also reduces the effective load on the system but requires modified balance equations to compute steady-state probabilities.

Basics of M/M/c queues

An M/M/c queue extends the M/M/1 model to $c$ identical servers working in parallel, all fed by a single queue. Customers are served first-come, first-served by whichever server becomes free next.

Arrival and service time distributions

The assumptions are the same as M/M/1:

Arrivals follow a Poisson process with rate $\lambda$ .
Each server has independent, exponentially distributed service times with rate $\mu$ .

The combined maximum service rate of the system is $c\mu$ .

Traffic intensity for M/M/c

$\rho = \frac{\lambda}{c\mu}$

This is the utilization per server. Stability again requires $\rho < 1$ , meaning $\lambda < c\mu$ . It's common to also define $a = \lambda/\mu$ (the offered load in Erlangs), so $\rho = a/c$ .

Steady-state probabilities for M/M/c

The probability of an empty system is:

$P_0 = \left[\sum_{n=0}^{c-1}\frac{a^n}{n!} + \frac{a^c}{c!(1-\rho)}\right]^{-1}$

where $a = \lambda/\mu = c\rho$ . The state probabilities are then:

For $0 \leq n < c$ : $\quad P_n = \frac{a^n}{n!}\,P_0$
For $n \geq c$ : $\quad P_n = \frac{a^n}{c!\,c^{n-c}}\,P_0 = \frac{a^c}{c!}\,\rho^{n-c}\,P_0$

The key difference from M/M/1: when $n < c$ , some servers are idle and the effective service rate is $n\mu$ (not $\mu$ ). When $n \geq c$ , all servers are busy and the effective service rate is $c\mu$ .

Performance measures of M/M/c queues

Probability of waiting (Erlang C formula)

The probability that an arriving customer finds all servers busy and must wait is:

$P_w = C(c, a) = \frac{a^c}{c!(1-\rho)} \cdot P_0$

This is the Erlang C formula, one of the most widely used results in queueing theory. Call centers, for instance, use it daily to staff agents.

Expected number in queue

$L_q = \frac{\rho\,P_w}{1 - \rho}$

Arrival process, The Exponential Distribution · Statistics

Expected number in system

$L = L_q + a = L_q + \frac{\lambda}{\mu}$

The $a = \lambda/\mu$ term accounts for the average number of customers currently in service.

Expected waiting time in queue

By Little's Law:

$W_q = \frac{L_q}{\lambda}$

Expected waiting time in system

$W = W_q + \frac{1}{\mu}$

Numerical example: Suppose $\lambda = 20$ customers/hour, $\mu = 8$ customers/hour, and $c = 3$ servers. Then $a = 20/8 = 2.5$ and $\rho = 2.5/3 \approx 0.833$ . You'd compute $P_0$ , then $P_w$ , and from there all the performance measures follow. Compare this to an M/M/1 queue with $\mu = 24$ (same total service capacity): the M/M/c system will generally have shorter queues because pooling servers is more efficient than having one fast server.

Designing M/M/c queueing systems

Determining the number of servers needed

The typical approach:

Estimate $\lambda$ and $\mu$ from data.
Pick a performance target (e.g., "average wait in queue under 2 minutes" or "at most 20% of customers wait").
Compute $L_q$ , $W_q$ , or $P_w$ for $c = 1, 2, 3, \ldots$ until the target is met.
The smallest $c$ satisfying the target is your answer.

Since the formulas involve factorials and sums, this is typically done computationally or with Erlang C tables.

Cost analysis and optimization

A common formulation balances two costs:

Server cost: $c \cdot C_s$ per unit time (wages, equipment).
Waiting cost: $L \cdot C_w$ per unit time (customer dissatisfaction, lost productivity).

The total cost $TC = c \cdot C_s + L \cdot C_w$ is evaluated for different values of $c$ , and you pick the $c$ that minimizes it. Because $L$ decreases with $c$ but server cost increases, there's usually a clear optimum.

Quality of service requirements

In practice, systems are designed around service level agreements (SLAs). Common targets include:

A specified percentage of customers served within a target time (e.g., 80% of calls answered within 20 seconds).
Maximum acceptable probability of waiting.
Maximum acceptable average wait.

The M/M/c model provides the formulas to check whether a given configuration meets these targets.

Capacity planning considerations

Arrival rates fluctuate over time (think of a call center with morning peaks). Capacity planning involves:

Forecasting demand at different times of day or seasons.
Sizing the system for peak loads or using time-varying staffing.
Building in headroom so that small demand increases don't push the system past $\rho < 1$ .

M/M/1 vs M/M/c queues

Differences in system behavior

The fundamental difference is the number of servers. In M/M/1, a single server handles everything. In M/M/c, $c$ servers share the load from one queue. This pooling effect is powerful: combining resources into a shared pool is more efficient than running separate single-server queues.

For example, two separate M/M/1 queues each with $\lambda = 4$ and $\mu = 5$ ( $\rho = 0.8$ each) will have $L = 4$ per queue, so $L_{total} = 8$ . A single M/M/2 queue with $\lambda = 8$ and $\mu = 5$ has $\rho = 0.8$ but $L$ will be significantly less than 8 due to the pooling effect.

Performance comparison

For the same total service capacity ( $c\mu$ equal in both cases):

M/M/c queues have shorter expected queue lengths ( $L_q$ ).
M/M/c queues have shorter expected waiting times ( $W_q$ ).
The probability of waiting ( $P_w$ ) is lower in M/M/c.

The advantage of pooling grows as utilization increases. At low $\rho$ , the difference is small. At high $\rho$ , pooling makes a dramatic difference.

Suitability for different applications

M/M/1 fits systems with a single resource: a single checkout lane, a single-processor system, a single communication link.
M/M/c fits systems with parallel resources: a bank with multiple tellers, a hospital emergency department, a multi-core server.

Transitioning from M/M/1 to M/M/c

When demand grows beyond what one server can handle ( $\rho$ approaching 1), adding servers is the natural response. The decision should be based on:

Current performance metrics and how close $\rho$ is to 1.
Projected demand growth.
The cost of additional servers versus the cost of degraded service.

Adding even one server (going from M/M/1 to M/M/2) can produce a large improvement in waiting times when the system is heavily loaded.