
💻Parallel and Distributed Computing

Load Balancing Algorithms


Why This Matters

Load balancing sits at the heart of parallel and distributed computing—it's the mechanism that transforms a collection of individual servers into a unified, high-performance system. When you're tested on this topic, you're being asked to demonstrate understanding of resource allocation, fault tolerance, scalability, and performance optimization. The algorithms themselves are just implementations of deeper principles about how distributed systems manage competing demands for finite resources.

Don't just memorize which algorithm does what. Instead, focus on why each approach exists: What problem does it solve? What assumptions does it make about the environment? When would it fail? Understanding the trade-offs between simplicity and adaptability, static and dynamic allocation, and stateless and stateful routing will serve you far better on exams than rote recall. Each algorithm represents a different answer to the fundamental question: How do we distribute work fairly and efficiently?


Static Distribution Algorithms

These algorithms make routing decisions without considering current server state. They're predictable, low-overhead, and work well when server capabilities and request costs are uniform. The key trade-off: simplicity comes at the cost of responsiveness to changing conditions.

Round Robin

  • Sequential distribution—requests cycle through servers in fixed order (Server 1 → Server 2 → Server 3 → repeat)
  • Zero state tracking required, making implementation trivial and overhead minimal
  • Assumes homogeneous servers with equal processing power; breaks down when servers differ in capability
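The cyclic behavior described above can be sketched in a few lines of Python (server names here are placeholders):

```python
from itertools import cycle

servers = ["server1", "server2", "server3"]  # hypothetical server pool

def make_round_robin(pool):
    """Return a picker that hands out servers in fixed cyclic order."""
    it = cycle(pool)
    return lambda: next(it)

pick = make_round_robin(servers)
assignments = [pick() for _ in range(6)]
# Requests cycle 1 → 2 → 3 → 1 → 2 → 3, regardless of server load
```

Note that no per-server state is consulted: the picker only remembers its position in the cycle, which is exactly why Round Robin is cheap and exactly why it ignores uneven load.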

Weighted Round Robin

  • Capacity-aware distribution—assigns weights (e.g., Server A: weight 3, Server B: weight 1) so stronger servers receive proportionally more requests
  • Static weights must be configured manually based on known server specifications
  • Handles heterogeneous environments where servers have different CPU, memory, or network capabilities
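A naive way to realize the weighting is to expand each server into the cycle proportionally to its weight (the weights below are illustrative; production balancers typically interleave the sequence more smoothly):

```python
weights = {"A": 3, "B": 1}  # hypothetical capacity weights

def weighted_sequence(w):
    """Expand each server into one cycle, repeated by its weight."""
    seq = []
    for name, weight in w.items():
        seq.extend([name] * weight)
    return seq

seq = weighted_sequence(weights)
# Per cycle: A is chosen 3 times for every 1 time B is chosen
```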

Random

  • Probabilistic distribution—each request routed to a randomly selected server with no memory of previous decisions
  • Stateless simplicity means no coordination overhead, useful in highly distributed environments
  • Statistical fairness only—individual servers may experience load spikes; law of large numbers provides balance over time
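The law-of-large-numbers behavior is easy to see in a simulation (seeded here for reproducibility; the server names are placeholders):

```python
import random

servers = ["s1", "s2", "s3"]  # hypothetical pool

def pick_random(pool, rng):
    """Stateless selection: every request is an independent draw."""
    return rng.choice(pool)

rng = random.Random(42)  # fixed seed for a repeatable experiment
counts = {s: 0 for s in servers}
for _ in range(3000):
    counts[pick_random(servers, rng)] += 1
# Each server receives roughly 1000 requests, but only on average —
# short windows can still concentrate load on one server
```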

Compare: Round Robin vs. Random—both are stateless and simple, but Round Robin guarantees even distribution while Random only achieves it probabilistically. For exam questions about deterministic vs. probabilistic load balancing, this is your key distinction.


Connection-Aware Algorithms

These algorithms track active connections to make smarter routing decisions. They respond to actual load rather than assumed load, making them more adaptive but requiring state maintenance.

Least Connection

  • Routes to the server with fewest active connections—directly measures current load rather than assuming uniform request distribution
  • Requires connection tracking, adding overhead but enabling responsiveness to variable request durations
  • Ideal for long-lived connections (WebSockets, database queries) where request completion times vary significantly
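Assuming the balancer maintains a live count per server (the counts below are hypothetical), the selection itself is a single minimum:

```python
# Hypothetical active-connection counts maintained by the balancer
active = {"s1": 12, "s2": 4, "s3": 9}

def least_connection(conns):
    """Pick the server with the fewest active connections."""
    return min(conns, key=conns.get)

choice = least_connection(active)  # "s2" — 4 active connections
```

The real cost is not this lookup but keeping `active` accurate: the balancer must increment on dispatch and decrement on completion, which is the state maintenance the section mentions.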

Weighted Least Connection

  • Combines connection count with server capacity—calculates active connections ÷ server weight to find the relatively least-loaded server
  • Handles heterogeneous clusters where a powerful server with 100 connections may be less loaded than a weak server with 50
  • Best of both worlds for production environments with mixed hardware generations
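The ratio from the first bullet can be computed directly; the example below uses the 100-vs-50-connection scenario from the second bullet (weights are hypothetical):

```python
# Hypothetical counts and capacity weights
active = {"big": 100, "small": 50}
weights = {"big": 4, "small": 1}

def weighted_least_connection(conns, w):
    """Pick the server minimizing active connections / server weight."""
    return min(conns, key=lambda s: conns[s] / w[s])

choice = weighted_least_connection(active, weights)
# big: 100/4 = 25, small: 50/1 = 50 → "big" is relatively less loaded
```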

Compare: Least Connection vs. Weighted Least Connection—both track active connections, but the weighted version accounts for server heterogeneity. If an FRQ asks about load balancing in a mixed-capacity cluster, Weighted Least Connection is your answer.


Performance-Based Algorithms

These algorithms measure actual server performance metrics to make routing decisions. They're the most responsive but require continuous monitoring infrastructure. The principle: let observed behavior, not assumptions, drive decisions.

Least Response Time

  • Routes to the fastest-responding server—measures actual latency, not just connection count
  • Requires real-time monitoring of response times, adding infrastructure complexity
  • Directly optimizes user experience by minimizing perceived latency; accounts for factors connection count misses (CPU load, disk I/O)
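A common implementation detail is to smooth each server's latency with an exponentially weighted moving average (EWMA) rather than trust a single sample; the sketch below assumes hypothetical latency figures and a smoothing factor of 0.2:

```python
# Hypothetical smoothed response times in milliseconds
avg_response_ms = {"s1": 120.0, "s2": 45.0, "s3": 80.0}

def update_ewma(current, sample, alpha=0.2):
    """Fold a new latency sample into the running average."""
    return (1 - alpha) * current + alpha * sample

def least_response_time(latencies):
    """Route to the server with the lowest smoothed latency."""
    return min(latencies, key=latencies.get)

choice = least_response_time(avg_response_ms)  # "s2" at 45 ms
```

Smoothing is what makes the metric usable: raw per-request latencies are noisy, and routing on a single outlier sample would cause the balancer to oscillate.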

Least Bandwidth

  • Routes to the server consuming least network bandwidth—critical when network capacity is the bottleneck
  • Targets network resource optimization rather than CPU or connection metrics
  • Essential for bandwidth-intensive applications like video streaming or large file transfers where network saturation is the primary concern

Compare: Least Response Time vs. Least Bandwidth—both measure real-time metrics, but they optimize for different bottlenecks. Response time targets latency-sensitive applications; bandwidth targets throughput-sensitive applications. Know which metric matters for which workload type.


Session-Persistent Algorithms

Some applications require that a client consistently reaches the same server—for session data, caching efficiency, or stateful protocols. These algorithms sacrifice load optimization for affinity.

IP Hash

  • Deterministic routing via hash function—applies a hash to the client IP (e.g., server = hash(IP) mod n) for consistent server assignment
  • Enables session persistence without storing session state in a shared database or passing session tokens
  • Breaks on IP changes (mobile users, NAT) and can cause imbalance if client IPs aren't uniformly distributed
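The hash(IP) mod n rule in the first bullet looks like this in practice (SHA-256 is one reasonable hash choice here, not the only one; the IP and server names are placeholders):

```python
import hashlib

servers = ["s1", "s2", "s3"]  # hypothetical pool

def ip_hash(client_ip, pool):
    """Deterministically map a client IP to a server: hash(IP) mod n."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return pool[int(digest, 16) % len(pool)]

a = ip_hash("203.0.113.7", servers)
b = ip_hash("203.0.113.7", servers)
# Same IP hashes to the same server on every request
```

Note the `mod n` dependence on pool size: if a server is added or removed, most clients remap to a different server, which is why production systems often use consistent hashing instead of a plain modulus.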

Compare: IP Hash vs. Least Connection—IP Hash prioritizes consistency (same client always hits same server), while Least Connection prioritizes balance (requests go where load is lowest). This trade-off between affinity and fairness is a classic distributed systems question.


Adaptive Algorithms

These algorithms adjust their behavior based on observed conditions, representing the most sophisticated approach to load balancing. They treat load balancing as a continuous optimization problem rather than a fixed policy.

Dynamic Load Balancing

  • Real-time adjustment of routing decisions based on current server loads, connection counts, and response times
  • Responds to changing conditions—handles traffic spikes, server failures, and variable request costs automatically
  • Higher overhead from continuous monitoring and decision recalculation; trade-off between responsiveness and computational cost
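One way a dynamic balancer combines multiple live signals is a weighted score per server; the metrics and weightings below are purely illustrative assumptions, not a standard formula:

```python
# Hypothetical per-server metrics, refreshed by a monitoring loop
metrics = {
    "s1": {"connections": 30, "response_ms": 120.0},
    "s2": {"connections": 45, "response_ms": 40.0},
}

def dynamic_score(m, w_conn=1.0, w_latency=0.5):
    """Lower is better; the weights are illustrative tuning knobs."""
    return w_conn * m["connections"] + w_latency * m["response_ms"]

def pick(metrics):
    """Route to the server with the best (lowest) current score."""
    return min(metrics, key=lambda s: dynamic_score(metrics[s]))

choice = pick(metrics)
# s1: 30 + 60 = 90, s2: 45 + 20 = 65 → "s2" wins despite more connections
```

This illustrates the overhead trade-off: the score is cheap, but keeping `metrics` fresh requires the continuous monitoring the bullet above describes.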

Adaptive Load Balancing

  • Learning-based optimization—uses historical performance data to predict optimal routing, not just react to current state
  • Machine learning integration enables pattern recognition (e.g., anticipating lunch-hour traffic spikes)
  • Continuous improvement as the system accumulates data; represents the cutting edge of production load balancing

Compare: Dynamic vs. Adaptive Load Balancing—Dynamic reacts to current conditions; Adaptive predicts future conditions based on learned patterns. Both are real-time, but Adaptive adds a predictive layer. For questions about proactive vs. reactive resource management, this distinction matters.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Stateless/Simple | Round Robin, Random |
| Capacity-Aware | Weighted Round Robin, Weighted Least Connection |
| Connection-Tracking | Least Connection, Weighted Least Connection |
| Performance-Metric Based | Least Response Time, Least Bandwidth |
| Session Persistence | IP Hash |
| Real-Time Adaptation | Dynamic Load Balancing, Adaptive Load Balancing |
| Homogeneous Servers | Round Robin, Least Connection, Random |
| Heterogeneous Servers | Weighted Round Robin, Weighted Least Connection |

Self-Check Questions

  1. Which two algorithms both use server weights, and what additional factor does one consider that the other ignores?

  2. You're designing a load balancer for a video streaming service where network bandwidth is limited but servers have similar specs. Which algorithm would you choose and why?

  3. Compare and contrast IP Hash and Least Connection in terms of their trade-offs between session persistence and load optimization. When would you sacrifice one for the other?

  4. A system administrator notices that Round Robin is causing some servers to become overloaded while others sit idle. What assumptions has Round Robin made that aren't holding true, and which algorithm would you recommend instead?

  5. If an FRQ asks you to design a load balancing strategy for a cloud application with unpredictable traffic patterns and servers of varying capabilities, which algorithm category should you focus on and what monitoring infrastructure would it require?