
💻Parallel and Distributed Computing

Load Balancing Algorithms


Why This Matters

Load balancing sits at the heart of parallel and distributed computing—it's the mechanism that transforms a collection of individual servers into a unified, high-performance system. When you're tested on this topic, you're being asked to demonstrate understanding of resource allocation, fault tolerance, scalability, and performance optimization. The algorithms themselves are just implementations of deeper principles about how distributed systems manage competing demands for finite resources.

Don't just memorize which algorithm does what. Instead, focus on why each approach exists: What problem does it solve? What assumptions does it make about the environment? When would it fail? Understanding the trade-offs between simplicity and adaptability, static and dynamic allocation, and stateless and stateful routing will serve you far better on exams than rote recall. Each algorithm represents a different answer to the fundamental question: How do we distribute work fairly and efficiently?


Static Distribution Algorithms

These algorithms make routing decisions without considering current server state. They're predictable, low-overhead, and work well when server capabilities and request costs are uniform. The key trade-off: simplicity comes at the cost of responsiveness to changing conditions.

Round Robin

  • Sequential distribution—requests cycle through servers in fixed order (Server 1 → Server 2 → Server 3 → repeat)
  • Zero state tracking required, making implementation trivial and overhead minimal
  • Assumes homogeneous servers with equal processing power; breaks down when servers differ in capability
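The cyclic behavior described above can be sketched in a few lines of Python (server names here are placeholders):

```python
from itertools import cycle

servers = ["server1", "server2", "server3"]  # hypothetical server pool

def make_round_robin(pool):
    """Return a picker that hands out servers in fixed cyclic order."""
    it = cycle(pool)
    return lambda: next(it)

pick = make_round_robin(servers)
assignments = [pick() for _ in range(6)]
# Requests cycle 1 → 2 → 3 → 1 → 2 → 3, regardless of server load
```

Note that no per-server state is consulted: the picker only remembers its position in the cycle, which is exactly why Round Robin is cheap and exactly why it ignores uneven load.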

Weighted Round Robin

  • Capacity-aware distribution—assigns weights (e.g., Server A: weight 3, Server B: weight 1) so stronger servers receive proportionally more requests
  • Static weights must be configured manually based on known server specifications
  • Handles heterogeneous environments where servers have different CPU, memory, or network capabilities
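A naive way to realize the weighting is to expand each server into the cycle proportionally to its weight (the weights below are illustrative; production balancers typically interleave the sequence more smoothly):

```python
weights = {"A": 3, "B": 1}  # hypothetical capacity weights

def weighted_sequence(w):
    """Expand each server into one cycle, repeated by its weight."""
    seq = []
    for name, weight in w.items():
        seq.extend([name] * weight)
    return seq

seq = weighted_sequence(weights)
# Per cycle: A is chosen 3 times for every 1 time B is chosen
```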

Random

  • Probabilistic distribution—each request routed to a randomly selected server with no memory of previous decisions
  • Stateless simplicity means no coordination overhead, useful in highly distributed environments
  • Statistical fairness only—individual servers may experience load spikes; law of large numbers provides balance over time
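The law-of-large-numbers behavior is easy to see in a simulation (seeded here for reproducibility; the server names are placeholders):

```python
import random

servers = ["s1", "s2", "s3"]  # hypothetical pool

def pick_random(pool, rng):
    """Stateless selection: every request is an independent draw."""
    return rng.choice(pool)

rng = random.Random(42)  # fixed seed for a repeatable experiment
counts = {s: 0 for s in servers}
for _ in range(3000):
    counts[pick_random(servers, rng)] += 1
# Each server receives roughly 1000 requests, but only on average —
# short windows can still concentrate load on one server
```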

Compare: Round Robin vs. Random—both are stateless and simple, but Round Robin guarantees even distribution while Random only achieves it probabilistically. For exam questions about deterministic vs. probabilistic load balancing, this is your key distinction.


Connection-Aware Algorithms

These algorithms track active connections to make smarter routing decisions. They respond to actual load rather than assumed load, making them more adaptive but requiring state maintenance.

Least Connection

  • Routes to the server with fewest active connections—directly measures current load rather than assuming uniform request distribution
  • Requires connection tracking, adding overhead but enabling responsiveness to variable request durations
  • Ideal for long-lived connections (WebSockets, database queries) where request completion times vary significantly
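Assuming the balancer maintains a live count per server (the counts below are hypothetical), the selection itself is a single minimum:

```python
# Hypothetical active-connection counts maintained by the balancer
active = {"s1": 12, "s2": 4, "s3": 9}

def least_connection(conns):
    """Pick the server with the fewest active connections."""
    return min(conns, key=conns.get)

choice = least_connection(active)  # "s2" — 4 active connections
```

The real cost is not this lookup but keeping `active` accurate: the balancer must increment on dispatch and decrement on completion, which is the state maintenance the section mentions.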

Weighted Least Connection

  • Combines connection count with server capacity—calculates active connections ÷ server weight to find the relatively least-loaded server
  • Handles heterogeneous clusters where a powerful server with 100 connections may be less loaded than a weak server with 50
  • Best of both worlds for production environments with mixed hardware generations
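The ratio from the first bullet can be computed directly; the example below uses the 100-vs-50-connection scenario from the second bullet (weights are hypothetical):

```python
# Hypothetical counts and capacity weights
active = {"big": 100, "small": 50}
weights = {"big": 4, "small": 1}

def weighted_least_connection(conns, w):
    """Pick the server minimizing active connections / server weight."""
    return min(conns, key=lambda s: conns[s] / w[s])

choice = weighted_least_connection(active, weights)
# big: 100/4 = 25, small: 50/1 = 50 → "big" is relatively less loaded
```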

Compare: Least Connection vs. Weighted Least Connection—both track active connections, but the weighted version accounts for server heterogeneity. If an FRQ asks about load balancing in a mixed-capacity cluster, Weighted Least Connection is your answer.


Performance-Based Algorithms

These algorithms measure actual server performance metrics to make routing decisions. They're the most responsive but require continuous monitoring infrastructure. The principle: let observed behavior, not assumptions, drive decisions.

Least Response Time

  • Routes to the fastest-responding server—measures actual latency, not just connection count
  • Requires real-time monitoring of response times, adding infrastructure complexity
  • Directly optimizes user experience by minimizing perceived latency; accounts for factors connection count misses (CPU load, disk I/O)
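A common implementation detail is to smooth each server's latency with an exponentially weighted moving average (EWMA) rather than trust a single sample; the sketch below assumes hypothetical latency figures and a smoothing factor of 0.2:

```python
# Hypothetical smoothed response times in milliseconds
avg_response_ms = {"s1": 120.0, "s2": 45.0, "s3": 80.0}

def update_ewma(current, sample, alpha=0.2):
    """Fold a new latency sample into the running average."""
    return (1 - alpha) * current + alpha * sample

def least_response_time(latencies):
    """Route to the server with the lowest smoothed latency."""
    return min(latencies, key=latencies.get)

choice = least_response_time(avg_response_ms)  # "s2" at 45 ms
```

Smoothing is what makes the metric usable: raw per-request latencies are noisy, and routing on a single outlier sample would cause the balancer to oscillate.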

Least Bandwidth

  • Routes to the server consuming least network bandwidth—critical when network capacity is the bottleneck
  • Targets network resource optimization rather than CPU or connection metrics
  • Essential for bandwidth-intensive applications like video streaming or large file transfers where network saturation is the primary concern

Compare: Least Response Time vs. Least Bandwidth—both measure real-time metrics, but they optimize for different bottlenecks. Response time targets latency-sensitive applications; bandwidth targets throughput-sensitive applications. Know which metric matters for which workload type.


Session-Persistent Algorithms

Some applications require that a client consistently reaches the same server—for session data, caching efficiency, or stateful protocols. These algorithms sacrifice load optimization for affinity.

IP Hash

  • Deterministic routing via hash function—applies a hash to the client IP (e.g., server = hash(IP) mod n) for consistent server assignment
  • Enables session persistence without storing session state in a shared database or passing session tokens
  • Breaks on IP changes (mobile users, NAT) and can cause imbalance if client IPs aren't uniformly distributed
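The hash(IP) mod n rule in the first bullet looks like this in practice (SHA-256 is one reasonable hash choice here, not the only one; the IP and server names are placeholders):

```python
import hashlib

servers = ["s1", "s2", "s3"]  # hypothetical pool

def ip_hash(client_ip, pool):
    """Deterministically map a client IP to a server: hash(IP) mod n."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return pool[int(digest, 16) % len(pool)]

a = ip_hash("203.0.113.7", servers)
b = ip_hash("203.0.113.7", servers)
# Same IP hashes to the same server on every request
```

Note the `mod n` dependence on pool size: if a server is added or removed, most clients remap to a different server, which is why production systems often use consistent hashing instead of a plain modulus.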

Compare: IP Hash vs. Least Connection—IP Hash prioritizes consistency (same client always hits same server), while Least Connection prioritizes balance (requests go where load is lowest). This trade-off between affinity and fairness is a classic distributed systems question.


Adaptive Algorithms

These algorithms adjust their behavior based on observed conditions, representing the most sophisticated approach to load balancing. They treat load balancing as a continuous optimization problem rather than a fixed policy.

Dynamic Load Balancing

  • Real-time adjustment of routing decisions based on current server loads, connection counts, and response times
  • Responds to changing conditions—handles traffic spikes, server failures, and variable request costs automatically
  • Higher overhead from continuous monitoring and decision recalculation; trade-off between responsiveness and computational cost
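One way a dynamic balancer combines multiple live signals is a weighted score per server; the metrics and weightings below are purely illustrative assumptions, not a standard formula:

```python
# Hypothetical per-server metrics, refreshed by a monitoring loop
metrics = {
    "s1": {"connections": 30, "response_ms": 120.0},
    "s2": {"connections": 45, "response_ms": 40.0},
}

def dynamic_score(m, w_conn=1.0, w_latency=0.5):
    """Lower is better; the weights are illustrative tuning knobs."""
    return w_conn * m["connections"] + w_latency * m["response_ms"]

def pick(metrics):
    """Route to the server with the best (lowest) current score."""
    return min(metrics, key=lambda s: dynamic_score(metrics[s]))

choice = pick(metrics)
# s1: 30 + 60 = 90, s2: 45 + 20 = 65 → "s2" wins despite more connections
```

This illustrates the overhead trade-off: the score is cheap, but keeping `metrics` fresh requires the continuous monitoring the bullet above describes.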

Adaptive Load Balancing

  • Learning-based optimization—uses historical performance data to predict optimal routing, not just react to current state
  • Machine learning integration enables pattern recognition (e.g., anticipating lunch-hour traffic spikes)
  • Continuous improvement as the system accumulates data; represents the cutting edge of production load balancing

Compare: Dynamic vs. Adaptive Load Balancing—Dynamic reacts to current conditions; Adaptive predicts future conditions based on learned patterns. Both are real-time, but Adaptive adds a predictive layer. For questions about proactive vs. reactive resource management, this distinction matters.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Stateless/Simple | Round Robin, Random |
| Capacity-Aware | Weighted Round Robin, Weighted Least Connection |
| Connection-Tracking | Least Connection, Weighted Least Connection |
| Performance-Metric Based | Least Response Time, Least Bandwidth |
| Session Persistence | IP Hash |
| Real-Time Adaptation | Dynamic Load Balancing, Adaptive Load Balancing |
| Homogeneous Servers | Round Robin, Least Connection, Random |
| Heterogeneous Servers | Weighted Round Robin, Weighted Least Connection |

Self-Check Questions

  1. Which two algorithms both use server weights, and what additional factor does one consider that the other ignores?

  2. You're designing a load balancer for a video streaming service where network bandwidth is limited but servers have similar specs. Which algorithm would you choose and why?

  3. Compare and contrast IP Hash and Least Connection in terms of their trade-offs between session persistence and load optimization. When would you sacrifice one for the other?

  4. A system administrator notices that Round Robin is causing some servers to become overloaded while others sit idle. What assumptions has Round Robin made that aren't holding true, and which algorithm would you recommend instead?

  5. If an FRQ asks you to design a load balancing strategy for a cloud application with unpredictable traffic patterns and servers of varying capabilities, which algorithm category should you focus on and what monitoring infrastructure would it require?