TL;DR
Load balancing distributes network traffic or computational workload across multiple servers using algorithms like round-robin, least-connections, or consistent hashing to prevent any single server from being overwhelmed. It is essential for scalability, high availability, and efficient resource utilization, and is provided by systems such as AWS ELB, nginx, and HAProxy.
Visual Overview
Without load balancing (single server):

All traffic, for example 100 req/s, goes to one server, which overloads.

Problems:
- Single point of failure
- Limited capacity
- High latency under load
- No redundancy

With load balancing (distributed):

100 req/s → load balancer (nginx, ELB, or HAProxy) → Server 1, Server 2, and Server 3 at roughly 33 req/s each.

Benefits:
- High availability (failover)
- Horizontal scalability (add servers)
- Better resource utilization
- Health checks and automatic routing around failed servers

Load balancing algorithms comparison:
- Round Robin (sequential distribution): Request 1 → Server 1, Request 2 → Server 2, Request 3 → Server 3, Request 4 → Server 1 (cycle repeats).
- Least Connections (dynamic balancing): with Server 1 at 5 active connections, Server 2 at 3, and Server 3 at 8, the next request goes to Server 2, the server with the fewest connections.
- Hash-based routing (sticky routing): hash(user_id) % num_servers, so User 123 → Server 2 and User 456 → Server 1 on every request. The same client routes to the same server as long as the pool does not change. Consistent hashing refines this plain modulo scheme so that adding or removing a server remaps only a fraction of clients.

Layer 4 vs Layer 7 load balancing:

Layer 4 (transport layer, TCP/UDP):
- Example client: 1.2.3.4:5678
- Load balancer sees: IP + port
- Routes based on: the TCP connection
- Cannot see: HTTP headers, URLs, cookies
- The backend server receives the original connection
- Pros: faster (no HTTP parsing), lower latency (roughly 1-2 ms)
- Cons: limited routing logic

Layer 7 (application layer, HTTP):
- Example client request: GET /api/users with Cookie: xyz
- Load balancer sees: the full HTTP request
- Routes based on: URL, headers, cookies
- Example routes: /api/users → Backend Pool A, /static/* → Backend Pool B (CDN)
- Pros: advanced routing (path, host, cookie), SSL termination
- Cons: slower (HTTP parsing, roughly 5-10 ms)
Core Explanation
What is Load Balancing?
Load balancing is the process of distributing incoming requests across multiple backend servers to:
- Optimize resource utilization: No server is overloaded while others are idle
- Maximize throughput: Handle more requests by adding servers
- Minimize latency: Route to least-loaded or nearest server
- Ensure high availability: Route around failed servers
Load Balancing Algorithms
1. Round Robin
Simple sequential distribution:
- Request 1 → Server 1
- Request 2 → Server 2
- Request 3 → Server 3
- Request 4 → Server 1 (cycle repeats)

Pros:
- Simple implementation
- Even distribution (if all requests cost about the same)
- Minimal state (only a position counter, no per-server tracking)

Cons:
- Doesn't account for server capacity
- Doesn't account for request complexity
- Long-running requests can pile up on one server

Use case: Stateless microservices with uniform requests
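A minimal sketch of the rotation described above, assuming the pool is a fixed list of server names (the names are placeholders, not taken from a real deployment):

```python
from itertools import cycle
from typing import Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Return an iterator that hands out servers in a fixed repeating order."""
    if not servers:
        raise ValueError("at least one server is required")
    return cycle(servers)


picker = round_robin(["server1", "server2", "server3"])
assignments = [next(picker) for _ in range(4)]
print(assignments)  # ['server1', 'server2', 'server3', 'server1']
```

The only state is the iterator's position, which is why the approach cannot react to slow or overloaded servers.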
2. Weighted Round Robin
Distribute based on server capacity:

Backend servers with weights:
- Server 1 (weight=5): More powerful
- Server 2 (weight=3): Medium capacity
- Server 3 (weight=2): Less powerful

Distribution pattern:
- 5 requests → Server 1
- 3 requests → Server 2
- 2 requests → Server 3
- (repeat)

Pros:
- Accounts for heterogeneous server capacity
- Efficient resource utilization

Cons:
- Still doesn't account for dynamic load
- Requires manual weight configuration

Use case: Mixed hardware (different CPU/RAM capacities)
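The pattern above sends each server its share in one burst. As an illustration, the sketch below uses a different but related technique, often called smooth weighted round robin, which keeps the same 5:3:2 ratio while interleaving the picks. The function name and server labels are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List


def smooth_weighted_round_robin(weights: Dict[str, int], n: int) -> List[str]:
    """Pick n servers so that each appears in proportion to its weight."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every server gains its weight, the leader is chosen,
        # and the leader then pays back the total weight.
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


picks = smooth_weighted_round_robin({"server1": 5, "server2": 3, "server3": 2}, 10)
counts = Counter(picks)
print(dict(counts))
```

Over any full cycle of 10 picks the counts match the weights, but the heaviest server is not hit five times in a row.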
3. Least Connections
Route to the server with the fewest active connections:

Real-time server state:
- Server 1: 25 active connections
- Server 2: 15 active connections ✓ (chosen)
- Server 3: 30 active connections

New request → Server 2 (fewest connections)

Pros:
- Dynamic load balancing
- Accounts for long-running connections
- Better for variable request durations

Cons:
- Requires tracking connection state
- More complex implementation

Use case: HTTP/1.1 keep-alive, websockets, long-polling
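The connection tracking that this algorithm needs can be sketched as follows. This is a single-threaded illustration with a hypothetical `LeastConnectionsBalancer` class; a real balancer would need locking or atomic counters.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class LeastConnectionsBalancer:
    """Tracks active connections per server and picks the least busy one."""

    def __init__(self, active: Dict[str, int]):
        self.active = dict(active)

    def pick(self) -> str:
        return min(self.active, key=self.active.get)

    @contextmanager
    def connection(self) -> Iterator[str]:
        # Count the connection while it is open, release it when it closes.
        server = self.pick()
        self.active[server] += 1
        try:
            yield server
        finally:
            self.active[server] -= 1


lb = LeastConnectionsBalancer({"server1": 25, "server2": 15, "server3": 30})
with lb.connection() as server:
    chosen = server
    during = lb.active[server]
after = lb.active[chosen]
print(chosen, during, after)  # server2 16 15
```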
4. Weighted Least Connections
Combines least connections with server weights:

Formula: score = connections / weight (lower is better)
- Server 1: 20 connections, weight=5 → score = 4.0 ✓
- Server 2: 12 connections, weight=3 → score = 4.0 ✓
- Server 3: 10 connections, weight=2 → score = 5.0 (highest, not chosen)

Route to Server 1 or Server 2 (tied for the lowest score).

Pros:
- Best of both worlds (capacity + dynamic load)

Use case: Production systems with mixed hardware
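The score calculation can be checked with a few lines of Python. The tie between the first two servers is broken here by dictionary order, which is an arbitrary choice made for this sketch.

```python
from typing import Dict, Tuple


def weighted_scores(servers: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each server to connections / weight; the lowest score wins."""
    return {name: conns / weight for name, (conns, weight) in servers.items()}


servers = {"server1": (20, 5), "server2": (12, 3), "server3": (10, 2)}
scores = weighted_scores(servers)
chosen = min(scores, key=scores.get)
print(scores)  # {'server1': 4.0, 'server2': 4.0, 'server3': 5.0}
print(chosen)  # server1
```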
5. Least Response Time
Route to the server with the fastest response time:

Recent response times (moving average):
- Server 1: 50ms average
- Server 2: 30ms average ✓ (chosen)
- Server 3: 100ms average

Pros:
- Optimizes user experience
- Automatically adapts to server performance
- Accounts for network latency

Cons:
- Requires continuous response-time measurement (active health checks or live traffic)
- Can amplify cascading failures (a server that fails fast can look like the fastest one and attract more traffic)

Use case: Geo-distributed deployments
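The text only says "moving average". One common choice is an exponentially weighted moving average, used in the sketch below as an assumption; the `alpha` value and class name are illustrative.

```python
from typing import Dict


class ResponseTimeTracker:
    """Keeps an exponentially weighted moving average of latency per server."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha  # weight given to the newest sample
        self.avg_ms: Dict[str, float] = {}

    def record(self, server: str, elapsed_ms: float) -> None:
        previous = self.avg_ms.get(server)
        if previous is None:
            self.avg_ms[server] = elapsed_ms
        else:
            self.avg_ms[server] = self.alpha * elapsed_ms + (1 - self.alpha) * previous

    def pick(self) -> str:
        return min(self.avg_ms, key=self.avg_ms.get)


tracker = ResponseTimeTracker()
for name, ms in (("server1", 50), ("server2", 30), ("server3", 100)):
    tracker.record(name, ms)
first_choice = tracker.pick()

# One slow response from server2 raises its average above server1's.
tracker.record("server2", 500)
second_choice = tracker.pick()
print(first_choice, second_choice)  # server2 server1
```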
6. IP Hash (and Consistent Hashing)
Hash the client IP to deterministically select a server:

hash(client_ip) % num_servers
- Client 1.2.3.4 → hash % 3 = 1 → Server 1 (always)
- Client 5.6.7.8 → hash % 3 = 2 → Server 2 (always)
- Client 9.10.11.12 → hash % 3 = 0 → Server 3 (always)

Pros:
- Session persistence (same client → same server)
- Useful for caching (server caches client data)
- No shared session storage needed

Cons:
- Uneven distribution if client IPs are clustered
- Server addition/removal disrupts assignments (with plain modulo hashing most clients are remapped; consistent hashing places servers on a hash ring so that only a fraction of clients move)

Use case: Stateful applications with server-side sessions
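To illustrate the disruption caused by changing the pool size, the sketch below compares plain modulo hashing with a simple hash ring. The `HashRing` class, the virtual-node count, and the synthetic client IPs are assumptions for this example, not the implementation of any particular load balancer.

```python
import bisect
import hashlib
from typing import List


def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process, so use a digest instead.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def modulo_pick(client_ip: str, servers: List[str]) -> str:
    return servers[stable_hash(client_ip) % len(servers)]


class HashRing:
    """Consistent hashing: each server owns many points on a ring."""

    def __init__(self, servers: List[str], vnodes: int = 100):
        self._ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, client_ip: str) -> str:
        # First ring point clockwise from the client's hash, wrapping around.
        idx = bisect.bisect(self._points, stable_hash(client_ip)) % len(self._ring)
        return self._ring[idx][1]


three = ["server1", "server2", "server3"]
four = three + ["server4"]
clients = [f"10.0.{i // 256}.{i % 256}" for i in range(2000)]

moved_mod = sum(modulo_pick(ip, three) != modulo_pick(ip, four) for ip in clients)
ring3, ring4 = HashRing(three), HashRing(four)
moved_ring = sum(ring3.pick(ip) != ring4.pick(ip) for ip in clients)
print(f"modulo remapped {moved_mod} clients, ring remapped {moved_ring}")
```

With the ring, every client that moves lands on the newly added server, so existing servers keep the sessions and caches they already hold.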
7. Least Bandwidth
Route to the server currently serving the least bandwidth:
- Server 1: 500 Mbps
- Server 2: 300 Mbps ✓ (chosen)
- Server 3: 700 Mbps

Use case: Video streaming, large file downloads
Layer 4 vs Layer 7 Load Balancing
Layer 4 (Transport Layer)
OSI layer: Transport (TCP/UDP)

What it sees:
- Source IP + port
- Destination IP + port
- TCP/UDP protocol

What it can't see:
- HTTP headers
- URLs, query parameters
- Cookies
- Request body

Routing decisions are based on:
- IP address
- Port number
- Protocol (TCP vs UDP)

Example: AWS Network Load Balancer (NLB)

Pros:
- Very fast (typically under 1 ms of added latency)
- High throughput (millions of requests/sec)
- Low CPU usage
- Supports any TCP/UDP protocol
- Preserves client IP (pass-through)

Cons:
- No content-based routing
- No SSL termination in pure pass-through mode (some Layer 4 products, NLB among them, offer optional TLS listeners)
- Limited health checks

Use case: TCP-based services, ultra-low latency requirements
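Because a Layer 4 balancer sees only addresses, ports, and protocol, one way to pick a backend is to hash those fields so that every packet of a connection reaches the same server. The sketch below is an illustration of that idea only; the `Flow` type and addresses are hypothetical and this is not how any specific product computes its flow hash.

```python
import hashlib
from typing import List, NamedTuple


class Flow(NamedTuple):
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def pick_backend(flow: Flow, backends: List[str]) -> str:
    """Hash the 5-tuple so one connection always maps to one backend."""
    key = "|".join(str(field) for field in flow).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[digest % len(backends)]


backends = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
flow = Flow("tcp", "1.2.3.4", 5678, "203.0.113.10", 443)
first = pick_backend(flow, backends)
second = pick_backend(flow, backends)
print(first, first == second)
```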
Layer 7 (Application Layer)
OSI layer: Application (HTTP/HTTPS)

What it sees:
- Full HTTP request
- Headers (User-Agent, Host, etc.)
- URL path and query parameters
- Cookies
- Request body

Routing decisions are based on:
- URL path: /api/* → API servers
- Host header: api.example.com
- Cookie: user_id=123
- HTTP method: POST vs GET
- Custom headers

Example: AWS Application Load Balancer (ALB), nginx

Pros:
- Content-based routing (path, host, headers)
- SSL/TLS termination (decrypt at LB)
- Advanced health checks (HTTP status codes)
- Request/response manipulation
- Web Application Firewall (WAF) integration

Cons:
- Slower (5-10ms latency due to HTTP parsing)
- Higher CPU usage
- More complex configuration

Use case: HTTP microservices, API gateways, web applications
Health Checks & Failover
Health check mechanisms:

1. Active health checks: the load balancer probes each backend server.
- Load balancer → backend server: GET /health every 10 seconds
- Server responds 200 OK → healthy
- Server timeout or error → mark unhealthy

Configuration:
- Interval: 10s (how often to check)
- Timeout: 5s (max wait for response)
- Unhealthy threshold: 3 (failures before marking down)
- Healthy threshold: 2 (successes before marking up)

2. Passive health checks: the load balancer monitors real traffic.
- Server returns 5xx errors → unhealthy
- Server timeout → unhealthy
- Server 2xx responses → healthy

Failover flow:
1. Server 2 fails health check
2. Load balancer marks Server 2 DOWN
3. New requests → Server 1 & 3 only
4. Server 2 recovers
5. Passes health checks (2x)
6. Load balancer marks Server 2 UP
7. Resume sending traffic to Server 2
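The threshold logic above can be sketched as a small state machine. The class below is a hypothetical illustration that uses the thresholds from the configuration list (3 failures to mark down, 2 successes to mark up) and assumes the thresholds count consecutive results.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    unhealthy_threshold: int = 3
    healthy_threshold: int = 2
    healthy: bool = True
    _streak: int = 0  # consecutive results that disagree with the current state

    def record(self, success: bool) -> bool:
        """Feed one probe result and return the resulting health state."""
        if success == self.healthy:
            self._streak = 0  # result agrees with the state, reset the streak
            return self.healthy
        self._streak += 1
        limit = self.healthy_threshold if success else self.unhealthy_threshold
        if self._streak >= limit:
            self.healthy = success
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
states = [tracker.record(ok) for ok in (False, False, False, True, True)]
print(states)  # [True, True, False, False, True]
```

The server is marked down only on the third consecutive failure and comes back only after two consecutive successes, which keeps a single flaky probe from flapping the pool.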
Session Persistence (Sticky Sessions)
Problem: a user's session is stored on one specific server.

Without sticky sessions:
- Request 1: Login → Server 1 (session created)
- Request 2: Get data → Server 2 ✗ (Server 2 doesn't have the session)

Solution 1: Cookie-based sticky sessions
- Request 1: Login → Server 1
- Response: Set-Cookie: server=1
- Request 2: Cookie: server=1 → Server 1 (the LB reads the cookie and routes to Server 1)

Solution 2: IP hash sticky sessions
- hash(client_ip) always maps to the same server
- Client 1.2.3.4 → Server 1 (always)

Solution 3: Session replication via a shared store (better)
- Store sessions in Redis/Memcached
- Any server can access the session
- No sticky sessions needed ✓
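Solution 1 can be sketched as a routing function that honours the affinity cookie when it names a healthy server and otherwise assigns a new one. The `StickyRouter` class and the cookie name are assumptions chosen to mirror the `Set-Cookie: server=1` example; real balancers usually encode or sign the cookie value.

```python
from itertools import count
from typing import Dict, List, Optional, Set, Tuple

COOKIE_NAME = "server"  # hypothetical affinity cookie name


class StickyRouter:
    def __init__(self, servers: List[str]):
        self.servers = servers
        self._counter = count()

    def route(
        self, cookies: Dict[str, str], healthy: Optional[Set[str]] = None
    ) -> Tuple[str, Optional[str]]:
        """Return (server, Set-Cookie value or None if no new cookie is needed)."""
        healthy = set(self.servers) if healthy is None else healthy
        pinned = cookies.get(COOKIE_NAME)
        if pinned in healthy:
            return pinned, None
        # No usable cookie: fall back to round robin over healthy servers.
        candidates = [s for s in self.servers if s in healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        chosen = candidates[next(self._counter) % len(candidates)]
        return chosen, f"{COOKIE_NAME}={chosen}"


router = StickyRouter(["server1", "server2", "server3"])
login = router.route({})
follow_up = router.route({"server": "server1"})
after_failure = router.route({"server": "server1"}, healthy={"server2", "server3"})
print(login, follow_up, after_failure)
```

The last call shows the weakness of affinity: when the pinned server fails, the client is moved and its server-local session is lost, which is why Solution 3 is marked as better.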
Real Systems Using Load Balancing
| System | Type | Algorithms | Key Features | Use Case |
|---|---|---|---|---|
| AWS ELB (ALB) | Layer 7 | Round robin, least outstanding requests | Content-based routing, SSL termination | HTTP microservices |
| AWS NLB | Layer 4 | Flow hash | Ultra-low latency, static IP | TCP services, high throughput |
| nginx | Layer 7 | Round robin, least_conn, ip_hash | Open source, highly configurable | Web servers, API gateway |
| HAProxy | Layer 4/7 | Weighted RR, least_conn, consistent hash | High performance, advanced ACLs | Enterprise load balancing |
| Envoy | Layer 7 | Weighted RR, least_request, ring_hash | Service mesh, observability | Kubernetes, microservices |
| Cloudflare | Layer 7 | Geo-routing, weighted pools | DDoS protection, CDN | Global load balancing |
Case Study: AWS Application Load Balancer
Request flow: Internet → ALB (multi-AZ for high availability, spanning Availability Zone 1 and Availability Zone 2) → target groups.

Target groups:
- API servers (port 3000): /api/* → API target group
- Web servers (port 80): /* → Web target group
- Admin servers (port 8080): /admin/* → Admin target group

Routing rules:
1. Path-based: /api/* → API servers
2. Host-based: admin.example.com → Admin servers
3. Header-based: X-API-Version: v2 → V2 servers

Health checks:
- Protocol: HTTP
- Path: /health
- Interval: 30s
- Timeout: 5s
- Healthy threshold: 5
- Unhealthy threshold: 2

Features:
- SSL/TLS termination (offload from servers)
- WebSocket support
- HTTP/2 support
- Integration with Auto Scaling
- CloudWatch metrics
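The three kinds of routing rules listed above can be modelled as an ordered list where the first match wins. This is a simplified sketch with hypothetical `Rule` and `select_target_group` names; it uses prefix matching and does not reproduce ALB's actual rule syntax or priority semantics.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    target_group: str
    path_prefix: Optional[str] = None
    host: Optional[str] = None
    header: Optional[Tuple[str, str]] = None  # (name, required value)

    def matches(self, host: str, path: str, headers: Dict[str, str]) -> bool:
        if self.host is not None and host != self.host:
            return False
        if self.path_prefix is not None and not path.startswith(self.path_prefix):
            return False
        if self.header is not None:
            name, value = self.header
            # HTTP header names are case-insensitive.
            lowered = {k.lower(): v for k, v in headers.items()}
            if lowered.get(name.lower()) != value:
                return False
        return True


def select_target_group(
    rules: List[Rule], host: str, path: str, headers: Dict[str, str],
    default: str = "web",
) -> str:
    for rule in rules:  # first matching rule wins
        if rule.matches(host, path, headers):
            return rule.target_group
    return default


RULES = [
    Rule("v2", header=("X-API-Version", "v2")),
    Rule("admin", host="admin.example.com"),
    Rule("admin", path_prefix="/admin/"),
    Rule("api", path_prefix="/api/"),
]

print(select_target_group(RULES, "example.com", "/api/users", {}))  # api
print(select_target_group(RULES, "example.com", "/index.html", {}))  # web
```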
Case Study: nginx Load Balancer
# nginx.conf - Load Balancer Configuration

# Define upstream backend servers
upstream backend {
    # Load balancing algorithm
    least_conn;  # Use least connections

    # Backend servers with weights
    server backend1.example.com:8080 weight=5;
    server backend2.example.com:8080 weight=3;
    server backend3.example.com:8080 weight=2;

    # Server with max connections limit
    server backend4.example.com:8080 max_conns=100;

    # Backup server (used only if others fail)
    server backup.example.com:8080 backup;

    # Connection reuse (not a health check): keep up to 32 idle
    # upstream connections open per worker process
    keepalive 32;
}
# ... omitted: keep concept snippets short
    server_name example.com;

    # SSL termination (decrypt here, forward HTTP to backend)
    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;

    location / {
        proxy_pass http://backend;

        # Needed for the upstream keepalive pool to be used
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
When to Use Load Balancing
✓ Perfect Use Cases
| Use Case | Scenario | Solution | Detail | Benefit |
|---|---|---|---|---|
| High Traffic Web Applications | E-commerce site with millions of users | Layer 7 ALB with 50 backend servers | Requirements: handle 100,000 requests/second | Horizontal scalability, failover, health checks |
| Microservices Architecture | 100+ microservices communicating | Service mesh (Envoy/Linkerd) with load balancing per service | — | Automatic service discovery, circuit breaking, observability |
| Global Applications (Geo-LB) | Users worldwide accessing application | DNS-based load balancing (Route53, Cloudflare) | Route: US users → US region, EU users → EU region | Low latency, disaster recovery |
| Database Read Replicas | Read-heavy application with MySQL replicas | Load balancer distributing reads across 5 replicas | Algorithm: least connections (account for query duration) | Scale read throughput |
✕ When NOT to Use (or Use Carefully)
| Anti-Pattern | Problem | Alternative / Solution | Example |
|---|---|---|---|
| Single Server Deployment | Adds complexity and latency for no benefit | Direct connection to server | Development environment, small apps |
| Stateful TCP Connections (Without Sticky Sessions) | Connection state lost on failover | Use connection pooling or client-side retry logic | Database connections, SSH sessions |
| Very Low Latency Requirements (< 1ms) | Load balancer adds latency (1-10ms) | Client-side load balancing (gRPC, Thrift) | High-frequency trading, real-time gaming |
Interview Application
Common Interview Question
Q: “Design a load balancing solution for a REST API with 10 backend servers. How would you ensure high availability and optimal performance?”
Strong Answer:
“I’d design a multi-layered load balancing solution:
Architecture:
I would put DNS-based routing in front to send users to the nearest datacenter, then use a Layer 7 load balancer such as ALB or nginx for HTTP routing. If the workload has raw TCP services or very low latency requirements, I would add a Layer 4 balancer for that path instead of forcing everything through HTTP-aware routing.
Algorithm Selection:
I would use least-connections for API endpoints because request duration varies and long-running calls should not pile up on one server. Static assets can use round robin because requests are uniform and fast. If the application still stores session state server-side, I would use cookie-based or IP-hash affinity, while calling out that the better fix is to move session state out of the backend instance.
High Availability:
Availability comes from active and passive health signals. The balancer should probe GET /health about every 10 seconds, watch real 5xx traffic, and mark a server unhealthy after a small failure threshold such as three misses. Once a server is unhealthy, it leaves the pool, traffic moves to healthy instances, and retries are bounded by circuit-breaker behavior. I would spread both balancers and servers across three availability zones so one zone loss does not remove the service.
Performance Optimizations:
For performance, I would terminate TLS at the balancer to remove CPU work from backends, keep pooled connections open to reduce handshake overhead, and cache static responses where that does not weaken correctness. Those optimizations help only if the health and retry behavior stays bounded; otherwise they can hide overload until it becomes an incident.
Monitoring:
The dashboard needs request rate, error rate, p50 and p99 latency, health-check state, and traffic distribution per server. Alerts should fire on rising 5xxs, high p99, repeated health-check failures, and skewed distribution because those map directly to user pain and capacity imbalance.
Scaling:
Scaling should be boring: add servers when sustained CPU or queue depth crosses the target, auto-register new instances with the balancer, and drain connections before removing a server. The graceful shutdown path matters as much as scale-out because bad draining turns deploys into user-visible errors.
Trade-offs:
The trade-off is latency versus routing intelligence. Layer 7 may add 5-10ms compared with roughly 1ms for Layer 4, but it gives path routing, host routing, cookie behavior, and TLS termination. For ultra-low latency, I would use Layer 4 or client-side load balancing.”
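The connection draining mentioned under Scaling in the answer can be made concrete with a short sketch. It assumes a hypothetical `PooledServer` record with an in-flight counter: a draining server stops receiving new requests and is removed only once its in-flight count reaches zero.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PooledServer:
    name: str
    in_flight: int = 0
    draining: bool = False

    def accepts_new_requests(self) -> bool:
        return not self.draining

    def safe_to_remove(self) -> bool:
        return self.draining and self.in_flight == 0


def pick(servers: List[PooledServer]) -> PooledServer:
    """Least-connections pick that skips draining servers."""
    candidates = [s for s in servers if s.accepts_new_requests()]
    if not candidates:
        raise RuntimeError("no server is accepting new requests")
    return min(candidates, key=lambda s: s.in_flight)


old = PooledServer("server1", in_flight=2)
new = PooledServer("server2", in_flight=5)

old.draining = True             # deploy starts: stop sending new work
target = pick([old, new]).name  # server2, even though it is busier
removable_now = old.safe_to_remove()

old.in_flight = 0               # existing requests finish
removable_later = old.safe_to_remove()
print(target, removable_now, removable_later)  # server2 False True
```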
Code Example
Simple Round Robin Load Balancer
import requests
import time
from typing import List
from dataclasses import dataclass
import threading


@dataclass
class BackendServer:
    """Represents a backend server"""
    host: str
    port: int
    weight: int = 1
    healthy: bool = True
    active_connections: int = 0


class LoadBalancer:
    """
    Simple load balancer implementing multiple algorithms
    """

    def __init__(self, servers: List[BackendServer]):
        self.servers = servers
        self.current_index = 0  # For round robin
# ... omitted: keep concept snippets short
            result = lb.forward_request('/api/users', algorithm='ip_hash',
                                        client_ip=ip)
            print(f"Client {ip} → {result['server']}")
        except Exception as e:
            print(f"Request from {ip} failed: {e}")

    # Get status
    print("\n=== Load Balancer Status ===")
    import json
    print(json.dumps(lb.get_status(), indent=2))
Layer 7 HTTP Load Balancer with Path Routing
from flask import Flask, request, Response
import requests

app = Flask(__name__)

# Define backend pools
BACKEND_POOLS = {
    'api': [
        'http://api1.example.com:3000',
        'http://api2.example.com:3000',
        'http://api3.example.com:3000',
    ],
    'web': [
        'http://web1.example.com:80',
        'http://web2.example.com:80',
    ],
    'admin': [
        'http://admin1.example.com:8080',
    ]
}

# Round robin counters
# ... omitted: keep concept snippets short
            response.content,
            status=response.status_code,
            headers=dict(response.headers)
        )
    except requests.RequestException as e:
        return Response(f"Bad Gateway: {e}", status=502)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
Related Content
Prerequisites:
- Distributed Systems Basics - Foundation concepts
Related Concepts:
- Sharding - Data-level load distribution
- Topic Partitioning - Message streaming load distribution
- Consumer Groups - Consumer-level load balancing
Used In Systems:
- AWS ELB/ALB/NLB: Cloud load balancing
- nginx/HAProxy: Open-source load balancers
- Kubernetes: Service load balancing with kube-proxy
Explained In Detail:
- System Design Deep Dive - Load balancing in production systems
Quick Self-Check
- Can explain load balancing in 60 seconds?
- Know difference between Layer 4 and Layer 7 load balancing?
- Understand 3+ load balancing algorithms and their trade-offs?
- Can explain health checks and failover mechanisms?
- Know when to use sticky sessions vs session replication?
- Can design a load balancing solution for given requirements?
Production signal