Load Balancing

8 min · Intermediate · Patterns

Distributing incoming requests across multiple servers to optimize resource utilization, minimize latency, and prevent any single server from becoming a bottleneck.

πŸ’Ό Interview Relevance: 80% of system design interviews
🏭 Production Impact: powers AWS ELB, nginx, and HAProxy
⚑ Performance: optimal response times under load
πŸ“ˆ Scalability: enables horizontal scaling

TL;DR

Load balancing distributes network traffic or computational workload across multiple servers using algorithms like round-robin, least-connections, or consistent hashing to prevent any single server from being overwhelmed. Essential for scalability, high availability, and optimized resource utilization in systems like AWS ELB, nginx, and HAProxy.

Visual Overview

WITHOUT LOAD BALANCING (Single Server)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  All traffic β†’ Single Server                   β”‚
β”‚                                                β”‚
β”‚  100 req/s β†’ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                     β”‚
β”‚              β”‚  Server  β”‚                     β”‚
β”‚              β”‚ Overload!β”‚                     β”‚
β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                     β”‚
β”‚                                                β”‚
β”‚  Problems:                                     β”‚
β”‚  - Single point of failure βœ•                  β”‚
β”‚  - Limited capacity βœ•                         β”‚
β”‚  - High latency under load βœ•                  β”‚
β”‚  - No redundancy βœ•                            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

WITH LOAD BALANCING (Distributed)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Load Balancer                          β”‚
β”‚         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                        β”‚
β”‚ 100 req β”‚   Nginx/    β”‚                        β”‚
β”‚   /s β†’  β”‚   ELB/      β”‚                        β”‚
β”‚         β”‚  HAProxy    β”‚                        β”‚
β”‚         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                        β”‚
β”‚              ↓                                 β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”                       β”‚
β”‚     ↓        ↓        ↓                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”                β”‚
β”‚  β”‚Server1β”‚β”‚Server2β”‚β”‚Server3β”‚                 β”‚
β”‚  β”‚33 req/β”‚β”‚33 req/β”‚β”‚33 req/β”‚                 β”‚
β”‚  β”‚  s    β”‚β”‚  s    β”‚β”‚  s    β”‚                 β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”€β”˜                β”‚
β”‚                                                β”‚
β”‚  Benefits:                                     β”‚
β”‚  βœ“ High availability (failover)                β”‚
β”‚  βœ“ Horizontal scalability (add servers)        β”‚
β”‚  βœ“ Better resource utilization                 β”‚
β”‚  βœ“ Health checks & auto-routing                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

LOAD BALANCING ALGORITHMS COMPARISON
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Round Robin (sequential distribution):        β”‚
β”‚  Request 1 β†’ Server 1                          β”‚
β”‚  Request 2 β†’ Server 2                          β”‚
β”‚  Request 3 β†’ Server 3                          β”‚
β”‚  Request 4 β†’ Server 1  (cycle repeats)         β”‚
β”‚                                                β”‚
β”‚  Least Connections (dynamic balancing):        β”‚
β”‚  Server 1: 5 active connections                β”‚
β”‚  Server 2: 3 active connections βœ“ (chosen)     β”‚
β”‚  Server 3: 8 active connections                β”‚
β”‚  β†’ Route to server with fewest connections     β”‚
β”‚                                                β”‚
β”‚  Hash-Based Routing (sticky routing):          β”‚
β”‚  hash(user_id) % num_servers                   β”‚
β”‚  User 123 β†’ Server 2 (always same server)      β”‚
β”‚  User 456 β†’ Server 1 (always same server)      β”‚
β”‚  β†’ Same client always routes to same server    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

LAYER 4 VS LAYER 7 LOAD BALANCING
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Layer 4 (Transport Layer - TCP/UDP):          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                              β”‚
β”‚  β”‚  Client      β”‚                              β”‚
β”‚  β”‚ 1.2.3.4:5678 β”‚                              β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                              β”‚
β”‚        ↓                                       β”‚
β”‚  Load balancer sees: IP + Port                 β”‚
β”‚  Routes based on: TCP connection               β”‚
β”‚  Cannot see: HTTP headers, URLs, cookies       β”‚
β”‚        ↓                                       β”‚
β”‚  Backend server receives original connection   β”‚
β”‚                                                β”‚
β”‚  + Faster (no HTTP parsing)                    β”‚
β”‚  + Lower latency (~1-2ms)                      β”‚
β”‚  - Limited routing logic                       β”‚
β”‚                                                β”‚
β”‚  Layer 7 (Application Layer - HTTP):           β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                              β”‚
β”‚  β”‚  Client      β”‚                              β”‚
β”‚  β”‚ GET /api/usersβ”‚                             β”‚
β”‚  β”‚ Cookie: xyz  β”‚                              β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                              β”‚
β”‚        ↓                                       β”‚
β”‚  Load balancer sees: Full HTTP request         β”‚
β”‚  Routes based on: URL, headers, cookies        β”‚
β”‚        ↓                                       β”‚
β”‚  /api/users β†’ Backend Pool A                   β”‚
β”‚  /static/*  β†’ Backend Pool B (CDN)             β”‚
β”‚                                                β”‚
β”‚  + Advanced routing (path, host, cookie)       β”‚
β”‚  + SSL termination                             β”‚
β”‚  - Slower (HTTP parsing, ~5-10ms)              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Core Explanation

What is Load Balancing?

Load balancing is the process of distributing incoming requests across multiple backend servers to:

  1. Optimize resource utilization: No server is overloaded while others are idle
  2. Maximize throughput: Handle more requests by adding servers
  3. Minimize latency: Route to least-loaded or nearest server
  4. Ensure high availability: Route around failed servers

Load Balancing Algorithms

1. Round Robin

Simple sequential distribution:

Incoming requests:     Backend servers:
Request 1 ──────────→ Server 1
Request 2 ──────────→ Server 2
Request 3 ──────────→ Server 3
Request 4 ──────────→ Server 1 (cycle repeats)

Pros:
βœ“ Simple implementation
βœ“ Even distribution (if all requests equal)
βœ“ Stateless (no tracking needed)

Cons:
βœ— Doesn't account for server capacity
βœ— Doesn't account for request complexity
βœ— Long-running requests can overload one server

Use case: Stateless microservices with uniform requests

2. Weighted Round Robin

Distribute based on server capacity:

Backend servers with weights:
Server 1 (weight=5): More powerful
Server 2 (weight=3): Medium capacity
Server 3 (weight=2): Less powerful

Distribution pattern:
5 requests β†’ Server 1
3 requests β†’ Server 2
2 requests β†’ Server 3
(repeat)

Pros:
βœ“ Accounts for heterogeneous server capacity
βœ“ Efficient resource utilization

Cons:
βœ— Still doesn't account for dynamic load
βœ— Requires manual weight configuration

Use case: Mixed hardware (different CPU/RAM capacities)
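
Note that a literal 5-3-2 pattern sends bursts of consecutive requests to the same server. Production balancers usually interleave picks instead; nginx, for example, uses a "smooth" weighted round robin. A minimal Python sketch of that algorithm (class and server names are illustrative):

# Smooth weighted round robin: on each pick, every server's score grows
# by its weight; the highest score wins and then "pays back" the total
# weight, which interleaves picks instead of bursting.

class SmoothWeightedRR:
    def __init__(self, servers):
        # servers: list of (name, weight) pairs
        self.servers = [{"name": n, "weight": w, "current": 0}
                        for n, w in servers]
        self.total_weight = sum(w for _, w in servers)

    def pick(self) -> str:
        for s in self.servers:
            s["current"] += s["weight"]
        best = max(self.servers, key=lambda s: s["current"])
        best["current"] -= self.total_weight
        return best["name"]

rr = SmoothWeightedRR([("s1", 5), ("s2", 3), ("s3", 2)])
print([rr.pick() for _ in range(10)])
# ['s1', 's2', 's3', 's1', 's1', 's2', 's1', 's3', 's2', 's1']
# Still 5:3:2 overall, but spread out rather than 5 in a row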

3. Least Connections

Route to server with fewest active connections:

Real-time server state:
Server 1: 25 active connections
Server 2: 15 active connections βœ“ (chosen)
Server 3: 30 active connections

New request β†’ Server 2 (fewest connections)

Pros:
βœ“ Dynamic load balancing
βœ“ Accounts for long-running connections
βœ“ Better for variable request durations

Cons:
βœ— Requires tracking connection state
βœ— More complex implementation

Use case: HTTP/1.1 keep-alive, websockets, long-polling

4. Weighted Least Connections

Combines least connections with server weights:

Formula: connections / weight

Server 1: 20 connections, weight=5 β†’ score = 4.0 βœ“ (lowest, chosen)
Server 2: 12 connections, weight=3 β†’ score = 4.0 βœ“ (lowest, chosen)
Server 3: 10 connections, weight=2 β†’ score = 5.0 (highest, avoided)

Route to Server 1 or Server 2 (lowest score wins)

Pros:
βœ“ Best of both worlds (capacity + dynamic load)

Use case: Production systems with mixed hardware

5. Least Response Time

Route to server with fastest response time:

Recent response times (moving average):
Server 1: 50ms average
Server 2: 30ms average βœ“ (chosen)
Server 3: 100ms average

Pros:
βœ“ Optimizes user experience
βœ“ Automatically adapts to server performance
βœ“ Accounts for network latency

Cons:
βœ— Requires active health checks
βœ— Can amplify cascading failures

Use case: Geo-distributed deployments
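
A common way to implement this is an exponentially weighted moving average (EWMA) of each server's observed latency. A minimal sketch, with an illustrative alpha value:

# Least-response-time selection via an EWMA of observed latency per
# server. Alpha controls how fast the average reacts to new samples.

class LeastResponseTime:
    def __init__(self, servers, alpha=0.2):
        self.alpha = alpha
        # Start at 0 ms so every server receives some initial traffic
        self.avg_ms = {s: 0.0 for s in servers}

    def pick(self) -> str:
        # Route to the server with the lowest moving-average latency
        return min(self.avg_ms, key=self.avg_ms.get)

    def record(self, server: str, latency_ms: float):
        # EWMA update: new = alpha * sample + (1 - alpha) * old
        old = self.avg_ms[server]
        self.avg_ms[server] = self.alpha * latency_ms + (1 - self.alpha) * old

lb = LeastResponseTime(["s1", "s2", "s3"])
lb.record("s1", 50); lb.record("s2", 30); lb.record("s3", 100)
print(lb.pick())  # s2 (lowest average latency)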

6. IP Hash

Hash client IP to deterministically select server:

hash(client_ip) % num_servers

Client 1.2.3.4   β†’ hash % 3 = 1 β†’ Server 1 (always)
Client 5.6.7.8   β†’ hash % 3 = 2 β†’ Server 2 (always)
Client 9.10.11.12 β†’ hash % 3 = 0 β†’ Server 3 (always)

Pros:
βœ“ Session persistence (same client β†’ same server)
βœ“ Useful for caching (server caches client data)
βœ“ No shared session storage needed

Cons:
βœ— Uneven distribution if client IPs clustered
βœ— With naive modulo hashing, adding or removing a server remaps most
  clients; true consistent hashing (a hash ring, sketched below) limits
  remapping to roughly 1/N of clients

Use case: Stateful applications with server-side sessions
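
To make the distinction concrete, here is a minimal consistent hash ring in Python. Unlike hash(ip) % N, removing a server only reassigns the clients that hashed to that server's arc of the ring (vnode count is illustrative):

import bisect
import hashlib

# Minimal consistent hash ring. Each server appears at many points
# ("virtual nodes") on a circular hash space; a client maps to the
# first server point clockwise from its own hash.

class HashRing:
    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):  # virtual nodes smooth distribution
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, client_key: str) -> str:
        # First ring position at or after the key's hash, wrapping around
        idx = bisect.bisect(self.keys, self._hash(client_key)) % len(self.keys)
        return self.ring[idx][1]

ring = HashRing(["server1", "server2", "server3"])
print(ring.get("1.2.3.4"))  # deterministic: same IP -> same server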

7. Least Bandwidth

Route to server currently serving least bandwidth:

Server 1: 500 Mbps
Server 2: 300 Mbps βœ“ (chosen)
Server 3: 700 Mbps

Use case: Video streaming, large file downloads

Layer 4 vs Layer 7 Load Balancing

Layer 4 (Transport Layer)

OSI Layer: Transport (TCP/UDP)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  What it sees:                         β”‚
β”‚  - Source IP + Port                    β”‚
β”‚  - Destination IP + Port               β”‚
β”‚  - TCP/UDP protocol                    β”‚
β”‚                                        β”‚
β”‚  What it CAN'T see:                    β”‚
β”‚  - HTTP headers                        β”‚
β”‚  - URLs, query parameters              β”‚
β”‚  - Cookies                             β”‚
β”‚  - Request body                        β”‚
β”‚                                        β”‚
β”‚  Routing decisions based on:           β”‚
β”‚  - IP address                          β”‚
β”‚  - Port number                         β”‚
β”‚  - Protocol (TCP vs UDP)               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: AWS Network Load Balancer (NLB)

Pros:
βœ“ Very fast (< 1ms latency)
βœ“ High throughput (millions of requests/sec)
βœ“ Low CPU usage
βœ“ Supports any TCP/UDP protocol
βœ“ Preserves client IP (pass-through)

Cons:
βœ— No content-based routing
βœ— No SSL termination
βœ— Limited health checks

Use case: TCP-based services, ultra-low latency requirements
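
To see what "routing on the connection, not the content" means in practice, here is a hedged sketch of a tiny Layer 4 pass-through proxy: it picks a backend when the TCP connection arrives and shuttles raw bytes, never parsing HTTP. Ports and backend addresses are illustrative:

import socket
import threading

# Minimal Layer 4 (TCP) proxy sketch. Routing uses only the connection
# itself; the payload is never inspected, so any TCP protocol works.
BACKENDS = [("10.0.0.1", 8080), ("10.0.0.2", 8080)]
counter = 0

def pipe(src: socket.socket, dst: socket.socket):
    """Copy raw bytes one way until the connection closes."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    finally:
        dst.close()

def handle(client: socket.socket):
    global counter
    backend_addr = BACKENDS[counter % len(BACKENDS)]  # plain round robin
    counter += 1
    backend = socket.create_connection(backend_addr)
    # Shuttle bytes in both directions concurrently; no HTTP parsing
    threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
    threading.Thread(target=pipe, args=(backend, client), daemon=True).start()

if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", 8000))
    listener.listen(128)
    while True:
        conn, _addr = listener.accept()
        handle(conn)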

Layer 7 (Application Layer)

OSI Layer: Application (HTTP/HTTPS)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  What it sees:                         β”‚
β”‚  - Full HTTP request                   β”‚
β”‚  - Headers (User-Agent, Host, etc.)    β”‚
β”‚  - URL path and query parameters       β”‚
β”‚  - Cookies                             β”‚
β”‚  - Request body                        β”‚
β”‚                                        β”‚
β”‚  Routing decisions based on:           β”‚
β”‚  - URL path: /api/* β†’ API servers      β”‚
β”‚  - Host header: api.example.com        β”‚
β”‚  - Cookie: user_id=123                 β”‚
β”‚  - HTTP method: POST vs GET            β”‚
β”‚  - Custom headers                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: AWS Application Load Balancer (ALB), nginx

Pros:
βœ“ Content-based routing (path, host, headers)
βœ“ SSL/TLS termination (decrypt at LB)
βœ“ Advanced health checks (HTTP status codes)
βœ“ Request/response manipulation
βœ“ Web Application Firewall (WAF) integration

Cons:
βœ— Slower (5-10ms latency due to HTTP parsing)
βœ— Higher CPU usage
βœ— More complex configuration

Use case: HTTP microservices, API gateways, web applications

Health Checks & Failover

Health Check Mechanisms:

1. Active Health Checks:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Load Balancer β†’ Backend Server        β”‚
β”‚  GET /health every 10 seconds          β”‚
β”‚  ↓                                     β”‚
β”‚  Server responds: 200 OK βœ“             β”‚
β”‚  or                                    β”‚
β”‚  Server timeout/error β†’ Mark unhealthy β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Configuration:
- Interval: 10s (how often to check)
- Timeout: 5s (max wait for response)
- Unhealthy threshold: 3 (failures before marking down)
- Healthy threshold: 2 (successes before marking up)
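
The two thresholds exist to debounce flapping: one failed probe should not evict a server, and one success should not restore it. A small sketch of that state machine, using the thresholds above:

# Threshold-based health state: N consecutive failures mark a server
# down, M consecutive successes bring it back, so a single blip
# doesn't flap the pool.

class HealthState:
    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self.failures = 0
        self.successes = 0

    def record_probe(self, ok: bool):
        if ok:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= self.healthy_threshold:
                self.healthy = True  # back in the pool
        else:
            self.successes = 0
            self.failures += 1
            if self.healthy and self.failures >= self.unhealthy_threshold:
                self.healthy = False  # removed from rotation

state = HealthState()
for ok in [False, False, False, True, True]:
    state.record_probe(ok)
    print(ok, "->", "UP" if state.healthy else "DOWN")
# Three failures mark it DOWN; two successes bring it back UP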

2. Passive Health Checks:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Monitor real traffic:                 β”‚
β”‚  Server returns 5xx errors β†’ Unhealthy β”‚
β”‚  Server timeout β†’ Unhealthy            β”‚
β”‚  Server 2xx responses β†’ Healthy        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Failover Flow:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  1. Server 2 fails health check        β”‚
β”‚  2. Load balancer marks Server 2 DOWN  β”‚
β”‚  3. New requests β†’ Server 1 & 3 only   β”‚
β”‚  4. Server 2 recovers                  β”‚
β”‚  5. Passes health checks (2x)          β”‚
β”‚  6. Load balancer marks Server 2 UP    β”‚
β”‚  7. Resume sending traffic to Server 2 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Session Persistence (Sticky Sessions)

Problem: User session stored on specific server

Without sticky sessions:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Request 1: Login β†’ Server 1 (session) β”‚
β”‚  Request 2: Get data β†’ Server 2 βœ—      β”‚
β”‚  (Server 2 doesn't have session)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Solution 1: Cookie-based sticky sessions:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Request 1: Login β†’ Server 1           β”‚
β”‚  Response: Set-Cookie: server=1        β”‚
β”‚  Request 2: Cookie: server=1 β†’ Server 1β”‚
β”‚  (LB reads cookie, routes to Server 1) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Solution 2: IP hash sticky sessions:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  hash(client_ip) always β†’ same server  β”‚
β”‚  Client 1.2.3.4 β†’ Server 1 (always)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Solution 3: Shared session store (better):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Store sessions in Redis/Memcached     β”‚
β”‚  Any server can access session         β”‚
β”‚  No sticky sessions needed βœ“           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
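
A hedged sketch of the shared-store approach using redis-py; the host name and TTL are illustrative. Because the session lives outside the backends, the load balancer can route each request anywhere:

import json
import uuid
from typing import Optional

import redis

# Shared session store: sessions live in Redis, not on any one backend.
r = redis.Redis(host="sessions.example.com", port=6379,
                decode_responses=True)
SESSION_TTL_SECONDS = 3600

def create_session(user_id: str) -> str:
    session_id = str(uuid.uuid4())
    # setex stores the value with an expiry, so stale sessions age out
    r.setex(f"session:{session_id}", SESSION_TTL_SECONDS,
            json.dumps({"user_id": user_id}))
    return session_id  # handed back to the client in a cookie

def load_session(session_id: str) -> Optional[dict]:
    data = r.get(f"session:{session_id}")  # works from ANY backend
    return json.loads(data) if data else None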

Real Systems Using Load Balancing

| System        | Type      | Algorithms                               | Key Features                           | Use Case                      |
|---------------|-----------|------------------------------------------|----------------------------------------|-------------------------------|
| AWS ELB (ALB) | Layer 7   | Round robin, least outstanding requests  | Content-based routing, SSL termination | HTTP microservices            |
| AWS NLB       | Layer 4   | Flow hash                                | Ultra-low latency, static IP           | TCP services, high throughput |
| nginx         | Layer 7   | Round robin, least_conn, ip_hash         | Open source, highly configurable       | Web servers, API gateway      |
| HAProxy       | Layer 4/7 | Weighted RR, least_conn, consistent hash | High performance, advanced ACLs        | Enterprise load balancing     |
| Envoy         | Layer 7   | Weighted RR, least_request, ring_hash    | Service mesh, observability            | Kubernetes, microservices     |
| Cloudflare    | Layer 7   | Geo-routing, weighted pools              | DDoS protection, CDN                   | Global load balancing         |

Case Study: AWS Application Load Balancer

AWS ALB Architecture:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Internet                                    β”‚
β”‚    ↓                                         β”‚
β”‚  ALB (multi-AZ for high availability)        β”‚
β”‚    β”œβ”€ Availability Zone 1                    β”‚
β”‚    └─ Availability Zone 2                    β”‚
β”‚         ↓                                    β”‚
β”‚  Target Groups:                              β”‚
β”‚    β”œβ”€ API Servers (port 3000)                β”‚
β”‚    β”‚   └─ /api/* β†’ API target group          β”‚
β”‚    β”œβ”€ Web Servers (port 80)                  β”‚
β”‚    β”‚   └─ /* β†’ Web target group               β”‚
β”‚    └─ Admin Servers (port 8080)              β”‚
β”‚        └─ /admin/* β†’ Admin target group       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Routing Rules:
1. Path-based: /api/* β†’ API servers
2. Host-based: admin.example.com β†’ Admin servers
3. Header-based: X-API-Version: v2 β†’ V2 servers
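
Rules like these map directly onto ALB listener rules. A hedged sketch using boto3's elbv2 client for the path-based rule; the ARNs and priority are placeholders:

import boto3

# Sketch: attach a "/api/* -> API target group" rule to an existing
# ALB listener. ARN values below are placeholders.
elbv2 = boto3.client("elbv2")
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:region:acct:listener/app/my-alb/abc/def",
    Priority=10,  # lower numbers are evaluated first
    Conditions=[{"Field": "path-pattern", "Values": ["/api/*"]}],
    Actions=[{
        "Type": "forward",
        "TargetGroupArn": "arn:aws:elasticloadbalancing:region:acct:targetgroup/api/123",
    }],
)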

Health Checks:
- Protocol: HTTP
- Path: /health
- Interval: 30s
- Timeout: 5s
- Healthy threshold: 5
- Unhealthy threshold: 2

Features:
βœ“ SSL/TLS termination (offload from servers)
βœ“ WebSocket support
βœ“ HTTP/2 support
βœ“ Integration with Auto Scaling
βœ“ CloudWatch metrics

Case Study: nginx Load Balancer

# nginx.conf - Load Balancer Configuration

# Define upstream backend servers
upstream backend {
    # Load balancing algorithm
    least_conn;  # Use least connections

    # Backend servers with weights
    server backend1.example.com:8080 weight=5;
    server backend2.example.com:8080 weight=3;
    server backend3.example.com:8080 weight=2;

    # Server with max connections limit
    server backend4.example.com:8080 max_conns=100;

    # Backup server (used only if others fail)
    server backup.example.com:8080 backup;

    # Keep up to 32 idle keepalive connections to the upstream group
    keepalive 32;
}

# API servers upstream
upstream api_servers {
    # Session affinity: clients are pinned to a server by IP hash
    # (for true consistent hashing, nginx offers "hash $key consistent")
    ip_hash;

    server api1.example.com:3000;
    server api2.example.com:3000;
    server api3.example.com:3000;
}

server {
    listen 80;
    server_name example.com;

    # Health check endpoint
    location /health {
        access_log off;
        return 200 "healthy\n";
    }

    # Route /api/* to API servers
    location /api/ {
        proxy_pass http://api_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;

        # Retry logic
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 3;
    }

    # Route all other traffic to backend
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Static files (no load balancing needed)
    location /static/ {
        root /var/www;
        expires 1d;
    }
}

# SSL/TLS configuration
server {
    listen 443 ssl http2;
    server_name example.com;

    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;

    # SSL termination (decrypt here, forward HTTP to backend)
    location / {
        proxy_pass http://backend;
    }
}

When to Use Load Balancing

βœ“ Perfect Use Cases

High Traffic Web Applications

Scenario: E-commerce site with millions of users
Requirements: Handle 100,000 requests/second
Solution: Layer 7 ALB with 50 backend servers
Benefit: Horizontal scalability, failover, health checks

Microservices Architecture

Scenario: 100+ microservices communicating
Solution: Service mesh (Envoy/Linkerd) with load balancing per service
Benefit: Automatic service discovery, circuit breaking, observability

Global Applications (Geo-Load Balancing)

Scenario: Users worldwide accessing application
Solution: DNS-based load balancing (Route53, Cloudflare)
Route: US users β†’ US region, EU users β†’ EU region
Benefit: Low latency, disaster recovery
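
DNS-based geo-routing boils down to serving region-specific records. A hedged boto3 sketch with Route53, where the zone ID, names, and IP are placeholders:

import boto3

# Sketch: EU clients resolve app.example.com to the EU region's
# load balancer IP. All identifiers below are placeholders.
route53 = boto3.client("route53")
route53.change_resource_record_sets(
    HostedZoneId="Z123EXAMPLE",
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "app.example.com",
            "Type": "A",
            "SetIdentifier": "eu-users",            # one record per region
            "GeoLocation": {"ContinentCode": "EU"},  # match EU clients
            "TTL": 60,
            "ResourceRecords": [{"Value": "203.0.113.10"}],
        },
    }]},
)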

Database Read Replicas

Scenario: Read-heavy application with MySQL replicas
Solution: Load balancer distributing reads across 5 replicas
Algorithm: Least connections (account for query duration)
Benefit: Scale read throughput

βœ• When NOT to Use (or Use Carefully)

Single Server Deployment

Problem: Adds complexity and latency for no benefit
Alternative: Direct connection to server
Example: Development environment, small apps

Stateful TCP Connections (Without Sticky Sessions)

Problem: Connection state lost on failover
Example: Database connections, SSH sessions
Solution: Use connection pooling or client-side retry logic

Very Low Latency Requirements (< 1ms)

Problem: Load balancer adds latency (1-10ms)
Alternative: Client-side load balancing (gRPC, Thrift)
Example: High-frequency trading, real-time gaming

Interview Application

Common Interview Question

Q: β€œDesign a load balancing solution for a REST API with 10 backend servers. How would you ensure high availability and optimal performance?”

Strong Answer:

β€œI’d design a multi-layered load balancing solution:

Architecture:

  • DNS Load Balancing: Route to nearest datacenter (geo-routing)
  • Layer 7 Load Balancer: AWS ALB or nginx (content-based routing)
  • Layer 4 Load Balancer: Optional NLB for TCP services

Algorithm Selection:

  • API Endpoints: Least connections algorithm
    • Why: API requests have variable duration
    • Long-running queries won’t overload single server
  • Static Assets: Round robin
    • Why: Uniform, fast requests
  • User Sessions: IP hash or cookie-based sticky sessions
    • Why: Session affinity if storing state server-side

High Availability:

  1. Health Checks:
    • Active: GET /health every 10s
    • Passive: Monitor 5xx errors in real traffic
    • Threshold: 3 failures β†’ mark unhealthy
  2. Automatic Failover:
    • Failed server removed from pool immediately
    • Traffic redistributed to healthy servers
    • Auto-retry on failure (circuit breaker pattern)
  3. Multi-AZ Deployment:
    • Load balancer across 3 availability zones
    • Servers distributed across zones
    • Tolerate entire AZ failure

Performance Optimizations:

  1. SSL/TLS Termination:
    • Decrypt at load balancer
    • Offload CPU from backend servers
    • Use HTTP between LB and backends
  2. Connection Pooling:
    • Keep-alive connections to backends
    • Reduce TCP handshake overhead
  3. Caching:
    • Cache static responses at LB
    • Reduce backend load

Monitoring:

  • Metrics: Request rate, error rate, latency (p50, p99)
  • Alerts: Health check failures, high latency, 5xx errors
  • Dashboard: Real-time traffic distribution per server

Scaling:

  • Auto Scaling Group: Add servers when CPU > 70%
  • Load balancer auto-registers new instances
  • Graceful shutdown: Drain connections before removing server

Trade-offs:

  • Layer 7 LB adds 5-10ms latency vs Layer 4 (~1ms)
  • But enables advanced routing and SSL termination
  • For ultra-low latency, use Layer 4 or client-side LB”

Code Example

Simple Round Robin Load Balancer

import hashlib
import requests
import time
from typing import List
from dataclasses import dataclass
import threading

@dataclass
class BackendServer:
    """Represents a backend server"""
    host: str
    port: int
    weight: int = 1
    healthy: bool = True
    active_connections: int = 0

class LoadBalancer:
    """
    Simple load balancer implementing multiple algorithms
    """
    def __init__(self, servers: List[BackendServer]):
        self.servers = servers
        self.current_index = 0  # For round robin
        self.lock = threading.Lock()

        # Start health check thread
        self.health_check_thread = threading.Thread(
            target=self._health_check_loop,
            daemon=True
        )
        self.health_check_thread.start()

    def round_robin(self) -> BackendServer:
        """Simple round robin algorithm"""
        with self.lock:
            # Filter healthy servers
            healthy_servers = [s for s in self.servers if s.healthy]

            if not healthy_servers:
                raise Exception("No healthy servers available")

            # Get next server in round-robin fashion
            server = healthy_servers[self.current_index % len(healthy_servers)]
            self.current_index += 1

            return server

    def weighted_round_robin(self) -> BackendServer:
        """Weighted round robin based on server capacity"""
        with self.lock:
            healthy_servers = [s for s in self.servers if s.healthy]

            if not healthy_servers:
                raise Exception("No healthy servers available")

            # Build weighted list (repeat servers based on weight)
            weighted_list = []
            for server in healthy_servers:
                weighted_list.extend([server] * server.weight)

            # Round robin through weighted list
            server = weighted_list[self.current_index % len(weighted_list)]
            self.current_index += 1

            return server

    def least_connections(self) -> BackendServer:
        """Route to server with fewest active connections"""
        with self.lock:
            healthy_servers = [s for s in self.servers if s.healthy]

            if not healthy_servers:
                raise Exception("No healthy servers available")

            # Find server with minimum connections
            server = min(healthy_servers, key=lambda s: s.active_connections)

            return server

    def weighted_least_connections(self) -> BackendServer:
        """Weighted least connections (connections / weight)"""
        with self.lock:
            healthy_servers = [s for s in self.servers if s.healthy]

            if not healthy_servers:
                raise Exception("No healthy servers available")

            # Find server with minimum connections/weight ratio
            server = min(healthy_servers,
                        key=lambda s: s.active_connections / s.weight)

            return server

    def ip_hash(self, client_ip: str) -> BackendServer:
        """Consistent hashing based on client IP"""
        with self.lock:
            healthy_servers = [s for s in self.servers if s.healthy]

            if not healthy_servers:
                raise Exception("No healthy servers available")

            # Hash client IP to select server. hashlib is stable across
            # processes; Python's built-in hash() is salted per run, so
            # stickiness would break on restart.
            hash_value = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
            server_index = hash_value % len(healthy_servers)

            return healthy_servers[server_index]

    def forward_request(self, request_path: str, algorithm: str = 'round_robin',
                       client_ip: str = None) -> dict:
        """
        Forward request to backend server using specified algorithm
        """
        # Select server based on algorithm
        if algorithm == 'round_robin':
            server = self.round_robin()
        elif algorithm == 'weighted_round_robin':
            server = self.weighted_round_robin()
        elif algorithm == 'least_connections':
            server = self.least_connections()
        elif algorithm == 'weighted_least_connections':
            server = self.weighted_least_connections()
        elif algorithm == 'ip_hash':
            if not client_ip:
                raise ValueError("client_ip required for ip_hash algorithm")
            server = self.ip_hash(client_ip)
        else:
            raise ValueError(f"Unknown algorithm: {algorithm}")

        print(f"Routing to {server.host}:{server.port} "
              f"(connections: {server.active_connections})")

        # Increment connection count
        with self.lock:
            server.active_connections += 1

        try:
            # Forward request to backend
            url = f"http://{server.host}:{server.port}{request_path}"
            response = requests.get(url, timeout=5)

            return {
                'status': response.status_code,
                'body': response.text,
                'server': f"{server.host}:{server.port}"
            }

        except requests.RequestException as e:
            print(f"Error forwarding to {server.host}:{server.port}: {e}")
            # Mark server as unhealthy on error
            with self.lock:
                server.healthy = False
            raise

        finally:
            # Decrement connection count
            with self.lock:
                server.active_connections -= 1

    def _health_check_loop(self):
        """Background thread to perform health checks"""
        while True:
            time.sleep(10)  # Check every 10 seconds

            for server in self.servers:
                healthy = self._check_health(server)

                with self.lock:
                    if healthy and not server.healthy:
                        print(f"βœ“ Server {server.host}:{server.port} is now HEALTHY")
                        server.healthy = True
                    elif not healthy and server.healthy:
                        print(f"βœ— Server {server.host}:{server.port} is now UNHEALTHY")
                        server.healthy = False

    def _check_health(self, server: BackendServer) -> bool:
        """Check if server is healthy"""
        try:
            url = f"http://{server.host}:{server.port}/health"
            response = requests.get(url, timeout=5)
            return response.status_code == 200
        except requests.RequestException:
            return False

    def get_status(self) -> dict:
        """Get load balancer status"""
        with self.lock:
            return {
                'total_servers': len(self.servers),
                'healthy_servers': sum(1 for s in self.servers if s.healthy),
                'servers': [
                    {
                        'host': s.host,
                        'port': s.port,
                        'healthy': s.healthy,
                        'active_connections': s.active_connections,
                        'weight': s.weight
                    }
                    for s in self.servers
                ]
            }

# Usage Example
if __name__ == '__main__':
    # Create backend servers
    servers = [
        BackendServer('server1.example.com', 8080, weight=5),
        BackendServer('server2.example.com', 8080, weight=3),
        BackendServer('server3.example.com', 8080, weight=2),
    ]

    lb = LoadBalancer(servers)

    # Test different algorithms
    print("=== Round Robin ===")
    for i in range(5):
        try:
            result = lb.forward_request('/api/users', algorithm='round_robin')
            print(f"Request {i+1} β†’ {result['server']}")
        except Exception as e:
            print(f"Request {i+1} failed: {e}")

    print("\n=== Least Connections ===")
    for i in range(5):
        try:
            result = lb.forward_request('/api/users', algorithm='least_connections')
            print(f"Request {i+1} β†’ {result['server']}")
        except Exception as e:
            print(f"Request {i+1} failed: {e}")

    print("\n=== IP Hash (Sticky Sessions) ===")
    client_ips = ['1.2.3.4', '5.6.7.8', '1.2.3.4', '5.6.7.8']
    for i, ip in enumerate(client_ips):
        try:
            result = lb.forward_request('/api/users', algorithm='ip_hash',
                                       client_ip=ip)
            print(f"Client {ip} β†’ {result['server']}")
        except Exception as e:
            print(f"Request from {ip} failed: {e}")

    # Get status
    print("\n=== Load Balancer Status ===")
    import json
    print(json.dumps(lb.get_status(), indent=2))

Layer 7 HTTP Load Balancer with Path Routing

from flask import Flask, request, Response
import requests

app = Flask(__name__)

# Define backend pools
BACKEND_POOLS = {
    'api': [
        'http://api1.example.com:3000',
        'http://api2.example.com:3000',
        'http://api3.example.com:3000',
    ],
    'web': [
        'http://web1.example.com:80',
        'http://web2.example.com:80',
    ],
    'admin': [
        'http://admin1.example.com:8080',
    ]
}

# Round robin counters
counters = {pool: 0 for pool in BACKEND_POOLS}

def select_backend(pool_name: str) -> str:
    """Select backend using round robin"""
    pool = BACKEND_POOLS[pool_name]
    counter = counters[pool_name]
    backend = pool[counter % len(pool)]
    counters[pool_name] += 1
    return backend

@app.route('/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE'])
def load_balance(path):
    """Layer 7 load balancer with path-based routing"""

    # Path-based routing
    if path.startswith('api/'):
        backend = select_backend('api')
    elif path.startswith('admin/'):
        backend = select_backend('admin')
    else:
        backend = select_backend('web')

    # Forward request to backend
    url = f"{backend}/{path}"

    # Preserve headers
    headers = {key: value for key, value in request.headers if key != 'Host'}

    # Add X-Forwarded-For header
    headers['X-Forwarded-For'] = request.remote_addr
    headers['X-Real-IP'] = request.remote_addr

    try:
        # Forward request
        response = requests.request(
            method=request.method,
            url=url,
            headers=headers,
            data=request.get_data(),
            cookies=request.cookies,
            allow_redirects=False,
            timeout=30
        )

        # Return response
        return Response(
            response.content,
            status=response.status_code,
            headers=dict(response.headers)
        )

    except requests.RequestException as e:
        return Response(f"Bad Gateway: {e}", status=502)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)

Used In Systems:

  • AWS ELB/ALB/NLB: Cloud load balancing
  • nginx/HAProxy: Open-source load balancers
  • Kubernetes: Service load balancing with kube-proxy

Explained In Detail:

  • System Design Deep Dive - Load balancing in production systems

Quick Self-Check

  • Can explain load balancing in 60 seconds?
  • Know difference between Layer 4 and Layer 7 load balancing?
  • Understand 3+ load balancing algorithms and their trade-offs?
  • Can explain health checks and failover mechanisms?
  • Know when to use sticky sessions vs session replication?
  • Can design a load balancing solution for given requirements?