A load balancer is a critical component in system design that distributes incoming traffic across multiple servers to ensure no single server becomes a bottleneck. It acts as the “traffic controller” for your infrastructure, improving responsiveness, reliability, and availability.
Imagine a website receiving thousands of requests every second. If all requests go to one server, it may crash or become unresponsive due to overload. Load balancers help by distributing this load evenly across multiple servers.
There are several types of load balancers, based on how they inspect and distribute traffic:

- Layer 4 (transport-level) load balancers route based on IP address and TCP/UDP port, without looking at request content.
- Layer 7 (application-level) load balancers route based on request content such as URLs, headers, and cookies.
Let’s say you’re running an online shopping platform like Amazon. During a festival sale, millions of users visit the site simultaneously.
If all requests go to a single web server, it will get overwhelmed. To handle this load, a load balancer is placed in front of multiple web servers.
When a user makes a request (like opening a product page), the load balancer:

- Receives the request at its single public endpoint.
- Picks a healthy backend server using its distribution algorithm.
- Forwards the request to that server and relays the response back to the user.
Question: Does the user know which server actually handled the request?

Answer: No. The user interacts only with the load balancer’s endpoint. The internal distribution is completely abstracted.
Load balancers use algorithms to decide how to distribute requests. Here are some common ones:

- Round Robin: requests are sent to servers in a fixed, rotating order.
- Least Connections: each request goes to the server with the fewest active connections.
- IP Hash: the client’s IP address is hashed to pick a server, so the same client lands on the same server.
- Weighted Round Robin: like round robin, but more powerful servers receive proportionally more requests.
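A minimal sketch of the first two algorithms in Python. The server names are hypothetical placeholders, and a real load balancer would track live connection counts rather than a simple dictionary:

```python
from itertools import cycle

# Hypothetical backend pool for illustration.
servers = ["server-a", "server-b", "server-c"]

# Round robin: hand out servers in a fixed, repeating order.
rr = cycle(servers)

def round_robin() -> str:
    return next(rr)

# Least connections: pick the server currently handling the
# fewest requests, then record the new connection.
active_connections = {s: 0 for s in servers}

def least_connections() -> str:
    target = min(active_connections, key=active_connections.get)
    active_connections[target] += 1
    return target
```

Four consecutive `round_robin()` calls return server-a, server-b, server-c, then wrap back to server-a, while `least_connections()` adapts to the current load instead of following a fixed order.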
Consider a user logging into a banking application. Their session must stay on the same server to maintain session state. This is called a "sticky session."
A load balancer with IP hashing or cookie-based routing ensures the user is routed to the same server throughout their session.
Question: Can sticky sessions cause uneven load across servers?

Answer: Yes. If one server gets more sticky sessions than others, it might become overloaded. This is why sticky sessions should be used carefully.
Load balancers constantly check the health of servers by sending periodic health check requests. If a server fails the check, it’s temporarily removed from the rotation.
Suppose your system has 3 web servers behind a load balancer. If one server crashes:

- The periodic health check to that server fails.
- The load balancer removes it from the rotation.
- Incoming traffic is redistributed across the two remaining servers.
- Once the server recovers and passes health checks again, it is added back to the pool.
This ensures that users don’t experience downtime or failed responses.
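A single health-check pass can be sketched as follows. The `check` function is a stand-in for a real probe (for example, an HTTP GET to a `/health` endpoint), and here it is hard-coded to simulate server-b being down:

```python
# Hypothetical backend pool for illustration.
servers = ["server-a", "server-b", "server-c"]

def check(server: str) -> bool:
    # Stand-in for a real network probe; simulates server-b crashing.
    return server != "server-b"

def healthy_pool(pool: list[str]) -> list[str]:
    # Keep only the servers that passed the health check, so
    # requests are never routed to a dead server.
    return [s for s in pool if check(s)]
```

Running `healthy_pool(servers)` while server-b is failing yields only server-a and server-c; once the probe succeeds again, server-b rejoins the pool on the next pass.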
A common question is: If all traffic goes through the load balancer, isn’t it a single point of failure?
Yes, unless we make the load balancer itself redundant. This is often done by:

- Running a standby load balancer that takes over automatically if the primary fails (active-passive failover).
- Running multiple active load balancers and spreading traffic across them, for example via DNS.
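The active-passive idea can be sketched from the client’s side: try the primary load balancer endpoint, and fall back to the standby if it is unreachable. The endpoint names and the `send` stub are hypothetical; here `send` simulates the primary being down:

```python
# Hypothetical redundant load-balancer endpoints.
ENDPOINTS = ["lb-primary.example.com", "lb-standby.example.com"]

def send(endpoint: str, request: str) -> str:
    # Stand-in for a real network call; simulates a dead primary.
    if endpoint == "lb-primary.example.com":
        raise ConnectionError("primary load balancer is down")
    return f"{endpoint} handled {request}"

def send_with_failover(request: str) -> str:
    # Try each endpoint in order; re-raise only if all fail.
    last_error = None
    for endpoint in ENDPOINTS:
        try:
            return send(endpoint, request)
        except ConnectionError as err:
            last_error = err
    raise last_error
```

In production this failover usually happens below the client, e.g. via a floating IP or DNS health checks, but the principle is the same: no single load balancer instance is a hard dependency.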
Question: Can load balancers operate at more than one level?

Answer: Yes. You can have global load balancers for routing between regions and local load balancers for routing within a data center.
In a microservices architecture, each service may have multiple instances. A load balancer helps route requests to the right instance of a service.
Tools like Envoy and Istio act as service mesh load balancers inside Kubernetes clusters.
Load balancers are foundational to building scalable, resilient systems. They:

- Distribute incoming traffic evenly across multiple servers.
- Detect failures through health checks and remove unhealthy servers from rotation.
- Enable horizontal scaling by letting you add servers behind a single endpoint.
- Improve responsiveness, reliability, and availability for end users.
Next, we will explore Content Delivery Networks (CDNs) and how they complement load balancers by improving global content delivery speed and efficiency.