Load balancing strategies in Kubernetes: L4 round robin, L7 round robin, ring hash, and more

Kay James
July 24, 2024 | 6 min read
What is load balancing in Kubernetes?

Load balancing is the process of efficiently distributing network traffic among multiple backend services, and is a critical strategy for maximizing scalability and availability. There are a variety of choices for load balancing external traffic to Kubernetes Pods, each with different tradeoffs.

Selecting a load balancing algorithm should not be undertaken lightly, especially if you are using application-layer (L7) protocols like gRPC. It’s all too easy to select an algorithm that results in a single server running hot or some other form of unbalanced load distribution.

Let’s explore these in more detail.

L4 round robin load balancing with kube-proxy

In a typical Kubernetes cluster, requests sent to a Kubernetes Service are routed by a component named kube-proxy. Somewhat confusingly, kube-proxy isn’t a proxy in the classic sense, but a process that implements a virtual IP for a Service via iptables rules. This architecture adds complexity to routing: each request incurs a small amount of latency, and that latency increases as the number of Services grows.

Moreover, kube-proxy routes at Layer 4 (L4), i.e., TCP, which doesn’t necessarily fit well with today’s application-centric protocols. For example, imagine two gRPC clients connecting to your backend Pods. In L4 load balancing, each client would be sent to a different backend Pod using round robin load balancing. This is true even if one client is sending 1 request per minute, while the other client is sending 100 requests per second.
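To make the imbalance concrete, here is a minimal sketch of connection-level balancing. It is a simplified model for illustration, not how kube-proxy is implemented: the function name `l4_balance`, the client names, and the Pod names are all hypothetical, and each client is assumed to hold a single long-lived connection.

```python
from collections import Counter
from itertools import cycle


def l4_balance(clients, pods):
    """Assign each client connection to one Pod in round robin order.

    Every request on a connection follows that connection, so the Pod
    receives the client's whole request rate.
    """
    pod_cycle = cycle(pods)
    load = Counter({pod: 0 for pod in pods})
    for _name, requests_per_minute in clients:
        pod = next(pod_cycle)  # chosen once, when the connection opens
        load[pod] += requests_per_minute
    return dict(load)


# Hypothetical clients from the example above: (name, requests per minute).
clients = [("slow-client", 1), ("busy-client", 100 * 60)]
print(l4_balance(clients, ["pod-a", "pod-b"]))
# → {'pod-a': 1, 'pod-b': 6000}
```

The connections are spread evenly, one per Pod, yet almost all of the requests land on a single Pod.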

So why use kube-proxy at all? In one word: simplicity. Round robin load balancing is delegated entirely to Kubernetes, and this is the default strategy. Thus, whether you’re sending a request via Ambassador Edge Stack or via another service, you’re going through the same load balancing mechanism.

kube-proxy and IPVS

While kube-proxy uses iptables for routing by default, it can also use IPVS (IP Virtual Server). The advantage of IPVS over iptables is scalability: iptables evaluates its rules sequentially, and the number of rules grows in proportion to the number of services, whereas IPVS stores its routing entries in hash tables, so lookups take roughly O(1) time regardless of how many services exist. Thus, for clusters that consist of thousands of services, IPVS is generally the preferred option. That said, IPVS-based routing is still L4 routing and is subject to the constraints described above.
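The scaling difference can be illustrated with a toy model. This sketch does not reproduce iptables or IPVS internals; it only contrasts a sequential scan over a rule list with a hash table lookup. The function names, IP addresses, and backend names are made up for the example.

```python
def linear_rule_match(rules, service_ip):
    """Scan rules in order, as a sequential rule chain would.

    Returns (backend, number_of_rules_checked).
    """
    for checked, (ip, backend) in enumerate(rules, start=1):
        if ip == service_ip:
            return backend, checked
    return None, len(rules)


def hashed_match(table, service_ip):
    """Look the service up in a hash table: one step, however large the table."""
    return table.get(service_ip)


# Hypothetical cluster with 5000 services, one rule per service.
rules = [(f"10.0.{i // 256}.{i % 256}", f"backend-{i}") for i in range(5000)]
table = dict(rules)

target_ip = rules[-1][0]  # worst case for the sequential scan
backend, checked = linear_rule_match(rules, target_ip)
print(checked)  # → 5000
print(hashed_match(table, target_ip) == backend)  # → True
```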

This brings us to layer 7 (L7) routing for load balancing Kubernetes traffic, which we will discuss next.

L7 round robin load balancing

What if you’re using a multiplexed keep-alive protocol like gRPC or HTTP/2, and you need a fairer round robin algorithm? You can use an API Gateway for Kubernetes such as Ambassador Edge Stack, which can bypass kube-proxy altogether, routing traffic directly to Kubernetes Pods. Ambassador is built on Envoy Proxy, an L7 proxy, so each gRPC request is load balanced between available Pods.

In this approach, your load balancer will typically use the Kubernetes EndpointSlice API to track the availability of Pods. In older versions of Kubernetes, the Endpoints API can be used instead. When a request for a particular Kubernetes service is sent to your load balancer, the load balancer round robins the request between Pods that map to the given service.
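A minimal sketch of per-request round robin might look like the following. The class `RoundRobinBalancer` and its methods are hypothetical names for illustration, and the endpoint addresses are invented; in a real gateway the endpoint list would be fed by watching the Kubernetes API rather than passed in by hand.

```python
class RoundRobinBalancer:
    """Pick a Pod endpoint per request, rotating through the ready endpoints."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._next = 0

    def update_endpoints(self, endpoints):
        # Assumed to be called whenever the set of ready Pods changes.
        self._endpoints = list(endpoints)
        self._next = 0

    def pick(self):
        if not self._endpoints:
            raise RuntimeError("no ready endpoints for this service")
        endpoint = self._endpoints[self._next % len(self._endpoints)]
        self._next += 1
        return endpoint


balancer = RoundRobinBalancer(["10.1.0.4:8080", "10.1.0.5:8080"])
picks = [balancer.pick() for _ in range(4)]
print(picks)
# → ['10.1.0.4:8080', '10.1.0.5:8080', '10.1.0.4:8080', '10.1.0.5:8080']
```

Because the choice is made per request rather than per connection, a single busy client's requests are spread across all Pods.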

Ring hash

Instead of rotating requests between different Pods, the ring hash load balancing strategy uses a hashing algorithm to send all requests from a given client to the same Pod. The ring hash approach is used for both “sticky sessions” (where a cookie is set to ensure that all requests from a client arrive at the same Pod) and for “session affinity” (which relies on client IP or some other piece of client state).

The hashing approach is useful for services that maintain per-client state (e.g., a shopping cart). By routing the same client to the same Pod, the state for a given client does not need to be synchronized across Pods. Moreover, if you’re caching client data on a given Pod, the probability of cache hits also increases.

The tradeoff with ring hash is that it can be more challenging to evenly distribute load between different backend servers, since client workloads may not be equal. In addition, the computation cost of the hash adds some latency to requests, particularly at scale.
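As a rough illustration of the idea, here is a small consistent hash ring. It is a sketch under stated assumptions, not Envoy's implementation: the class name `HashRing`, the choice of MD5 as the hash, the `vnodes` parameter, and the Pod and client names are all picked for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, pods, vnodes=100):
        # Each Pod is placed on the ring many times ("virtual nodes")
        # so that load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{pod}#{i}"), pod) for pod in pods for i in range(vnodes)
        )
        self._points = [point for point, _pod in self._ring]

    def pick(self, client_key: str) -> str:
        if not self._ring:
            raise RuntimeError("no Pods on the ring")
        # First ring point clockwise from the client's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(client_key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["pod-a", "pod-b", "pod-c"])
keys = [f"client-{i}" for i in range(1000)]
before = {key: ring.pick(key) for key in keys}

# Simulate pod-c going away: only clients that were on pod-c get remapped.
smaller_ring = HashRing(["pod-a", "pod-b"])
after = {key: smaller_ring.pick(key) for key in keys}
moved = sum(1 for key in keys if before[key] != after[key])
print(f"{moved} of {len(keys)} clients moved")
```

The same client key always lands on the same Pod, and removing a Pod leaves clients of the remaining Pods where they were.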

Maglev

Like ring hash, Maglev is a consistent hashing algorithm. Originally developed by Google, Maglev was designed to be faster than the ring hash algorithm on hash table lookups and to minimize memory footprint. The ring hash algorithm can generate fairly large lookup tables that may not fit in the CPU cache.

For microservices, Maglev has one significant tradeoff: regenerating the lookup table when a node fails is relatively expensive. Given the transient nature of Kubernetes Pods, this cost may make Maglev a poor fit. For more details on the tradeoffs of different consistent hashing algorithms, this article covers consistent hashing for load balancing in detail, along with some benchmarks.
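To show where that cost comes from, here is a simplified sketch of how a Maglev-style lookup table can be populated. It follows the general shape of the published algorithm (each backend walks its own permutation of table slots and claims the next free one in turn), but the function names, the use of SHA-256, and the small table size are assumptions chosen for the example.

```python
import hashlib
from collections import Counter


def _hash(key: str, seed: str) -> int:
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_maglev_table(backends, table_size=251):
    """Fill a lookup table of table_size slots (table_size must be prime)."""
    if not backends:
        raise ValueError("at least one backend is required")
    offsets = [_hash(b, "offset") % table_size for b in backends]
    skips = [_hash(b, "skip") % (table_size - 1) + 1 for b in backends]
    next_index = [0] * len(backends)
    table = [None] * table_size
    filled = 0
    while True:
        # Backends take turns; each claims the next free slot in its permutation.
        for i, backend in enumerate(backends):
            slot = (offsets[i] + next_index[i] * skips[i]) % table_size
            while table[slot] is not None:
                next_index[i] += 1
                slot = (offsets[i] + next_index[i] * skips[i]) % table_size
            table[slot] = backend
            next_index[i] += 1
            filled += 1
            if filled == table_size:
                return table


def lookup(table, client_key: str) -> str:
    """A lookup is a single hash plus one array index."""
    return table[_hash(client_key, "lookup") % len(table)]


table = build_maglev_table(["pod-a", "pod-b", "pod-c"])
print(Counter(table))  # slots per backend differ by at most one
print(lookup(table, "client-42"))
```

Lookups are cheap, but every change to the backend set means running `build_maglev_table` again over the whole table, which is the rebuild cost described above.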

Learning more

The networking implementation within Kubernetes is more complex than it might first appear and somewhat more limited than many engineers understand. Matt Klein put together a very informative blog post in 2017 that stands the test of time: “Introduction to modern network load balancing and proxying”. It provides a great foundation for understanding key concepts.

A series of additional posts explains why organizations such as Bugsnag and Twilio have chosen to use Layer 7 aware proxies to load balance ingress traffic.
