
Part 1: Rate Limiting: A Useful Tool for Distributed Systems

April 26, 2018 | 10 min read
Rate Limiting for Distributed Systems

In computing, rate limiting is used to control the rate at which operations are initiated or consumed, or at which traffic is sent or received. If you have been developing software for more than a year, you have most likely bumped into this concept. However, as with many architectural challenges, there are usually more tradeoffs to consider than first appear. This article outlines some of the implementation options, benefits, and challenges of rate limiting in modern distributed applications.

Why Implement Rate Limiting?

You implement rate limiting primarily for one of three reasons: to prevent a denial of service (intentional or otherwise) through resource exhaustion, to limit the impact (or potential) of cascading failure, or to restrict or meter resource usage.

The denial of service prevention pattern can be seen at organizations like Twitter or eBay, which place a rate limiter in front of their SaaS APIs to prevent malicious attacks from shutting down the API backends and to provide consistent service for all consumers. Using rate limiting to prevent cascading failure (where some components within your system are partially degraded) can be seen in the load shedding policies of payment APIs like Stripe. The restriction (or metering) usage pattern can be seen when polling an external information source for new data, such as health checking, where we only need to obtain data periodically, or where we may be charged for each request we initiate.

Rate Limiting Implementation Options

Let’s keep things simple and assume that you are dealing with rate limiting within a point-to-point communication model. In this case, you can implement rate limiting at either point — on the initiator/sender, the “source”, or on the consumer/receiver, the “sink” — and there are also additional “middleware” options:


  • You can control the rate at which the request is initiated or sent at the source — think a time-throttled loop making a periodic API request
  • You can control the rate at which the request is consumed or received at the sink — think new inbound HTTP connections that are refused until the current task/thread has finished processing (a minimal sketch of this option follows this list)

  • You can use an intermediary to buffer the initiation or sending of requests — perhaps placing the request within a queue (priorities can be applied to this queue, providing differing levels of SLA for requests)

  • You can use an intermediary to limit the initiation or sending of requests — perhaps using some form of proxy or gateway that trips a circuit breaker when the downstream service is not accepting any more requests
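
To make the sink-side option concrete, here is a minimal sketch of a server that refuses work once a local limit is exceeded. It uses the JDK's built-in HttpServer and Guava's RateLimiter purely for illustration; the 5-requests-per-second limit, the port, and the /work path are arbitrary assumptions rather than recommendations.

```java
import com.google.common.util.concurrent.RateLimiter;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class SinkRateLimitExample {
    public static void main(String[] args) throws IOException {
        // Illustrative limit: this sink accepts at most 5 requests per second.
        RateLimiter limiter = RateLimiter.create(5.0);

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/work", exchange -> {
            // Reject immediately rather than queueing when over the limit.
            if (!limiter.tryAcquire()) {
                exchange.sendResponseHeaders(429, -1); // 429 Too Many Requests, no body
                exchange.close();
                return;
            }
            byte[] body = "processed\n".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```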

Understanding Rate Limiting Tradeoffs

When implementing rate limiting, the approach you take depends on how much control you have over the system's components.

  • Full Control of Both Points: If you control both the source and the sink (e.g., the API client and server), you can implement rate limiting at either or both ends, and you can decide explicitly which components handle the rate limiting responsibilities.
  • Control Over Only One Point: If you manage just one side—such as a publicly available API or a data sink—you face limitations. You cannot rely on external sources to adhere to rate limiting rules or guidelines, even in systems without malicious actors.
  • Belt-and-Braces Approach: Even if you control both ends, you might prefer a redundant setup. By implementing rate limiting at both the source and sink, you can safeguard your system against unexpected traffic spikes or misbehaving components.

Other tradeoffs to consider include:

The Role of Sources and Sinks in Rate Limiting

Implementing effective rate limiting within a source or sink can be constrained by the programming model or the resources available. Here are the key considerations:

  1. Component-Level Limitations: In distributed systems, rate limiting on individual components may not be sufficient. For example, rate limiting a single source works fine until you scale to multiple sources to meet demand. Without coordination, each source operates independently and the aggregate request rate can exceed the allowable call limit (a naive mitigation is sketched after this list).
  2. Avoiding Bespoke Implementations: Allowing backend service engineers to implement rate limiting within their services can lead to inconsistent implementations, especially across polyglot programming environments. This variance can complicate system maintenance and scalability.
  3. Offloading Rate Limiting Under Heavy Load: Applications experiencing high or spiky traffic loads may benefit from delegating rate limiting to an external service. This prevents the application from wasting internal resources on managing rate limits, allowing it to focus on core functionality.
  4. Single Responsibility Principle: At an architectural level, auxiliary tasks like rate limiting are often better handled by dedicated external components. This approach aligns with the single responsibility principle, ensuring that each system component focuses solely on its primary function.

By offloading rate limiting to a centralized or external service, you can achieve greater scalability, consistency, and resource efficiency across your distributed system.
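
To make point 1 above concrete: if a third party allows 100 requests per second in total and you run four replicas of the source, each replica independently limiting itself to 100 requests per second still permits up to 400 requests per second in aggregate. A naive local mitigation, assuming a fixed and known replica count (rarely true in autoscaled systems), is to statically partition the global budget across replicas; the sketch below uses Guava's RateLimiter and illustrative numbers.

```java
import com.google.common.util.concurrent.RateLimiter;

public class PartitionedLimiter {
    // Illustrative numbers: the third party allows 100 requests/second in total,
    // and we assume exactly 4 identical replicas of this source are running.
    private static final double GLOBAL_LIMIT_PER_SECOND = 100.0;
    private static final int REPLICA_COUNT = 4;

    // Each replica throttles itself to its share of the global budget (25 req/s here).
    private final RateLimiter localShare =
            RateLimiter.create(GLOBAL_LIMIT_PER_SECOND / REPLICA_COUNT);

    public void callThirdParty(Runnable apiCall) {
        localShare.acquire(); // blocks until this replica's share allows another call
        apiCall.run();
    }
}
```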

The failure modes of any rate limiting middleware

  • You will need to determine what happens when a rate limiting service crashes (should the service fail open or closed?), and, if the service buffers requests, you may need a restart policy (should requests be buffered to disk?)
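
As an illustration of the fail open versus fail closed decision, here is a sketch of a caller that fails open when an external rate limit check is unavailable. The RateLimitClient interface is a hypothetical stand-in for whatever middleware you use, not any particular product's API.

```java
// Hypothetical client for an external rate limiting service; not a real product's API.
interface RateLimitClient {
    // Returns true if the request is within its limit; may throw if the service is down.
    boolean isAllowed(String clientId) throws Exception;
}

public class FailOpenDecision {
    private final RateLimitClient rateLimitClient;

    public FailOpenDecision(RateLimitClient rateLimitClient) {
        this.rateLimitClient = rateLimitClient;
    }

    public boolean shouldAdmit(String clientId) {
        try {
            return rateLimitClient.isAllowed(clientId);
        } catch (Exception e) {
            // Fail open: if the rate limiting middleware is unreachable, admit the request
            // rather than turning a limiter outage into a full outage. Failing closed
            // (returning false here) is safer when overload would damage the backend.
            return true;
        }
    }
}
```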

The flexibility of algorithms used by rate limiting middleware

  • The primary advantage of writing rate limiting functionality into a source or sink application that you own is that you have full control over how the rate limiting algorithm is implemented (e.g., token bucket, fixed window, or sliding window) and over which request (meta)data is used to make decisions (a minimal token-bucket sketch follows this list).
  • You often have to evaluate which algorithms are available “out-of-the-box” with external rate limiters and determine whether others (including the associated metadata processing) can be plugged in
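
For illustration, here is a minimal token-bucket sketch of the kind of algorithm you might write when you own the source or sink; real middleware implementations differ, for example in how they share state across threads or nodes.

```java
// A minimal, illustrative token bucket; not production-ready.
public class TokenBucket {
    private final long capacity;        // maximum number of tokens the bucket can hold
    private final double refillPerNano; // tokens added per nanosecond of elapsed time
    private double tokens;              // current token count
    private long lastRefill;            // timestamp of the last refill, in nanoseconds

    // e.g., new TokenBucket(10, 5.0) allows bursts of 10 and a sustained 5 requests/second
    public TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    // Returns true if the request may proceed, false if it should be rejected or retried later.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```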

Examples of Rate Limiting in Action

Let’s explore a couple of examples to better understand how to apply rate limiting effectively.

Scenario: Calling a Third-Party SDK with Call Limits or Metered Charges

[You own the source but not the sink]

When working with a third-party service that enforces call limits or charges per API call, local (source-side) rate limiting becomes essential. Here's why:

  • Avoid Exceeding Limits: Exceeding the allowed call limit may result in errors, temporary blocking, or degraded service. Before implementing a production solution, it’s crucial to confirm the service-level agreement (SLA) or check the service’s documentation to understand how over-limit requests are handled.
  • Prevent Resource Waste: Without proper rate limiting, your application might waste resources by repeatedly retrying failed calls in a loop.
  • Control Costs: For metered calls, failing to limit requests at the source can lead to excessive charges—something no one wants.

To address this, I often use Google’s Guava RateLimiter in Java. It’s an excellent library for managing request rates effectively. Here’s an example of how I would implement rate limiting in a source application:
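
A minimal sketch, adapted from the Guava RateLimiter JavaDoc example; the 2-permits-per-second rate and the ThrottledSubmitter wrapper class are illustrative assumptions:

```java
import com.google.common.util.concurrent.RateLimiter;

import java.util.List;
import java.util.concurrent.Executor;

public class ThrottledSubmitter {
    // Illustrative limit: assume the third-party API allows at most 2 calls per second.
    private final RateLimiter rateLimiter = RateLimiter.create(2.0);

    // Submits tasks (each wrapping a call to the third-party API) at a throttled rate.
    void submitTasks(List<Runnable> tasks, Executor executor) {
        for (Runnable task : tasks) {
            rateLimiter.acquire(); // blocks until the next permit is available
            executor.execute(task);
        }
    }
}
```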

This is a simplified example from the Guava RateLimiter JavaDoc, and in reality, I would most likely have some Exception handling within the task execution block.

Scenario: Offering a Public API

[You own the sink, but not (all of) the source(s)]

In this scenario, the only way you can guard against the backend of the API being overwhelmed is by rate limiting at the sink, preferably by offloading the limiting responsibility to an external service such as an API gateway.


Rate Limiting FAQs

Why should I rate limit an application or service?

You implement rate limiting primarily for one of three reasons: to prevent a denial of service (intentional or otherwise) through resource exhaustion, to limit the impact (or potential) of cascading failure, or to restrict or meter resource usage.

What are the fundamental theory and options for rate limiting applications or services?

You can control the rate at which the request is initiated or sent at the source — think a time-throttled loop making a periodic API request.

Or you can control the rate at the sink — think of new inbound HTTP connections that are refused until the current task/thread has finished processing.

You can also use an intermediary to buffer the initiation or sending of requests, perhaps placing the request within a queue. Additionally, you can use an intermediary to limit the initiation or sending of requests, for example, using some form of proxy or gateway that trips a circuit breaker when the downstream service is not accepting any more requests.

How do I implement rate limiting with microservices or cloud-based applications?

You can implement rate limiting via application code (using an appropriate library or SDK), via a sidecar proxy like Envoy running alongside your service, or, if the service is user-facing, via an API gateway like Edge Stack.

Conclusion

In this first article of our three-part rate limiting series, you have learned about the motivations for rate limiting, along with your implementation options and the associated tradeoffs. In the next article, I'll dive into more detail on implementing rate limiting algorithms for API gateways!
