Part 1: Rate Limiting: A Useful Tool with Distributed Systems
Within the computing domain, rate limiting is used to control the rate of operations initiated or consumed, or of traffic sent or received. If you have been developing software for more than a year, you have most likely bumped into this concept. However, as with many architectural challenges, there are usually more tradeoffs to consider than might first appear. This article outlines some of the implementations, benefits, and challenges of rate limiting.
Why Implement Rate Limiting?
You implement rate limiting primarily for one of three reasons: to prevent a denial of service (intentional or otherwise) through resource exhaustion, to limit the impact (or potential) of cascading failure, or to restrict or meter resource usage.
The denial of service prevention pattern can be seen in organisations placing a rate limiter in front of their SaaS APIs to prevent malicious attacks from shutting down the API backends and to provide a consistent service for all consumers. Using rate limiting to prevent cascading failure (where some components within your system are partially degraded) can be seen within the load shedding policies of payment APIs. The restriction (or metering) usage pattern can be seen when polling an external information source for new data, such as health checking, where we only need to obtain data periodically, or where we may be charged for each request we initiate.
Rate Limiting Implementation Options
Let’s keep things simple and assume that you are dealing with rate limiting within a point-to-point communication model. In this case, you can implement rate limiting at either point — on the initiator/sender, the “source”, or on the consumer/receiver, the “sink” — and there are also additional “middleware” options:
- You can control the rate at which the request is initiated or sent at the source — think a time-throttled loop making a periodic API request
- You can control the rate at which the request is being consumed or received at the sink — think new inbound HTTP connections that are refused until the current task/thread has finished processing
- You can use an intermediary to buffer the initiation or sending of requests — perhaps placing the request within a queue (priorities can be applied to this queue, providing differing levels of SLA for requests); a simple sketch of this approach follows this list
- You can use an intermediary to limit the initiation or sending of requests — perhaps using some form of proxy or gateway that trips a circuit-breaker when the downstream service is not accepting any more requests
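As an illustration of the buffering option (the third bullet above), here is a minimal sketch in Java that uses only JDK classes. It buffers incoming requests in a bounded queue and drains them at a fixed rate; the queue size, drain rate, and the downstream call are illustrative placeholders rather than recommendations:

```java
import java.util.concurrent.*;

public class BufferingRateLimiter {

    // Bounded queue acts as the buffer; offers fail fast when the buffer is full.
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(1_000);

    private final ScheduledExecutorService drainer = Executors.newSingleThreadScheduledExecutor();

    public BufferingRateLimiter() {
        // Drain at most one request every 100 ms, i.e. roughly 10 requests per second.
        drainer.scheduleAtFixedRate(this::drainOne, 0, 100, TimeUnit.MILLISECONDS);
    }

    // Called by upstream producers; returns false if the buffer is full (the caller can retry or shed load).
    public boolean submit(String request) {
        return buffer.offer(request);
    }

    private void drainOne() {
        String next = buffer.poll();
        if (next != null) {
            callDownstream(next);
        }
    }

    private void callDownstream(String request) {
        // Placeholder for the real downstream call (HTTP client, SDK invocation, etc.).
        System.out.println("Forwarding request: " + request);
    }
}
```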
Rate Limiting Tradeoffs?
Suppose you are developing a system and have full control of both points. In this case, all implementation options are available to you, and you simply have to collaborate on the implementations and communicate which points (and corresponding components) have the associated rate limiting responsibilities.
On the other hand, if you only control one of the points — say the sink, or a publicly available API — then your options are somewhat more limited as you can’t rely on the sources following any guidelines or rules (even if the system contains no bad actors). Even if you do control both points, you may still want to implement a “belt and braces” approach that includes rate limiting at both ends.
Other tradeoffs to consider include:
The ability of sources and sinks to handle the rate limiting.
- Sometimes it is not possible to implement effective rate limiting within components due to programming models or limited resources available, etc
- Within a distributed system, rate limiting on an individual component may not provide the required functionality (or at least, not without some level of coordination). For example, if you have rate limited an individual source making calls, but you then need to scale horizontally to two sources to meet demand, you may now be making twice the allowable number of calls
- You may also not want backend service engineers writing rate limiting functionality, as this could lead to bespoke implementations or variance between polyglot programming stacks.
- If an application is under heavy or very spiky load, you may want to offload any rate limiting functionality to an external service to prevent resources from being wasted within the application by performing the rate limiting tasks.
- I’m sure you’ve heard of the single responsibility principle, and at the coarse-grained architecture level, you may require that auxiliary functionality like rate limiting is provided by an external component that has this responsibility
The failure modes of any rate limiting middleware
- You will need to determine what happens when a rate limiting service crashes (should the service fail open or closed?), and if the service is buffering requests you may need a restart policy (should requests be buffered to disk?)
The flexibility of algorithms used by rate limiting middleware
- The primary advantage of writing rate limiting functionality into a source or sink application that you own is that you have full control over how the rate limiting algorithm is implemented (e.g., a token bucket or leaky bucket) and over how the request (meta)data is used to make decisions; a simple sketch of this is shown after this list.
- You often have to evaluate which algorithms are available “out-of-the-box” with external rate limiters and determine whether others (including the associated metadata processing) can be plugged in
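To show the kind of control you get when you own the algorithm, here is a deliberately simplified token bucket sketch in Java. It is a sketch only, not production code (no distributed coordination, millisecond-resolution clock), and the capacity and refill rate are arbitrary example values:

```java
public class TokenBucket {

    private final long capacity;           // maximum number of tokens the bucket can hold
    private final double refillPerMillis;  // tokens added per millisecond
    private double tokens;
    private long lastRefillTimestamp;

    public TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerMillis = refillPerSecond / 1000.0;
        this.tokens = capacity;
        this.lastRefillTimestamp = System.currentTimeMillis();
    }

    // Returns true if the request is allowed (a token was available), false if it should be limited.
    public synchronized boolean tryConsume() {
        refill();
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        double newTokens = (now - lastRefillTimestamp) * refillPerMillis;
        tokens = Math.min(capacity, tokens + newTokens);
        lastRefillTimestamp = now;
    }
}
```

A caller would construct, say, new TokenBucket(10, 5.0) (a burst of up to 10 requests, refilled at 5 tokens per second) and check tryConsume() before each request, deciding whether rejected requests are queued, retried, or dropped.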
Examples
To make these ideas a little more concrete, let’s look at a couple of examples.
Running a task that calls a third-party SDK with call limits or a metered charge per call [you own the source, but not the sink]
For both the call limit and metered charge sink scenarios, I want to implement local (source) rate limiting. I can assume that if I exceed the rate limit, I may receive an error, or get (temporarily) blocked. I would need to confirm the SLA or check the documentation for a production implementation — and whatever happens, I don’t want my application simply spinning in a loop constantly attempting calls, as at worst, this simply wastes my resources. Without source rate limiting for a metered call, I simply end up paying a lot, and no one likes that!
I often use Google’s Guava RateLimiter for this type of problem in the Java world. An example of the type of code I would write in my (source) application would be:
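(The snippet below is a reconstruction in the same spirit as the Guava RateLimiter documentation; the task list, executor, and the rate of two permits per second are illustrative assumptions.)

```java
import com.google.common.util.concurrent.RateLimiter;

import java.util.List;
import java.util.concurrent.Executor;

public class ThirdPartyApiCaller {

    // Allow at most 2 calls per second to the third-party API.
    private final RateLimiter rateLimiter = RateLimiter.create(2.0);

    public void submitTasks(List<Runnable> tasks, Executor executor) {
        for (Runnable task : tasks) {
            rateLimiter.acquire();  // blocks until a permit is available
            executor.execute(task); // each task would wrap a call to the third-party SDK
        }
    }
}
```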
This is a simplified example based on the Guava documentation, and in reality, I would most likely have some exception handling within the task execution block.
Offering a public API [you own the sink, but not (all of) the source(s)]
In this scenario, the only way you can guard against the backend of the API being overwhelmed is by rate limiting at the sink, preferably by offloading the limiting responsibility to an external service such as an API gateway.
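If offloading to a gateway is not (yet) an option, the same idea can be sketched in application code at the sink. The following is a minimal, illustrative Java example, again using Guava’s RateLimiter together with the JDK’s built-in HttpServer; the endpoint path and the limit of 100 requests per second are arbitrary assumptions. It sheds excess load by returning HTTP 429 Too Many Requests:

```java
import com.google.common.util.concurrent.RateLimiter;
import com.sun.net.httpserver.HttpServer;

import java.io.OutputStream;
import java.net.InetSocketAddress;

public class RateLimitedApi {

    public static void main(String[] args) throws Exception {
        // Allow at most 100 requests per second into the backend handler.
        RateLimiter rateLimiter = RateLimiter.create(100.0);

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/api", exchange -> {
            if (!rateLimiter.tryAcquire()) {
                // Shed load: tell the client it is being rate limited.
                exchange.sendResponseHeaders(429, -1);
                exchange.close();
                return;
            }
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
            exchange.close();
        });
        server.start();
    }
}
```

In production, an API gateway or a dedicated rate limit service is usually preferable, as the limit then applies consistently across every replica of the backend rather than per instance.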
Rate Limiting FAQs
Why should I rate limit an application or service?
You implement rate limiting primarily for one of three reasons: to prevent a denial of service (intentional or otherwise) through resource exhaustion, to limit the impact (or potential) of cascading failure, or to restrict or meter resource usage.
Can you explain the fundamental theory and options for rate limiting applications or services?
You can control the rate at which the request is initiated or sent at the source — think a time-throttled loop making a periodic API request.
Or you can control the rate at the sink — think of new inbound HTTP connections that are refused until the current task/thread has finished processing.
You can also use an intermediary to buffer the initiation or sending of requests, perhaps placing the request within a queue. Additionally, you can use an intermediary to limit the initiation or sending of requests, for example, by using some form of proxy or gateway that trips a circuit-breaker when the downstream service is not accepting any more requests.
How do you implement rate limiting with microservices or cloud-based applications?
You can implement rate limiting via application code (using an appropriate library or SDK), via a sidecar proxy like Envoy running alongside your service, or, if the service is user-facing, via an API gateway like Edge Stack.
Conclusion
In this first article of our three-part Rate Limiting series, you have learned about the motivations for rate limiting, and about your options and their associated tradeoffs. In the next article, I’ll dive into more detail on implementing rate limiting algorithms for API gateways!