
Part 3: Implementing a Java Rate Limiting Service for Edge Stack API Gateway

May 17, 2018 | 13 min read

The rate limiting functionality offered by Edge Stack, the Kubernetes API gateway, is fully customizable: any service that implements a gRPC endpoint can decide whether a request should be limited. In this article, which builds on part 1 and part 2, you will learn how to build and deploy a simple Java-based rate limiting service for Edge Stack and see how rate limiting works.

Getting Set Up: The Docker Java Shop

In my previous tutorial, “Deploying Java Apps with Kubernetes and the Edge Stack API Gateway,” I demonstrated how to add the open-source Edge Stack API gateway to a series of Java-based services (Dropwizard and Spring Boot). These services were deployed on Kubernetes.

If you haven’t seen that tutorial, I recommend reviewing it along with the other articles in the series. They provide essential fundamentals.

This article assumes that you are already comfortable building Java-based microservices and deploying them to Kubernetes, and that you have the necessary prerequisites installed. For this tutorial, I’m using Docker for Mac Edge with built-in Kubernetes support. However, the principles should apply equally if you’re using Minikube or a remote cluster.

Prerequisites

You will need to have these installed locally:

  • Docker Desktop: I am using the Edge community edition (18.04.0-ce), which has built-in support for a local Kubernetes cluster. I have also increased the memory available to Docker to 8 GB, as the Java services can be a little memory-hungry at times
  • An editor of your choice, such as Atom, VS Code, or IntelliJ, for the Java code

You can grab the latest version of the “Docker Java Shop” source code here:

https://github.com/danielbryantuk/oreilly-docker-java-shopping

You can clone the repo via SSH like so:

$ git clone git@github.com:danielbryantuk/oreilly-docker-java-shopping.git

The initial version of the service architecture and deployment looked as follows:


You can see from the diagram that the Docker Java Shopping application consists primarily of three simple services, and in the previous tutorial, you added the Edge Stack API Gateway as the “front door” of the system. It is worth noting that the API Gateway will be running on port 80, the standard HTTP port, so you will need to make sure that nothing else is running locally on the same port.

Rate Limiting 101 with Edge Stack API Gateway

I have added a new folder, “kubernetes-ambassador-ratelimit”, to the repo, which contains the Kubernetes config for this tutorial, so go ahead and navigate to this directory via the command line. Listing that directory will show the following files:

(master *) oreilly-docker-java-shopping $ cd kubernetes-ambassador-ratelimit/
(master *) kubernetes-ambassador-ratelimit $ ll
total 48
0 drwxr-xr-x 8 danielbryant staff 256 23 Apr 09:27 .
0 drwxr-xr-x 19 danielbryant staff 608 23 Apr 09:27 ..
8 -rw-r--r-- 1 danielbryant staff 2033 23 Apr 09:27 ambassador-no-rbac.yaml
8 -rw-r--r-- 1 danielbryant staff 698 23 Apr 10:30 ambassador-rate-limiter.yaml
8 -rw-r--r-- 1 danielbryant staff 476 23 Apr 10:30 ambassador-service.yaml
8 -rw-r--r-- 1 danielbryant staff 711 23 Apr 09:27 productcatalogue-service.yaml
8 -rw-r--r-- 1 danielbryant staff 659 23 Apr 10:02 shopfront-service.yaml
8 -rw-r--r-- 1 danielbryant staff 678 23 Apr 09:27 stockmanager-service.yaml

You can apply these Kubernetes config files with this command:

$ kubectl apply -f .

Doing so will deploy the following service architecture, with the primary difference from the previous architecture being the addition of the “ratelimiter” service. This service is written in Java, without a web/microservices framework, and it exposes a gRPC endpoint that Ambassador can use for rate limiting. This allows for customization and flexibility regarding the rate limiting algorithm you can implement (for more details on the benefits of this, check out my earlier article).


Exploring the Rate Limiter Kubernetes Service

The ratelimiter service is deployed into Kubernetes just like any other service, and could be horizontally scaled as appropriate. Here are the contents of the ambassador-rate-limiter.yaml Kubernetes config file:

---
apiVersion: v1
kind: Service
metadata:
  name: ratelimiter
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: RateLimitService
      name: ratelimiter_svc
      service: "ratelimiter:50051"
  labels:
    app: ratelimiter
spec:
  type: ClusterIP
  selector:
    app: ratelimiter
  ports:
  - protocol: TCP
    port: 50051
    name: http
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: ratelimiter
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: ratelimiter
    spec:
      containers:
      - name: ratelimiter
        image: danielbryantuk/ratelimiter:0.3
        ports:
        - containerPort: 50051

You will explore the contents of the underlying “danielbryantuk/ratelimiter:0.3” Docker image later in the article, but for now all you need to know is that this service is running within the cluster, and exposes port 50051.

In the ambassador-service.yaml config file, I have also updated the Edge Stack Kubernetes annotations config to ensure that requests to the shopfront service are rate limited simply by including the “rate_limits” property. I have also added some additional metadata “- descriptor: Example descriptor”, which I will explain in more detail in the next article. For now, I’ll say that this is a good way to pass additional metadata into the rate limiting service.

---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador
  name: ambassador
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: shopfront_stable
      prefix: /shopfront/
      service: shopfront:8010
      rate_limits:
        - descriptor: Example descriptor

Check that the deployment has succeeded using kubectl:

(master *) kubernetes-ambassador-ratelimit $ kubectl get svc
NAME               TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
ambassador         LoadBalancer   10.105.253.3     localhost     80:30051/TCP     1d
ambassador-admin   NodePort       10.107.15.225    <none>        8877:30637/TCP   1d
kubernetes         ClusterIP      10.96.0.1        <none>        443/TCP          16d
productcatalogue   ClusterIP      10.109.48.26     <none>        8020/TCP         1d
ratelimiter        ClusterIP      10.97.122.140    <none>        50051/TCP        1d
shopfront          ClusterIP      10.98.207.100    <none>        8010/TCP         1d
stockmanager       ClusterIP      10.107.208.180   <none>        8030/TCP         1d

All six of our services look good to go (plus the Kubernetes service) — that’s three Java services, two Ambassador services, and the rate limiter service.

You can test the deployment by making a curl request to the shopfront endpoint, which (as shown above) should be running on the EXTERNAL-IP of localhost on port 80:

(master *) kubernetes-ambassador-ratelimit $ curl localhost/shopfront/
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
...
</div>
</div>
<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="js/bootstrap.min.js"></script>
</body>
</html>(master *) kubernetes-ambassador-ratelimit $

You will notice that this produces a lot of HTML, which is simply the frontpage of the Docker Java Shop, and can be more easily viewed within a browser pointed at http://localhost/shopfront/. However, it will be easier to use curl for our rate limiting experiments.

Testing the Rate Limiting

For this demonstration, I have decided to apply rate limiting directly to the backend service itself. When the rate limit service evaluates whether to limit a request, it considers only the number of requests made to a specific backend service within a given time period.

The rate limiting logic uses the token-bucket algorithm with a maximum bucket size of 20 and a refill rate of 10 tokens per second.

This means you can make up to 10 requests per second to the API without issues. Additionally, you can temporarily exceed this rate because the bucket starts with 20 tokens, allowing for a brief burst. However, once the initial "burst" tokens are used up and you exceed 10 requests per second, you will receive an HTTP 429 “Too Many Requests” status code. At this point, the Edge Stack API gateway stops forwarding requests to the backend service.
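
To make the burst-then-throttle behaviour concrete, here is a minimal hand-rolled token bucket in plain Java. It is only a sketch for illustration and is not the bucket4j implementation that the service uses; the class name TokenBucketSketch and its methods are hypothetical, and the caller passes in the current time so that the behaviour is deterministic.

```java
/**
 * Illustrative token bucket (NOT bucket4j). Tokens are tracked in integer
 * "units" (1 token = 1_000_000_000 units) so that the refill maths stays exact.
 * Assumes refillTokensPerSecond is greater than zero.
 */
final class TokenBucketSketch {
    private static final long UNITS_PER_TOKEN = 1_000_000_000L;

    private final long capacityUnits;
    private final long refillTokensPerSecond;
    private long availableUnits;
    private long lastRefillNanos;

    TokenBucketSketch(long capacity, long refillTokensPerSecond, long startNanos) {
        this.capacityUnits = capacity * UNITS_PER_TOKEN;
        this.refillTokensPerSecond = refillTokensPerSecond;
        this.availableUnits = capacityUnits; // start full: this is the burst allowance
        this.lastRefillNanos = startNanos;
    }

    /** Tries to take one token at the given time; false means "over the limit". */
    synchronized boolean tryConsume(long nowNanos) {
        refill(nowNanos);
        if (availableUnits >= UNITS_PER_TOKEN) {
            availableUnits -= UNITS_PER_TOKEN;
            return true;
        }
        return false;
    }

    private void refill(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed <= 0) {
            return; // no time has passed (or the clock went backwards)
        }
        // Each elapsed nanosecond adds refillTokensPerSecond units.
        // Elapsed time is capped at "long enough to fill the bucket" to avoid overflow.
        long nanosToFill = capacityUnits / refillTokensPerSecond + 1;
        long added = Math.min(elapsed, nanosToFill) * refillTokensPerSecond;
        availableUnits = Math.min(capacityUnits, availableUnits + added);
        lastRefillNanos = nowNanos;
    }

    /** Sends the given number of requests at one instant and counts how many are allowed. */
    static int countAllowed(TokenBucketSketch bucket, int requests, long nowNanos) {
        int allowed = 0;
        for (int i = 0; i < requests; i++) {
            if (bucket.tryConsume(nowNanos)) {
                allowed++;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        // Same numbers as the article: bucket size 20, refill of 10 tokens per second.
        TokenBucketSketch bucket = new TokenBucketSketch(20, 10, 0L);
        System.out.println("t=0s: " + countAllowed(bucket, 30, 0L) + " of 30 allowed");
        System.out.println("t=1s: " + countAllowed(bucket, 15, 1_000_000_000L) + " of 15 allowed");
    }
}
```

With these settings, a burst of 30 simultaneous requests should see only the first 20 allowed, and one second later the bucket should have regained 10 tokens.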

Let’s test this by simulating a high request load using curl. To avoid displaying the full HTML payload in the terminal, use the --output /dev/null option. Combine this with --silent to suppress the curl output but still show non-OK HTTP response status codes using --show-error and --fail.

You can bundle these options into a simple bash loop with date output (to track when requests are sent) to create a basic load generator. Be ready to press CTRL-C to stop the loop when needed:

$ while true; do curl --silent --output /dev/null --show-error --fail http://localhost/shopfront/; echo -e $(date);done
(master *) kubernetes-ambassador-ratelimit $ while true; do curl --silent --output /dev/null --show-error --fail http://localhost/shopfront/; echo -e $(date);done
Tue 24 Apr 2018 14:16:31 BST
Tue 24 Apr 2018 14:16:31 BST
Tue 24 Apr 2018 14:16:31 BST
Tue 24 Apr 2018 14:16:31 BST
...
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
^C

As you can see, the first several requests are served without problems, as shown by the date lines that have no accompanying error. Quickly (at least on my Mac), the loop exceeds 10 requests per second and uses up the burst tokens, and I start receiving HTTP 429 response codes.
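
As a rough back-of-the-envelope check on how long the burst allowance lasts, you can treat tokens as a continuous quantity and assume that requests arrive at a steady rate. Under those simplifying assumptions, a full bucket drains at (request rate minus refill rate) tokens per second. The helper below is hypothetical and is not part of the rate limiting service; it only illustrates the arithmetic.

```java
/** Approximate estimate only: assumes a steady request rate and continuous tokens. */
final class BurstDrainEstimate {

    /**
     * Returns the approximate number of seconds until a full bucket is empty,
     * or positive infinity if the request rate never exceeds the refill rate.
     */
    static double secondsUntilEmpty(double capacity, double refillPerSecond, double requestsPerSecond) {
        if (requestsPerSecond <= refillPerSecond) {
            return Double.POSITIVE_INFINITY; // the bucket refills at least as fast as it drains
        }
        return capacity / (requestsPerSecond - refillPerSecond);
    }

    public static void main(String[] args) {
        // Bucket size 20 and refill of 10 tokens per second, as in the article.
        for (double rate : new double[] {10, 15, 30}) {
            System.out.println(rate + " req/s -> " + secondsUntilEmpty(20, 10, rate) + " s");
        }
    }
}
```

For example, at a steady 15 requests per second the 20-token bucket would last about 4 seconds, and at 30 requests per second about 1 second.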

As an aside, I would normally use the ApacheBench (“ab”) load-generating tool for this kind of simple experiment, but ab appeared to have an issue with calling localhost (or my Docker config was presenting some problems).

Examine the Rate Limiting Service

The code for the Ambassador Java rate limiting service can be found in the repo ambassador-java-rate-limiter on my GitHub account. In this repo you will find the code and the Dockerfile I have used to build the container image that I pushed to DockerHub. Using this Dockerfile as a template, you can modify the code and then build and push your own image to DockerHub. You can then modify the ambassador-rate-limiter.yaml file in the main Docker Java Shopping repo to use your service for rate limiting.

Exploring the Java Code

If you now dive into the actual Java code, the main class of interest is RateLimitServer, which implements the rate limiting gRPC interface defined by the Envoy proxy that is used within the Ambassador API gateway. I’ve created a local copy of the ratelimit.proto interface that is used by the gRPC Java build tooling defined in the Maven pom.xml. There are three primary points of interest in the code: implementing the gRPC interface, running the gRPC server, and implementing the actual rate limiting code. Let’s now look at these in turn.

Implementing the Rate Limiting gRPC Interface

If you look into the inner class within RateLimitServer, named “RateLimiterImpl”, which extends RateLimitServiceGrpc.RateLimitServiceImplBase, you can see that I have overridden a method from this abstract class:

public void shouldRateLimit(Ratelimit.RateLimitRequest rateLimitRequest, StreamObserver<Ratelimit.RateLimitResponse> responseStreamObserver)

A lot of the naming conventions used here come from the Java gRPC libraries; for more information, you can consult the gRPC Java documentation. Having said this, you can clearly see the root of many of the names if you look into the ratelimit.proto file, which defines the rate limiting interface expected by the Envoy proxy that Ambassador uses behind the scenes. For example, the core service defined in this file is named RateLimitService (line 9), and there is a single RPC method defined within the service, “rpc ShouldRateLimit (RateLimitRequest) returns (RateLimitResponse) {}” (line 11), which is implemented in Java through the “shouldRateLimit” method signature shown above.

If you are interested, a lot of the Java gRPC code generation magic is conducted by the “protobuf-maven-plugin” (line 99 of the pom.xml).

Running the gRPC server

Once you have implemented the gRPC interface defined with ratelimit.proto, the next thing to do is to create a gRPC server that can listen and reply to requests made to it. If you look into the content of the RateLimitServer, you can follow the chain of processing from the main method. In a nutshell, the main method creates an instance of the RateLimitServer class, calls the start() method, and then calls the blockUntilShutdown() method. This starts an instance of the class, exposes the gRPC interface on the defined port, and listens for requests.
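
The start-then-block lifecycle described above can be sketched with plain JDK classes. This is a stand-in under stated assumptions, not the code from the repo: no gRPC classes are used, the class name ServerLifecycleSketch is hypothetical, and the point where a real server would bind its port is only marked with a comment.

```java
import java.util.concurrent.CountDownLatch;

/** Illustrative stand-in for a server lifecycle; no gRPC classes are used here. */
final class ServerLifecycleSketch {
    private final CountDownLatch terminated = new CountDownLatch(1);
    private volatile boolean running;

    /** A real gRPC server would bind its port and register the service here. */
    void start() {
        running = true;
        // Stop cleanly when the JVM is asked to exit (for example on CTRL-C).
        Runtime.getRuntime().addShutdownHook(new Thread(this::stop));
    }

    void stop() {
        running = false;
        terminated.countDown(); // releases any thread waiting in blockUntilShutdown()
    }

    boolean isRunning() {
        return running;
    }

    /** Parks the calling thread until stop() has been called. */
    void blockUntilShutdown() throws InterruptedException {
        terminated.await();
    }

    public static void main(String[] args) throws InterruptedException {
        ServerLifecycleSketch server = new ServerLifecycleSketch();
        server.start();

        // For this demo only: stop after a short delay so that main() returns.
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            server.stop();
        });
        stopper.start();

        server.blockUntilShutdown();
        System.out.println("server stopped");
    }
}
```

Blocking the main thread matters because server worker threads are often daemon threads; without the block, main() could return and the JVM could exit while the server is still expected to be serving requests.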

Implementing Java Rate Limiting Code

The actual Java code responsible for the rate limiting process is contained within the shouldRateLimit() method (line 75) of the RateLimiterImpl inner class. Rather than implementing my own rate limiting algorithm, I’m using the popular bucket4j Java rate limiting library, which is based on the token-bucket algorithm. Because I limit the number of requests made to each service, each bucket is identified (or keyed) by the service name. Every request to a service removes a token from the associated bucket. In this example, I am not storing the buckets in an external database and have instead opted to use an in-memory ConcurrentHashMap.
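
The idea of keying one bucket per service in a ConcurrentHashMap might look roughly like the sketch below. It is an assumption-laden illustration rather than the repo’s code: SimpleBucket is a hypothetical stand-in for a bucket4j Bucket (a fixed token budget with no refill), and the names PerServiceBuckets, bucketFor, and allow are invented for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

final class PerServiceBuckets {

    /** Hypothetical stand-in for a bucket4j Bucket: a fixed budget with no refill. */
    static final class SimpleBucket {
        private final AtomicLong tokens;

        SimpleBucket(long initialTokens) {
            this.tokens = new AtomicLong(initialTokens);
        }

        boolean tryConsume(long amount) {
            while (true) {
                long current = tokens.get();
                if (current < amount) {
                    return false;
                }
                if (tokens.compareAndSet(current, current - amount)) {
                    return true;
                }
            }
        }
    }

    private final ConcurrentMap<String, SimpleBucket> buckets = new ConcurrentHashMap<>();
    private final long tokensPerBucket;

    PerServiceBuckets(long tokensPerBucket) {
        this.tokensPerBucket = tokensPerBucket;
    }

    /** Returns the bucket for a service, creating it atomically on first use. */
    SimpleBucket bucketFor(String serviceName) {
        return buckets.computeIfAbsent(serviceName, name -> new SimpleBucket(tokensPerBucket));
    }

    boolean allow(String serviceName) {
        return bucketFor(serviceName).tryConsume(1);
    }

    public static void main(String[] args) {
        PerServiceBuckets limiter = new PerServiceBuckets(2);
        System.out.println("shopfront #1: " + limiter.allow("shopfront"));
        System.out.println("shopfront #2: " + limiter.allow("shopfront"));
        System.out.println("shopfront #3: " + limiter.allow("shopfront"));
        System.out.println("stockmanager #1: " + limiter.allow("stockmanager"));
    }
}
```

Using computeIfAbsent means that two concurrent first requests for the same service share a single bucket instead of racing to create two, and exhausting one service’s bucket leaves the others untouched.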

If I were implementing this service for a production use case, I would typically store the buckets in an external persistence store, probably something like Redis, to enable horizontal scalability. For now, bear in mind that each instance of the rate limit service keeps its own in-memory buckets, so if you horizontally scale the rate limit service without reducing each service’s bucket limits, you will increase the number of allowable (non-rate limited) requests in direct proportion to the number of rate limiter instances.
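
The effect of scaling out without shared state can be illustrated with a small simulation. It assumes, purely for illustration, that requests are spread round-robin across the rate limiter replicas, that each replica holds its own 20-token bucket, and that the burst arrives at a single instant so that refill can be ignored. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class ReplicaScalingSketch {

    /** Counts how many requests in an instantaneous burst get through across all replicas. */
    static int allowedInBurst(int replicas, int burstPerReplica, int requests) {
        int[] tokens = new int[replicas];
        Arrays.fill(tokens, burstPerReplica); // every replica starts with its own full bucket
        int allowed = 0;
        for (int i = 0; i < requests; i++) {
            int replica = i % replicas; // assumed round-robin load balancing
            if (tokens[replica] > 0) {
                tokens[replica]--;
                allowed++;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        for (int replicas = 1; replicas <= 3; replicas++) {
            System.out.println(replicas + " replica(s): "
                    + allowedInBurst(replicas, 20, 100) + " of 100 requests allowed");
        }
    }
}
```

Under these assumptions the allowed burst grows from 20 with one replica to 60 with three, and the sustained rate would scale in the same way. An external store such as Redis avoids this by letting all replicas share one bucket per service.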

An excerpt of the RateLimiterImpl code that creates the bucket4j bucket can be seen below:

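private Bucket createNewBucket() {
    // Maximum bucket size: up to 20 requests can be served in a burst
    long overdraft = 20;
    // Refill the bucket at a rate of 10 tokens per second
    Refill refill = Refill.smooth(10, Duration.ofSeconds(1));
    Bandwidth limit = Bandwidth.classic(overdraft, refill);
    return Bucket4j.builder().addLimit(limit).build();
}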

The shouldRateLimit method code can be seen below, and this simply attempts to tryConsume(1) — try and consume one token from the bucket — before returning an appropriate response code.

@Override
public void shouldRateLimit(Ratelimit.RateLimitRequest rateLimitRequest, StreamObserver<Ratelimit.RateLimitResponse> responseStreamObserver) {
    logDebug(rateLimitRequest);
    String destServiceName = extractDestServiceNameFrom(rateLimitRequest);
    Bucket bucket = getServiceBucketFor(destServiceName);
    Ratelimit.RateLimitResponse.Code code;
    if (bucket.tryConsume(1)) {
        code = Ratelimit.RateLimitResponse.Code.OK;
    } else {
        code = Ratelimit.RateLimitResponse.Code.OVER_LIMIT;
    }
    Ratelimit.RateLimitResponse rateLimitResponse = generateRateLimitResponse(code);
    responseStreamObserver.onNext(rateLimitResponse);
    responseStreamObserver.onCompleted();
}

Results

The code should be relatively easy to understand. The primary responsibility of this method is to return either Ratelimit.RateLimitResponse.Code.OK, if no rate limiting is required for the current request, or Ratelimit.RateLimitResponse.Code.OVER_LIMIT, if the request should be denied due to rate limiting. Depending on the response from this gRPC service, the Ambassador API gateway will either pass the request through to the backend service or short-circuit the trip and simply return a 429 “Too Many Requests” HTTP status code without calling the backend service.

This simple example protects against one service becoming overwhelmed, but hopefully it also demonstrates the core rate limiting concepts, and it could be adapted relatively easily to rate limit based on request metadata, such as a user ID.

Until the Next Time…

This article has demonstrated how you can create a rate limiting service in Java that can easily be integrated into the Edge Stack API gateway and fully customized with any rate limiting logic you require. In the next and final article of the series you will explore the Envoy rate limiting API in more depth, to learn more about designing a rate limiting service.
