Set up a production-quality Kubernetes cluster on AWS
Phil Lombardi / March 1, 2017
Bootstrapping a microservices system is often difficult for small teams because the ecosystem of tools is diverse and spans technical disciplines from operations to application development. This repository is intended for a developer on a small team who meets the following criteria:
- Building a simple modern web application using a service-oriented or microservices approach.
- Using Amazon Web Services ("AWS") because of its best-in-class commodity "run-and-forget" infrastructure, such as RDS Aurora, PostgreSQL, Elasticsearch, or Redis.
- Has limited operations experience or budget and wants to "get going quickly," but on a reasonably architected foundation that will not cause major headaches two weeks down the road because it "was just a toy."
If the above criteria match, this project is for you. Keep reading: this tutorial will help you set up a production-quality Kubernetes cluster on AWS in about 10 to 15 minutes!
What do you mean by "simple modern web application?"
Simple
The concept of simplicity is subjective, but for the purpose of this architecture "simple" means that the application conforms to two constraints:
- Business logic, for example, a REST API, is containerized and runs on the Kubernetes cluster.
- Persistence is offloaded to an external service (e.g. Amazon RDS).
Modern
Similarly, the term "modern" is ambiguous, but for the purpose of this architecture "modern" means that the application has very little tolerance for downtime. We will be targeting an application that is designed for at least "four nines" (99.99%) of availability. Practically speaking, this means the app can be updated or modified without downtime.
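To make "four nines" concrete, the following minimal sketch converts a number of nines into an allowed-downtime budget. It is illustrative arithmetic rather than part of the repository, and the function name `downtime_budget_minutes` is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(nines: int, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime allowed over period_minutes at an availability of N nines.

    N nines means availability = 1 - 10**-N (e.g. 4 nines = 99.99%),
    so the unavailable fraction is simply 10**-N.
    """
    return period_minutes * 10 ** -nines


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        yearly = downtime_budget_minutes(n)
        monthly = downtime_budget_minutes(n, 30 * 24 * 60)
        print(f"{n} nines: {yearly:.2f} min/year, {monthly:.2f} min per 30-day month")
```

At four nines the budget is about 52.6 minutes per year, or about 4.3 minutes in a 30-day month, which leaves no room for taking the app offline during routine updates.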
What is an "Infrastructure Fabric"?
Infrastructure fabric is the term we use to describe the composite of a dedicated networking environment (VPC, more below), container cluster (Kubernetes), and any strongly associated resources that are used by services in the container cluster (e.g. RDS, ElastiCache, Elasticsearch).
Technical Design in Five Minutes
To keep this infrastructure fabric simple, but also robust, we are going to make some opinionated design decisions.
Repository Structure
The GitHub repository is set up so that each fabric is defined in an independent Git branch. This allows for multiple fabrics to exist in parallel and for concurrent modification of the fabrics. Why might you want multiple fabrics? It allows multiple environments, e.g., develop, test, staging, prod. It also enables other types of useful separation, for example, Alice and Bob can each have their own cloud-deployed fabrics for whatever purpose they need. For simplicity, fabrics are named with DNS-compatible names.
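A minimal sketch of the naming convention, assuming a "DNS-compatible" fabric name means a lowercase RFC 1123 label (1 to 63 characters of a-z, 0-9, or -, starting and ending with a letter or digit) and that each fabric lives on a branch named fabric/<name>, as the fabric/example branch suggests. The helper `fabric_branch` is hypothetical; the repository does not necessarily enforce these rules.

```python
import re

# Assumption: a "DNS-compatible" fabric name is a lowercase RFC 1123 label.
DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def fabric_branch(fabric_name: str) -> str:
    """Validate a fabric name and return the Git branch that would hold it."""
    if not DNS_LABEL.fullmatch(fabric_name):
        raise ValueError(f"{fabric_name!r} is not a DNS-compatible fabric name")
    return f"fabric/{fabric_name}"


if __name__ == "__main__":
    for name in ("example", "alice-dev", "prod", "Bob_Staging"):
        try:
            print(name, "->", fabric_branch(name))
        except ValueError as err:
            print(err)
```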
Base Network (VPC)
A single new Virtual Private Cloud ("VPC") will be created in a single AWS region (us-east-2 "Ohio") that holds the Kubernetes cluster along with all long-lived systems (e.g., databases). A VPC is a namespace for networking. It provides strong network-level isolation from unrelated stuff running in an AWS account. It's a good idea to create a separate VPC rather than relying on the default AWS VPC. Over time, the default VPC becomes cluttered and hard to maintain or keep configured properly with other systems. Also, VPCs are a cost-free abstraction in AWS. The base network will be IPv4 because Kubernetes does not run on IPv6 networks yet.
Subnets
The VPC will be segmented into several subnets spread across at least three availability zones ("AZs") within the region. An availability zone in AWS is one or more physically isolated datacenters within an AWS region, connected to the other AZs in the same region by high-performance networking links. The individual subnets ensure that both the Kubernetes cluster and other systems, such as an RDS database, can run simultaneously in at least two availability zones, so the infrastructure fabric keeps working if one AZ fails.
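One way to picture the segmentation is to carve the VPC's address range into one equally sized subnet per AZ. The sketch below uses Python's standard ipaddress module; the 10.0.0.0/16 range, the /20 subnet size, and the function name `plan_subnets` are illustrative assumptions, not the values the repository's Terraform code actually uses.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, zones: list[str], new_prefix: int = 20) -> dict[str, str]:
    """Assign one /new_prefix subnet carved from vpc_cidr to each availability zone."""
    network = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks; take one per zone.
    blocks = list(islice(network.subnets(new_prefix=new_prefix), len(zones)))
    if len(blocks) < len(zones):
        raise ValueError(f"{vpc_cidr} is too small for {len(zones)} /{new_prefix} subnets")
    return {zone: str(block) for zone, block in zip(zones, blocks)}


if __name__ == "__main__":
    zones = ["us-east-2a", "us-east-2b", "us-east-2c"]
    for zone, cidr in plan_subnets("10.0.0.0/16", zones).items():
        print(zone, cidr)
```

With these assumed values the three zones receive 10.0.0.0/20, 10.0.16.0/20, and 10.0.32.0/20, leaving most of the /16 free for additional subnets later.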
To avoid the cost and complexity of NAT gateways, the deployed network fabric does not distinguish between external (public) and internal (private) subnets.
DNS
Before the Kubernetes cluster can be provisioned, a public hosted zone for your domain must exist in AWS Route 53. For example, at Ambassador Labs, we own the mysterious k736.net domain.
Kubernetes
A Kubernetes cluster is created in the new VPC with one master node per availability zone, and the worker nodes (historically called "minions"; each runs an agent called the "kubelet") are spread across the availability zones as well. This design provides a highly available ("HA") cluster.
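The reason for one master per AZ comes down to quorum arithmetic. kops runs etcd, the cluster's key-value store, on the master nodes, and etcd stays writable only while a majority of its members are reachable. The sketch below (the function name `quorum` is hypothetical) computes the majority and how many members can be lost:

```python
def quorum(members: int) -> tuple[int, int]:
    """Return (majority needed, members that can fail) for a majority-quorum cluster such as etcd."""
    majority = members // 2 + 1
    return majority, members - majority


if __name__ == "__main__":
    for n in range(1, 6):
        need, tolerate = quorum(n)
        print(f"{n} masters: quorum {need}, survives {tolerate} failure(s)")
```

With three masters in three AZs the quorum is two, so losing any single AZ leaves the control plane working. Two masters tolerate no more failures than one, which is why odd member counts are preferred.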
Getting Started
0. Prerequisites
You'll need all of the following to get through the tutorial. We'll go into more detail on how to set everything up in later sections.
- An active AWS account and AWS API credentials.
- A domain name and hosted DNS zone in AWS Route 53 that you can dedicate to the fabric. This domain name will have several subdomains attached to it by the time you finish this tutorial.
- All of the required third-party tools.
NOTE: You really need all of these tools. A future tutorial will simplify the requirements to get set up.
1. Install third-party tools
Follow the links below for information on installing each tool.
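Before moving on, it can help to confirm that the tools this tutorial calls are on your PATH. The list below is assembled from the commands used later in this tutorial (git, terraform, kops, kubectl, ssh-keygen) and may not match the original tool list exactly; `missing_tools` is a hypothetical helper.

```python
import shutil
import sys

# Commands invoked later in this tutorial; adjust if your setup differs.
REQUIRED_TOOLS = ("git", "terraform", "kops", "kubectl", "ssh-keygen")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
        sys.exit(1)
    print("All required tools found.")
```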
2. Bootstrap AWS
Before we begin, a couple of things need to be done in the AWS account:
- Get an AWS IAM user and API credentials. Follow Bootstrapping AWS for instructions on setting up an AWS user, or skip this step if you already have a user set up.
- Get a domain name for use with this fabric. Follow Bootstrapping Route 53 for instructions on setting up Route 53 properly, or skip this step if you already have a domain set up.
3. Clone Repository
Clone this repository into your own account or organization. The cloned repository contains two branches:
- master, which holds the shared tooling (the bin/ scripts)
- fabric/example, which holds an example fabric definition
4. Check out the example branch, then overlay the master branch tools onto it
The repository is set up as a monorepo that uses branches to keep environment definitions independent. Run the following commands:
git pull
git checkout fabric/example
git checkout master -- bin/
After running those commands, you should be on the fabric/example branch, with the bin/ tools from the master branch in your working tree.
5. Configure the Fabric name, DNS, region and availability zones
Every AWS account is allocated a different set of availability zones that can be used within a region. For example, in the us-east-1 region, whether a zone such as us-east-1b is usable can vary from one account to another.
For this tutorial, we're going to assume the us-east-2 region.
A useful script, bin/configure_availability_zones.py, automatically updates config.json with the region and the availability zones your account can use in it. Run it with the region as its argument:
bin/configure_availability_zones.py us-east-2
After a moment you should see the following message:
Updating config.json...
Region = us-east-2
Availability Zones = ['us-east-2a', 'us-east-2b', 'us-east-2c']
Done!
You can confirm the operation was successful by comparing the above with the values in config.json:
cat config.json
{
  "domain_name": "${YOUR_DOMAIN_HERE}",
  "fabric_availability_zones": [
    "us-east-2a",
    "us-east-2b",
    "us-east-2c"
  ],
  "fabric_name": "example",
  "fabric_region": "us-east-2"
}
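The script's core job can be sketched as a read-modify-write of config.json. This is a hypothetical reconstruction: the real bin/configure_availability_zones.py presumably asks AWS which zones your account can use, whereas here the zone list is passed in, and the function names `apply_region` and `update_config_file` are invented for illustration.

```python
import json


def apply_region(config: dict, region: str, zones: list[str]) -> dict:
    """Return a copy of config with the region and its availability zones set."""
    updated = dict(config)
    updated["fabric_region"] = region
    updated["fabric_availability_zones"] = sorted(zones)
    return updated


def update_config_file(path: str, region: str, zones: list[str]) -> None:
    """Rewrite the JSON file in place, keeping keys sorted like the example output."""
    with open(path) as f:
        config = json.load(f)
    with open(path, "w") as f:
        json.dump(apply_region(config, region, zones), f, indent=2, sort_keys=True)
        f.write("\n")
```

For example, update_config_file("config.json", "us-east-2", ["us-east-2a", "us-east-2b", "us-east-2c"]) would produce the file contents shown above, assuming domain_name and fabric_name were already present.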
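Two other variables must be configured in config.json. Open config.json and set fabric_name to the DNS-compatible name of your fabric. Also, find and update the domain_name value, replacing the ${YOUR_DOMAIN_HERE} placeholder with the domain you set up in Route 53.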
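Before creating any infrastructure, a quick sanity check on config.json can catch a forgotten placeholder. The checks below are assumptions drawn from this tutorial (the four keys in the example file, the ${YOUR_DOMAIN_HERE} placeholder, and the goal of running in at least two AZs); `config_problems` is a hypothetical helper, not part of the repository.

```python
import json

REQUIRED_KEYS = ("domain_name", "fabric_availability_zones", "fabric_name", "fabric_region")
PLACEHOLDER = "${YOUR_DOMAIN_HERE}"


def config_problems(config: dict) -> list[str]:
    """Return human-readable problems with a fabric config; an empty list means it looks usable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if config.get("domain_name") == PLACEHOLDER:
        problems.append("domain_name still contains the placeholder")
    region = config.get("fabric_region", "")
    zones = config.get("fabric_availability_zones", [])
    if len(zones) < 2:
        problems.append("at least two availability zones are needed for redundancy")
    problems += [f"zone {z} is not in region {region}" for z in zones if not z.startswith(region)]
    return problems


if __name__ == "__main__":
    with open("config.json") as f:
        for problem in config_problems(json.load(f)) or ["config.json looks complete"]:
            print(problem)
```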
6. Create S3 bucket for Terraform and Kubernetes state storage
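Terraform operates like a thermostat, which means that it reconciles the desired world (the *.tf files) with the actual world (the resources that exist in AWS). To do that, Terraform keeps a state file that maps real resources to their names in the configuration, for example:
vpc-abcxyz -> aws_vpc.kubernetes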
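The thermostat idea can be illustrated with a toy planner that compares desired resources against what already exists and decides what to create, update, or delete. This is a deliberately simplified model for intuition only: Terraform's real planner also builds a dependency graph and distinguishes in-place updates from replacements. The function name `plan` and the resource attributes below are hypothetical.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Diff desired vs. actual resources, keyed by address such as 'aws_vpc.kubernetes'."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(k for k in desired.keys() & actual.keys() if desired[k] != actual[k]),
        "delete": sorted(actual.keys() - desired.keys()),
    }


if __name__ == "__main__":
    desired = {
        "aws_vpc.kubernetes": {"cidr_block": "10.0.0.0/16"},
        "aws_subnet.us_east_2a": {"cidr_block": "10.0.0.0/20"},
    }
    # Stand-in for the real world as recorded in the state file
    # (e.g. vpc-abcxyz tracked under the address aws_vpc.kubernetes).
    actual = {"aws_vpc.kubernetes": {"cidr_block": "10.0.0.0/16"}}
    print(plan(desired, actual))
```

Here the VPC already matches, so only the subnet is scheduled for creation; running the same plan again after creation would report nothing to do, which is the thermostat behavior.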
Terraform does not care where the state file is stored, so in theory it can stay on your local workstation. A better option, which encourages sharing and reuse, is to push the file into Amazon S3, which Terraform natively knows how to handle.
Run the command:
bin/setup_state_store.py
If the operation is successful, it prints the name of the S3 bucket, which is the value of config.json["domain_name"] with every . replaced by - and with -state appended. For example:
cat config.json
{"domain_name": "k736.net"}
bin/setup_state_store.py
Bucket: k736-net-state
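The naming rule in the example can be expressed as a one-line transformation. This sketch reproduces it from the k736.net example only; `state_bucket_name` is a hypothetical name, and the lowercasing and length check are assumptions added to respect S3 bucket naming rules, which bin/setup_state_store.py may or may not apply.

```python
def state_bucket_name(domain_name: str) -> str:
    """Derive the state bucket name: dots become dashes and "-state" is appended.

    Assumption: lowercase the result and enforce S3's 63-character limit,
    since S3 bucket names must be lowercase and at most 63 characters long.
    """
    name = domain_name.lower().replace(".", "-") + "-state"
    if len(name) > 63:
        raise ValueError(f"bucket name {name!r} exceeds S3's 63-character limit")
    return name
```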
7. Generate the AWS networking environment
The high-level steps to get the networking set up are:
- Terraform generates a deterministic execution plan for the infrastructure it needs to create on AWS.
- Terraform executes the plan and creates the necessary infrastructure.
Below are the detailed steps:
1. Configure Terraform to talk to the remote state store:
terraform remote config \
  -backend=s3 \
  -backend-config="region=us-east-2" \
  -backend-config="bucket=$(bin/get_state_store_name.py)" \
  -backend-config="key=$(bin/get_fabric_name.py).tfstate"
2. Download or update the Terraform modules the configuration uses:
terraform get -update=true
3. Generate the execution plan, reading variables from config.json, and save it to plan.out:
terraform plan -var-file=config.json -out plan.out
4. Apply the saved plan:
terraform apply plan.out
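The get, plan, and apply steps are meant to run strictly in order, stopping at the first failure. A minimal Python wrapper, assuming you run it from the fabric's root directory after the remote state has been configured, might look like the following; `run_steps` is hypothetical and the repository does not ship it.

```python
import subprocess

# The get/plan/apply steps above, as argument lists (no shell involved).
NETWORK_STEPS = (
    ["terraform", "get", "-update=true"],
    ["terraform", "plan", "-var-file=config.json", "-out", "plan.out"],
    ["terraform", "apply", "plan.out"],
)


def run_steps(steps=NETWORK_STEPS, cwd: str = ".", dry_run: bool = False) -> list[str]:
    """Run each command in order; check=True raises CalledProcessError on the first failure."""
    executed = []
    for cmd in steps:
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
        executed.append(" ".join(cmd))
    return executed
```

Because apply consumes the saved plan.out, Terraform applies exactly the plan that was generated; if plan fails, the raised exception stops the sequence before apply runs.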
8. Generate the Kubernetes cluster
The high-level steps to get the Kubernetes cluster set up are:
- Ensure a public-private SSH key pair is generated for the cluster.
- Invoke the kops tool with some parameters that are output from the networking environment deployment. kops generates a Terraform template for the cluster.
- Terraform generates a deterministic execution plan for the infrastructure it needs to create on AWS for the Kubernetes cluster. Then Terraform executes the plan and creates the necessary infrastructure.
- Wait for the Kubernetes cluster to deploy.
8.1 SSH public/private key pair
It is extremely unlikely you will need to SSH into the Kubernetes nodes; however, it is good practice to use a known or freshly generated SSH key rather than relying on a tool or service to generate one. To generate a new key pair, run the following command:
ssh-keygen -t rsa -b 4096 -N '' -C "kubernetes-admin" -f "keys/kubernetes-admin"
A 4096-bit RSA public and private key pair without a passphrase will be placed into the keys/ directory. Move the private key somewhere safe outside the repository, such as ~/.ssh:
mv keys/kubernetes-admin ~/.ssh/kubernetes-admin
Also ensure the private key is readable and writable only by your user:
chmod 600 ~/.ssh/kubernetes-admin
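OpenSSH refuses to use a private key that other users can read, so it is worth verifying the permissions. A minimal check using only the standard library (the function name `is_private` is hypothetical):

```python
import os
import stat


def is_private(path: str) -> bool:
    """True if neither group nor others have any permission bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    key = os.path.expanduser("~/.ssh/kubernetes-admin")
    print(key, "OK" if is_private(key) else "is accessible to other users; run chmod 600")
```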
8.2 Invoke Kops to generate the Terraform template for Kubernetes
Kops takes in a bunch of parameters and generates a Terraform template that can be used to create a new cluster. The next command only generates the Terraform template; it does not affect your existing infrastructure.
kops create cluster \
  --zones="$(terraform output main_network_availability_zones_csv | tr -d '\n')" \
  --vpc="$(terraform output main_network_id | tr -d '\n')" \
  --network-cidr="$(terraform output main_network_cidr_block | tr -d '\n')" \
  --networking="kubenet" \
  --ssh-public-key='keys/kubernetes-admin.pub' \
  --target="terraform" \
  --name="$(bin/get_fabric_fqdn.py)" \
  --state="s3://$(bin/get_state_store_name.py)" \
  --out=kubernetes
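The $(terraform output NAME | tr -d '\n') substitutions feed the networking outputs into kops with newlines removed. The mapping from outputs to flags can be sketched as a pure function; `kops_create_args` is hypothetical, and the cluster name and bucket are taken as parameters because they come from the repository's bin/ scripts.

```python
def kops_create_args(outputs: dict[str, str], fqdn: str, bucket: str,
                     public_key: str = "keys/kubernetes-admin.pub") -> list[str]:
    """Build the `kops create cluster` argument list from raw `terraform output` values."""

    def out(name: str) -> str:
        # Equivalent of `| tr -d '\n'`: delete every newline character.
        return outputs[name].replace("\n", "")

    return [
        "kops", "create", "cluster",
        f"--zones={out('main_network_availability_zones_csv')}",
        f"--vpc={out('main_network_id')}",
        f"--network-cidr={out('main_network_cidr_block')}",
        "--networking=kubenet",
        f"--ssh-public-key={public_key}",
        "--target=terraform",
        f"--name={fqdn}",
        f"--state=s3://{bucket}",
        "--out=kubernetes",
    ]
```

For instance, a raw output of "vpc-abcxyz\n" for main_network_id becomes --vpc=vpc-abcxyz. Passing the resulting list to subprocess.run avoids shell quoting issues entirely.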
8.3 Plan and Apply the Kubernetes cluster with Terraform
Below are the detailed steps:
1. Change into the directory that kops generated:
cd kubernetes/
2. Configure Terraform to talk to the remote state store:
terraform remote config \
  -backend=s3 \
  -backend-config="region=us-east-2" \
  -backend-config="bucket=$(cd .. && bin/get_state_store_name.py)" \
  -backend-config="key=$(cd .. && bin/get_fabric_name.py)-kubernetes.tfstate"
3. Download or update the Terraform modules:
terraform get -update=true
4. Generate the execution plan and save it to plan.out:
terraform plan -out plan.out
5. Apply the saved plan:
terraform apply plan.out
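The networking configuration and the generated kubernetes/ configuration are separate Terraform root modules, so each needs its own state object; the -kubernetes suffix in the key keeps the second from overwriting the first in the shared bucket. A small sketch of the resulting layout (the function name `state_locations` is hypothetical):

```python
def state_locations(bucket: str, fabric_name: str) -> dict[str, str]:
    """S3 URLs of the two Terraform state files a fabric uses, per the remote config keys."""
    # Distinct keys matter: two configurations sharing one state file would clobber each other.
    return {
        "network": f"s3://{bucket}/{fabric_name}.tfstate",
        "kubernetes": f"s3://{bucket}/{fabric_name}-kubernetes.tfstate",
    }
```

For the example fabric and the k736-net-state bucket, this yields s3://k736-net-state/example.tfstate and s3://k736-net-state/example-kubernetes.tfstate.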
8.4 Wait for the Kubernetes cluster to form
The Kubernetes cluster provisions asynchronously, so even though Terraform exited almost immediately, it's not likely that the cluster itself is running yet. To determine whether the cluster is up, you need to poll the API server. You can do this by running:
kubectl cluster-info
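Rather than re-running that command by hand, you can poll it. The sketch below assumes kubectl cluster-info exits with a non-zero status while the API server is unreachable; the 15-minute timeout and 15-second interval are arbitrary choices, and `wait_for_cluster` is hypothetical.

```python
import subprocess
import sys
import time


def wait_for_cluster(cmd=("kubectl", "cluster-info"), timeout: float = 900.0,
                     interval: float = 15.0) -> bool:
    """Re-run cmd until it exits 0 (cluster reachable) or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = subprocess.run(list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            return True
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    sys.exit(0 if wait_for_cluster() else 1)
```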
How can I make this cheaper?
Here are two straightforward strategies:
1. Use smaller EC2 instance sizes for the Kubernetes masters and nodes, for example:
kops create cluster \
  --master-size=t2.nano --node-size=t2.nano \
  [ ... ]
2. Purchase EC2 reserved instances for the types of nodes you know you need.
Other options exist such as EC2 spot instances or refactoring your application to be less resource intensive, but those topics are outside the scope of this tutorial.