Configure your Kubernetes cluster on AWS

This is the second step of running Kubernetes on AWS. Before we launch our cluster, let's define a few parameters that the cluster requires.

Cluster parameters

EC2 key pair

The key pair that authenticates SSH access to your EC2 instances. The public half of this key pair will be configured on each CoreOS node.

After creating a key pair, you will use the name you gave the keys to configure the cluster. Key pairs are only available to EC2 instances in the same region. More info in the EC2 Keypair docs.
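
As a minimal sketch, a key pair can also be created with the aws command line tool. The function name create_key_pair and the <key-name>.pem file name are illustrative choices, not something kube-aws requires:

```shell
# Create an EC2 key pair in the given region and save the private key locally.
# Usage: create_key_pair <region> <key-name>
create_key_pair() {
  region=$1
  name=$2
  # Refuse to overwrite an existing private key file.
  if [ -e "${name}.pem" ]; then
    echo "${name}.pem already exists" >&2
    return 1
  fi
  # KeyMaterial is the PEM-encoded private key returned by EC2.
  aws ec2 create-key-pair --region "$region" --key-name "$name" \
    --query 'KeyMaterial' --output text > "${name}.pem" \
    || { rm -f "${name}.pem"; return 1; }
  # Restrict permissions so ssh accepts the key file.
  chmod 400 "${name}.pem"
}

# Example (hypothetical names):
#   create_key_pair us-west-1 key-pair-name
```

The key name passed here is the value you later give to kube-aws, and the region should be the one you plan to launch the cluster in.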

KMS key

Amazon KMS keys are used to encrypt and decrypt cluster TLS assets. If you already have a KMS key that you would like to use, you can skip creating a new key and provide the ARN string for your existing key.

You can create a KMS key in the AWS console, or with the aws command line tool:

$ aws kms --region=<your-region> create-key --description="kube-aws assets"
{
    "KeyMetadata": {
        "CreationDate": 1458235139.724,
        "KeyState": "Enabled",
        "Arn": "arn:aws:kms:us-west-1:xxxxxxxxx:key/xxxxxxxxxxxxxxxxxxx",
        "AWSAccountId": "xxxxxxxxxxxxx",
        "Enabled": true,
        "KeyUsage": "ENCRYPT_DECRYPT",
        "KeyId": "xxxxxxxxx",
        "Description": "kube-aws assets"
    }
}

You will use the KeyMetadata.Arn string to identify your KMS key in the init step.
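
If you prefer not to copy the ARN by hand, the aws tool's --query option can print just that field. This is a sketch; the function names are hypothetical, and the region check assumes you want the key in the same region as the cluster, since KMS keys are regional:

```shell
# Create a KMS key and print only its ARN.
# Usage: create_kms_key_arn <region>
create_kms_key_arn() {
  aws kms create-key --region "$1" \
    --description "kube-aws assets" \
    --query 'KeyMetadata.Arn' --output text
}

# Print the region embedded in an ARN (the fourth colon-separated field).
# Usage: arn_region <arn>
arn_region() {
  printf '%s\n' "$1" | cut -d: -f4
}

# Example:
#   KMS_KEY_ARN=$(create_kms_key_arn us-west-1)
#   [ "$(arn_region "$KMS_KEY_ARN")" = "us-west-1" ] && echo "region matches"
```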

External DNS name

Select a DNS hostname where the cluster API will be accessible. Typically this hostname is available over the internet ("external"), so end users can connect from different networks. This hostname will be used to provision the TLS certificate for the API server, which encrypts traffic between your users and the API. Optionally, you can provide the certificates yourself, which is recommended for production clusters.

When the cluster is created, the controller will expose the TLS-secured API on a public IP address. You will need to create an A record for the external DNS hostname you want to point to this IP address. You can find the API external IP address after the cluster is created by invoking kube-aws status.

Alternatively, kube-aws can automatically create this A record in an existing Route 53 hosted zone. If you have a DNS zone hosted in Route 53, you can configure this below.
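
If you create the A record yourself, you may want to confirm that it resolves to the expected address once the cluster is up. The following sketch assumes the dig utility is installed; the function name and the example values are hypothetical:

```shell
# Succeed if <hostname> has an A record equal to <ip>.
# Usage: resolves_to <hostname> <ip>
resolves_to() {
  # dig +short prints one address per line; match the whole line exactly.
  dig +short "$1" A | grep -qxF -- "$2"
}

# Example (hypothetical hostname and address):
#   resolves_to my-cluster-endpoint.example.com 203.0.113.10 && echo "DNS is ready"
```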

S3 bucket

Amazon S3 buckets are used to transfer large stack templates generated by kube-aws to CloudFormation. If you already have an S3 bucket that you would like to use, you can skip creating a new bucket and provide the URI for your existing bucket.

You can create an S3 bucket in the AWS console, or with the aws command line tool.

The command varies among AWS regions.

For the us-east-1 region:

$ aws s3api --region=us-east-1 create-bucket --bucket <your-bucket-name>
{
    "Location": "/<your-bucket-name>"
}

For other regions (eu-west-1 in this example):

$ aws s3api create-bucket --bucket <your-bucket-name> --region eu-west-1 --create-bucket-configuration LocationConstraint=eu-west-1
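
The regional difference can be wrapped in a small helper so the same call works everywhere. This is a sketch with hypothetical function names; it relies on us-east-1 being the one region that must be created without a LocationConstraint:

```shell
# Print the create-bucket arguments appropriate for a region.
# us-east-1 takes no LocationConstraint; every other region needs one.
# Usage: bucket_args <bucket-name> <region>
bucket_args() {
  if [ "$2" = "us-east-1" ]; then
    echo "--bucket $1 --region $2"
  else
    echo "--bucket $1 --region $2 --create-bucket-configuration LocationConstraint=$2"
  fi
}

# Create the bucket. The unquoted expansion is intentional so that the
# arguments are split into separate words (bucket names contain no spaces).
# Usage: create_bucket <bucket-name> <region>
create_bucket() {
  aws s3api create-bucket $(bucket_args "$1" "$2")
}

# Example:
#   create_bucket <your-bucket-name> eu-west-1
```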

Initialize an asset directory

Create a directory on your local machine to hold the generated assets:

$ mkdir my-cluster
$ cd my-cluster

Initialize the cluster CloudFormation stack with the KMS key ARN, key pair name, and DNS name from the previous steps:

$ kube-aws init \
--cluster-name=my-cluster-name \
--external-dns-name=my-cluster-endpoint \
--region=us-west-1 \
--availability-zone=us-west-1c \
--key-name=key-pair-name \
--kms-key-arn="arn:aws:kms:us-west-1:xxxxxxxxxx:key/xxxxxxxxxxxxxxxxxxx"

This example passes us-west-1c to --availability-zone, but the supported availability zones vary among AWS accounts. Check whether us-west-1c is available to your account by running aws ec2 --region us-west-1 describe-availability-zones. If it is not, switch to another supported availability zone (e.g., us-west-1a or us-west-1b).
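
The availability zone check can be scripted. A minimal sketch, assuming the aws tool is configured for your account; the function names are hypothetical:

```shell
# List the availability zones currently available to your account in a region.
# Usage: list_zones <region>
list_zones() {
  aws ec2 describe-availability-zones --region "$1" \
    --query "AvailabilityZones[?State=='available'].ZoneName" \
    --output text
}

# Succeed if <zone> is available in <region>.
# Usage: zone_available <region> <zone>
zone_available() {
  # Text output separates zone names with tabs; put one name per line.
  list_zones "$1" | tr '\t' '\n' | grep -qxF -- "$2"
}

# Example:
#   zone_available us-west-1 us-west-1c || echo "pick another zone"
```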

There will now be a cluster.yaml file in the asset directory. This is the main configuration file for your cluster.

Render contents of the asset directory

TLS certificates

  • In the simplest case, you can have kube-aws generate both your TLS identities and certificate authority for you.

    $ kube-aws render credentials --generate-ca
    

    This is not recommended for production, but is fine for development or testing purposes.

  • It is recommended that you supply your own intermediate certificate signing authority and let kube-aws take care of generating the cluster TLS credentials.

    $ kube-aws render credentials --ca-cert-path=/path/to/ca-cert.pem --ca-key-path=/path/to/ca-key.pem
    

    For more information on operating your own CA, check out this awesome guide.

  • In certain cases, such as when you already operate an advanced PKI, you may wish to pre-generate all cluster TLS assets. In this case, run kube-aws render stack and copy your TLS assets into the credentials/ folder before running kube-aws up.

    
    ls -R credentials/
    admin-key.pem       apiserver-key.pem   ca-key.pem          etcd-client-key.pem etcd-key.pem        worker-key.pem
    admin.pem           apiserver.pem       ca.pem              etcd-client.pem     etcd.pem            worker.pem
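
If you supply the TLS assets yourself, a quick check that every file from the listing above is present can save a failed launch. This is a sketch; the function and variable names are hypothetical, and the file list is taken from the listing above:

```shell
# Base names of the TLS assets shown in the listing above.
EXPECTED_ASSETS="admin apiserver ca etcd-client etcd worker"

# Print every expected certificate or key file that is missing from <dir>.
# Prints nothing when all files are present.
# Usage: missing_assets [dir]   (defaults to ./credentials)
missing_assets() {
  dir=${1:-credentials}
  for name in $EXPECTED_ASSETS; do
    for file in "${name}.pem" "${name}-key.pem"; do
      [ -f "${dir}/${file}" ] || echo "${dir}/${file}"
    done
  done
}

# Example:
#   missing_assets credentials
```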
    

Render cluster assets

The next command generates the default set of cluster assets in your asset directory.

  $ kube-aws render stack

Here's what the directory structure looks like:

$ tree
.
├── cluster.yaml
├── credentials
│   ├── admin-key.pem
│   ├── admin.pem
│   ├── apiserver-key.pem
│   ├── apiserver.pem
│   ├── ca-key.pem
│   ├── ca.pem
│   ├── worker-key.pem
│   ├── worker.pem
│   ├── etcd-key.pem
│   ├── etcd.pem
│   ├── etcd-client-key.pem
│   └── etcd-client.pem
├── kubeconfig
├── stack-template.json
└── userdata
    ├── cloud-config-controller
    └── cloud-config-worker

These assets (templates and credentials) are used to create, update and interact with your Kubernetes cluster.

At this point you should be ready to create your cluster. You can also now check the my-cluster asset directory into version control if you desire. The contents of this directory are your reproducible cluster assets. Take care not to commit the my-cluster/credentials directory; instead, encrypt it and/or move it to more secure storage, as it contains your TLS secrets and access tokens. If you're using git, the credentials directory will already be ignored for you.

PRODUCTION NOTE: the TLS keys and certificates generated by kube-aws should not be used to deploy a production Kubernetes cluster. Each component certificate is only valid for 90 days, while the CA is valid for 365 days. If deploying a production Kubernetes cluster, consider establishing PKI independently of this tool first. Read more below.
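
Because the generated certificates are short-lived, it can help to check their expiry dates. A minimal sketch using the openssl tool; the function names are hypothetical:

```shell
# Print the expiry date of a PEM certificate.
# Usage: cert_end_date <cert.pem>
cert_end_date() {
  openssl x509 -noout -enddate -in "$1"
}

# Succeed if the certificate expires within <days> days.
# Returns 2 if the file does not exist.
# Usage: expires_within <cert.pem> <days>
expires_within() {
  [ -f "$1" ] || return 2
  # -checkend exits 0 when the certificate is still valid after N seconds,
  # so its result is inverted here.
  ! openssl x509 -noout -checkend "$(( $2 * 86400 ))" -in "$1" >/dev/null
}

# Example:
#   cert_end_date credentials/apiserver.pem
#   expires_within credentials/apiserver.pem 14 && echo "renew soon"
```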

Did everything render correctly?

If you are familiar with CoreOS and the AWS platform, you may want to include some additional customizations or optional features. Read on below to explore more.

Customizations to your cluster

You can now customize your cluster by editing asset files. Any changes to these files will require a render and validate workflow, covered below.

Customize infrastructure

  • cluster.yaml

    This is the configuration file for your cluster. It contains the configuration parameters that are templated into your userdata and CloudFormation stack.

    Some common customizations are:

    • change the number of workers
    • specify tags applied to all created resources
    • create cluster inside an existing VPC
    • change controller and worker volume sizes

  • userdata/

    • cloud-config-worker
    • cloud-config-controller

    This directory contains the cloud-init cloud-config userdata files. The CoreOS operating system supports automated provisioning via cloud-config files, which describe the various files, scripts and systemd actions necessary to produce a working cluster machine. These files are templated with your cluster configuration parameters and embedded into the CloudFormation stack template.

    Some common customizations are:

  • stack-template.json

    This file describes the AWS CloudFormation stack which encompasses all the AWS resources associated with your cluster. This JSON document is templated with configuration parameters, as well as the encoded userdata files.

    Some common customizations are:

    • tweak AutoScaling rules and timing
    • modify instance IAM roles
    • customize security groups beyond the initial configuration

  • credentials/

    This directory contains both encrypted and unencrypted TLS assets for your cluster, along with a pre-configured kubeconfig file which provides access to your cluster API via kubectl.

    You can also specify additional access tokens in tokens.csv as shown in the official docs.

Kubernetes Container Runtime

The kube-aws tool now optionally supports using rkt as the Kubernetes container runtime. To configure rkt as the container runtime, you must run a CoreOS version >= v1151.0.0 and set the containerRuntime flag.

Edit the cluster.yaml file:

containerRuntime: rkt
releaseChannel: stable

Note that while using rkt as the runtime is now supported, it is still a new option as of the Kubernetes v1.4 release and has a few known issues.

Calico network policy

The cluster can optionally be configured to use Calico to provide network policy. These policies limit and control how different pods, namespaces, etc. can communicate with each other. These rules can be managed after the cluster is launched, but the feature needs to be turned on beforehand.

Edit the cluster.yaml file:

useCalico: true

Route53 Host Record

kube-aws can optionally create an ALIAS record for the controller's ELB in an existing Route53 hosted zone.

Edit the cluster.yaml file:

externalDNSName: kubernetes.staging.example.com
createRecordSet: true
hostedZoneId: A12B3CDE4FG5HI
# DEPRECATED: use hostedZoneId instead
#hostedZone: staging.example.com

If createRecordSet is not set to true, the deployer will be responsible for making externalDNSName routable to the ELB managing the controller nodes after the cluster is created.
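
To find the value for hostedZoneId, you can look the zone up by name with the aws tool. This is a sketch; the function name is hypothetical, and it assumes the zone name is given without the trailing dot that Route 53 stores:

```shell
# Print the ID of the Route 53 hosted zone named <zone-name>.
# Returns 1 if no zone with exactly that name is found.
# Usage: hosted_zone_id <zone-name>
hosted_zone_id() {
  id=$(aws route53 list-hosted-zones-by-name --dns-name "$1" \
        --query "HostedZones[?Name=='$1.'] | [0].Id" --output text) || return 1
  # The CLI prints "None" when the query matches nothing.
  [ -n "$id" ] && [ "$id" != "None" ] || return 1
  # Route 53 returns IDs as "/hostedzone/<ID>"; keep only the last component.
  echo "${id##*/}"
}

# Example:
#   hosted_zone_id staging.example.com
```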

Multi-AZ Clusters

kube-aws supports "spreading" a cluster across any number of Availability Zones in a given region.

A word of caution about EBS and Persistent Volumes: Any pods deployed to a Multi-AZ cluster must mount EBS volumes via Persistent Volume Claims. Specifying the ID of the EBS volume directly in the pod spec will not work consistently if nodes are spread across multiple zones.
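
As an illustration, the following sketch writes a manifest in which a pod mounts storage through a Persistent Volume Claim instead of naming an EBS volume ID. All resource names, the image, and the requested size are hypothetical, and it assumes your cluster has a persistent volume or provisioner able to satisfy the claim:

```shell
# Write an example claim and a pod that mounts it to <file>.
# Usage: write_pvc_example <file>
write_pvc_example() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      # Reference the claim, not an EBS volume ID.
      persistentVolumeClaim:
        claimName: data-claim
EOF
}

# Example:
#   write_pvc_example pvc-example.yaml
#   kubectl create -f pvc-example.yaml
```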

Read more about Kubernetes Multi-AZ cluster support here.

A common pitfall when deploying multi-AZ clusters in combination with cluster-autoscaler

cluster-autoscaler is a tool that automatically adjusts the number of Kubernetes worker nodes when:

  • there is a pod that doesn’t have enough space to run in the cluster
  • some nodes in the cluster are so underutilized, for an extended period of time, that they can be deleted and their pods will be easily placed on some other, existing nodes (source: https://github.com/kubernetes/contrib/tree/master/cluster-autoscaler)

A common pitfall when deploying cluster-autoscaler to a multi-AZ cluster is letting a single Auto Scaling Group span multiple availability zones. In that configuration, cluster-autoscaler becomes unstable while scaling out, i.e. it takes unnecessarily long to bring up a node in the zone that lacks capacity.

Each Auto Scaling Group should span exactly one availability zone for cluster-autoscaler to work. If you want to distribute workloads evenly across zones, set up multiple ASGs, with a cluster-autoscaler for each ASG. At the time of writing, cluster-autoscaler is unaware of availability zones. Although an Auto Scaling Group can contain instances in multiple availability zones when configured to do so, cluster-autoscaler can't reliably add nodes to the desired zone, because AWS Auto Scaling decides which zone receives a new node and that decision is outside cluster-autoscaler's control. See https://github.com/kubernetes/contrib/pull/1552#discussion_r75533090 and https://github.com/kubernetes/contrib/tree/master/cluster-autoscaler/cloudprovider/aws#deployment-specification for more information.

Please read the following guides carefully and select the deployment that matches your auto-scaling requirements.

For production clusters not requiring cluster-autoscaler

If you don't need auto-scaling at all, or you need only AWS-native auto-scaling (i.e. a combination of Auto Scaling Groups, scaling policies, and CloudWatch alarms), you can, in theory, safely go this way.

Edit the cluster.yaml file to define multiple subnets, each in a different availability zone, which makes the cluster multi-AZ:

 subnets:
   - availabilityZone: us-west-1a
     instanceCIDR: "10.0.0.0/24"
   - availabilityZone: us-west-1b
     instanceCIDR: "10.0.1.0/24"

This implies that you rely on AWS Auto Scaling to select the subnet, and hence the availability zone, in which a node is added when an Auto Scaling Group's DesiredCapacity is increased.
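
To confirm that the subnets in cluster.yaml really cover more than one zone, you can list the distinct zones they name. This sketch is a plain-text scan rather than a YAML parser, and it assumes the subnets are laid out as in the example above; the function name is hypothetical:

```shell
# Print the distinct availability zones named in "- availabilityZone:" entries.
# Usage: subnet_zones [cluster.yaml]
subnet_zones() {
  sed -n 's/^[[:space:]]*-[[:space:]]*availabilityZone:[[:space:]]*//p' \
    "${1:-cluster.yaml}" | tr -d '"' | sort -u
}

# Example:
#   subnet_zones cluster.yaml
```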

Please read the AWS documentation for more details about AWS Auto Scaling.

For production clusters requiring cluster-autoscaler

You must utilize an experimental feature called Node Pool to achieve this deployment. Please read the documentation for experimental features for more instructions.

Certificates and Keys

kube-aws render begins by initializing the TLS infrastructure needed to securely operate Kubernetes. If you have your own key/certificate management system, you can overwrite the generated TLS assets after kube-aws render. More information on Kubernetes certificate generation.

When kube-aws up creates the cluster stack, it will use whatever TLS assets it finds in the credentials folder at the time.

This includes the certificate authority, signed server certificates for the Kubernetes API server and workers, and a signed client certificate for administrative use.

  • APIServerCert, APIServerKey

    The API server certificate will be valid for the value of externalDNSName, as well as the DNS names used to route Kubernetes API requests inside the cluster.

    kube-aws does not manage a DNS zone for the cluster. This means that the deployer is responsible for ensuring the routability of the external DNS name to the public IP of the master node instance.

    The certificate and key granted to the kube-apiserver. This certificate will be presented to external clients of the Kubernetes cluster, so it should be valid for external DNS names, if necessary.

    Additionally, the certificate must have the following Subject Alternative Names (SANs). These IPs and DNS names are used within the cluster to route from applications to the Kubernetes API:

    • 10.0.0.50
    • 10.3.0.1
    • kubernetes
    • kubernetes.default
    • kubernetes.default.svc
    • kubernetes.default.svc.cluster.local

  • WorkerCert, WorkerKey

    The certificate and key granted to the kubelets on worker instances. The certificate is shared across all workers, so it must be valid for all worker hostnames. This is achievable with the Subject Alternative Name (SAN) *.*.compute.internal, or *.ec2.internal if using the us-east-1 AWS region.

  • CACert

    The certificate authority's TLS certificate is used to sign other certificates in the cluster.

    These assets are stored unencrypted in your credentials folder, but are encrypted using Amazon KMS before being embedded in the CloudFormation template.

    All keys and certs must be PEM-formatted and base64-encoded.
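
If you bring your own API server certificate, the SAN requirements listed above can be checked with openssl. This is a sketch that parses the human-readable openssl output, which is an assumption about its layout; the function names are hypothetical:

```shell
# Print the SAN entries of a certificate, one per line, without the
# "DNS:" or "IP Address:" prefixes.
# Usage: cert_sans <cert.pem>
cert_sans() {
  openssl x509 -noout -text -in "$1" \
    | grep -A1 'Subject Alternative Name' \
    | tail -n 1 \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/^DNS://' -e 's/^IP Address://'
}

# Print each wanted SAN that the certificate lacks.
# Prints nothing when all of them are present.
# Usage: missing_sans <cert.pem> <san>...
missing_sans() {
  cert=$1
  shift
  have=$(cert_sans "$cert")
  for want in "$@"; do
    printf '%s\n' "$have" | grep -qxF -- "$want" || printf '%s\n' "$want"
  done
}

# Example, using the SANs listed above:
#   missing_sans credentials/apiserver.pem 10.0.0.50 10.3.0.1 kubernetes \
#     kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local
```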

Render and validate cluster assets

After you have completed your customizations, re-render your assets with the new settings:

$ kube-aws render credentials
$ kube-aws render stack

The validate command checks the validity of your changes to the cloud-config userdata files and the CloudFormation stack description.

This is an important step to make sure your stack will launch successfully:

$ kube-aws validate --s3-uri s3://<your-bucket-name>/<prefix>

If your files are valid, you are ready to launch your cluster.
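
The render and validate steps can be chained so that a failure stops the workflow early. A minimal sketch using only the commands shown above; the function name and the KUBE_AWS variable are hypothetical conveniences:

```shell
# Name or path of the kube-aws binary; override it to dry-run the wrapper.
: "${KUBE_AWS:=kube-aws}"

# Re-render the assets and validate them, stopping at the first failure.
# Usage: render_and_validate <s3-uri>
render_and_validate() {
  "$KUBE_AWS" render credentials &&
  "$KUBE_AWS" render stack &&
  "$KUBE_AWS" validate --s3-uri "$1"
}

# Example:
#   render_and_validate "s3://<your-bucket-name>/<prefix>"
```

Setting KUBE_AWS=echo prints the commands instead of running them, which is a quick way to review what the wrapper would do.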