Our first post introduced Terraform using Tack as a way of instrumenting the creation of a Kubernetes cluster on AWS. But using Tack certainly isn’t the only way you can do this. In this post, we will step you through how to use kops to create and manage your AWS Kubernetes cluster. Kops, in its own words, is ‘kubectl for clusters’. If you’re familiar with Kubernetes, you’ll know that kubectl is the command line tool you use to interact with Kubernetes.
These instructions should work on either macOS or Linux. Windows users might need to do a bit of googling.
Prerequisites
Before you get started you’ll need to install a couple of Kubernetes CLI tools. You’ll also need to ensure the aws-cli is set up on your machine.
- Kops: This is the main control tool for kops and can be downloaded here. Choose a release that matches the version of Kubernetes you want to install, e.g. if you want to install a 1.7.x version of Kubernetes then download the 1.7.0 release of kops.
- Kubectl: This is the main CLI tool for controlling Kubernetes clusters. You’ll find the binary here. Choose the binary that matches the version of the Kubernetes cluster you want to install. You can also download the binary directly from https://storage.googleapis.com/kubernetes-release/release/v$KUBEVER/bin/$ARCH/amd64/kubectl, where $KUBEVER is your desired version and $ARCH is either darwin or linux. (One way to install both tools from the command line is sketched just below this list.)
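If you’d rather install both tools from the command line, here’s a minimal sketch. It assumes a Linux machine, kops 1.7.0 and kubectl 1.7.2, and that the kops release asset is named kops-linux-amd64; check the kops releases page if the URL or asset name differs for your version:
# assumed versions and platform; adjust to suit your setup
curl -LO https://github.com/kubernetes/kops/releases/download/1.7.0/kops-linux-amd64
chmod +x kops-linux-amd64 && sudo mv kops-linux-amd64 /usr/local/bin/kops
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.7.2/bin/linux/amd64/kubectl
chmod +x kubectl && sudo mv kubectl /usr/local/bin/kubectl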
Now that you have your pre-reqs sorted, you can get started configuring your cluster. The first step is to export the values that you want to use with your cluster as environment variables:
AWS credentials
You’ll first need to tell kops which AWS credentials to use. If you don’t export these, kops will just use the [default] credentials in your ~/.aws/credentials file. Replace the XXXXXs with your actual key and secret:
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXX
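If you’d like a quick sanity check that these credentials work, the aws-cli can confirm which identity they map to (this assumes the aws-cli is already configured, as noted in the prerequisites):
aws sts get-caller-identity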
You’ll also need to export the name of your cluster. This will be used in several different configuration points, including DNS:
export NAME=kops-cluster-a.connect.cd
Store for kops state
You’ll need an S3 bucket to keep the current state of your kops cluster in. This is the source of truth at all times for your kops cluster. You can create the bucket with the following command (assuming you have the correct rights in AWS):
aws s3api create-bucket --bucket $NAME-state-store --acl private
You can add any additional Policies that suit your requirements but those options will not be covered here.
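One optional extra worth considering (and commonly recommended for kops state stores) is to enable versioning on the bucket, so you can recover an earlier cluster state if something goes wrong. A sketch using the aws-cli:
aws s3api put-bucket-versioning --bucket $NAME-state-store \
  --versioning-configuration Status=Enabled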
Then, you’ll simply need to export the bucket name as an environment variable so kops knows which store to use:
export KOPS_STATE_STORE=s3://$NAME-state-store
Availability Zones for your cluster to live in
This will determine which AWS availability zones your cluster will live in. The master nodes in a kops cluster run both the Kubernetes controller functions and the etcd distributed key-value store. As per the FAQ, it makes sense to use 3 zones to provide robustness and high availability, and to avoid split brain in your etcd cluster in the unlikely event of an AWS availability zone failure. In this example cluster, we spread both the masters and nodes across 3 different availability zones. You can change the zones specified below if you want to run your cluster in a different region.
export ZONES="us-east-1a,us-east-1b,us-east-1c"
export REGION=$(echo $ZONES | awk -F, '{ print $1 }' | sed 's/-/_/g' | sed 's/.$//')
CoreOS image to use
Kops supports many different operating systems (Red Hat, CentOS, Ubuntu, Debian, etc.), however we’re big fans of Container Linux from CoreOS. The following will find the latest stable CoreOS AMI available in your AWS region:
export IMAGE=$(curl -s https://coreos.com/dist/aws/aws-stable.json | sed 's/-/_/g' | jq '.'$REGION'.hvm' | sed 's/_/-/g' | sed 's/\"//g')
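If you want to sanity-check what those two exports produced, echo them; the region deliberately keeps underscores (the jq lookup above relies on it) and the image should be an AMI ID. The values shown in the comments are only examples:
echo $REGION   # us_east_1
echo $IMAGE    # something like ami-e2d33d98 (yours will differ)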
Generate the ssh keypair to use
By default kops will use ~/.ssh/id_rsa.pub as the public key that’s allowed to log in to the nodes within the cluster. This isn’t always ideal, so you can instead generate a new keypair and set kops to allow login using it:
ssh-keygen -t rsa -f $NAME.key -N ''
export PUBKEY="$NAME.key.pub"
Choose your kubernetes version
Kops will ensure that this is the version of Kubernetes that is deployed, so it obviously must be a valid choice. You’ll deliberately choose an older version, 1.7.2, so that you can upgrade it later in this post:
export KUBEVER="1.7.2"
You can test that your version is valid by downloading the same version of kubectl from the URL below (where ARCH=linux or ARCH=darwin):
https://storage.googleapis.com/kubernetes-release/release/v$KUBEVER/bin/$ARCH/amd64/kubectl
e.g. https://storage.googleapis.com/kubernetes-release/release/v1.7.2/bin/linux/amd64/kubectl
DNS Zones
If you have more than one DNS zone within your AWS account, it’s best to find the correct zone you’d like to use and then add that to the creation command, e.g.:
--dns-zone=Z266PQZ112373 \
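If you’re not sure of the zone ID, you can look it up with the aws-cli; for example, using the domain from this post (substitute your own):
aws route53 list-hosted-zones-by-name --dns-name connect.cd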
Cluster Creation
Now you’re ready to use kops create cluster to create your cluster. There are quite a few parameters we’ve passed to the command, so let’s go over a few of them. We’re using flannel for our Container Network Interface (CNI) layer, but you can use any of the providers that Kubernetes supports. You can change the --node-count if you want more nodes, and you can also change the instance types you’d like to use for your masters and nodes by passing them to --master-size and --node-size respectively. We’re choosing to keep things small below so we don’t blow out any budgets! We’ve also specified --authorization RBAC so that our cluster has Role-Based Access Control, which is a must for any enterprise-grade cluster. You should also create a bastion jump box so that you can easily and securely get into and out of your cluster. We’ve done that with the --bastion argument.
This command will create your new cluster based on the choices you have made above. If you need to use a custom DNS zone, remember to add the --dns-zone argument and parameter.
kops create cluster --topology private \
  --zones $ZONES \
  --master-zones $ZONES \
  --networking flannel \
  --node-count 2 \
  --master-size t2.small \
  --node-size t2.medium \
  --image $IMAGE \
  --kubernetes-version $KUBEVER \
  --api-loadbalancer-type public \
  --admin-access 0.0.0.0/0 \
  --authorization RBAC \
  --ssh-public-key $PUBKEY \
  --cloud aws \
  --bastion \
  --name ${NAME} \
  --yes
Sit back and give kops a few minutes to create your cluster and you should be good to go! There are a number of ways you can test and validate your cluster. First off, let’s get kops to validate the cluster:
kops validate cluster
This will give you a basic overview at the machine level of what your cluster looks like:
$ kops validate cluster
Validating cluster kops-cluster-a.connect.cd

INSTANCE GROUPS
NAME               ROLE     MACHINETYPE  MIN  MAX  SUBNETS
bastions           Bastion  t2.micro     1    1    utility-us-east-1a,utility-us-east-1b,utility-us-east-1c
master-us-east-1a  Master   t2.small     1    1    us-east-1a
master-us-east-1b  Master   t2.small     1    1    us-east-1b
master-us-east-1c  Master   t2.small     1    1    us-east-1c
nodes              Node     t2.medium    2    2    us-east-1a,us-east-1b,us-east-1c

NODE STATUS
NAME                            ROLE    READY
ip-172-20-32-241.ec2.internal   master  True
ip-172-20-36-145.ec2.internal   node    True
ip-172-20-83-199.ec2.internal   master  True
ip-172-20-88-2.ec2.internal     node    True
ip-172-20-98-109.ec2.internal   master  True
Or, since kops automatically populates your ~/.kube/config file with a new configuration context for your cluster, you can also use kubectl to view your cluster:
$ kubectl get nodes -o wide
NAME                            STATUS  AGE  VERSION  EXTERNAL-IP  OS-IMAGE                                      KERNEL-VERSION
ip-172-20-32-241.ec2.internal   Ready   29m  v1.7.2   <none>       Container Linux by CoreOS 1465.8.0 (Ladybug)  4.12.14-coreos
ip-172-20-36-145.ec2.internal   Ready   28m  v1.7.2   <none>       Container Linux by CoreOS 1465.8.0 (Ladybug)  4.12.14-coreos
ip-172-20-83-199.ec2.internal   Ready   29m  v1.7.2   <none>       Container Linux by CoreOS 1465.8.0 (Ladybug)  4.12.14-coreos
ip-172-20-88-2.ec2.internal     Ready   28m  v1.7.2   <none>       Container Linux by CoreOS 1465.8.0 (Ladybug)  4.12.14-coreos
ip-172-20-98-109.ec2.internal   Ready   28m  v1.7.2   <none>       Container Linux by CoreOS 1465.8.0 (Ladybug)  4.12.14-coreos
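If you manage more than one cluster from the same machine, the standard kubectl config commands will show or switch the active context (kops names the context after your cluster):
kubectl config current-context     # should print kops-cluster-a.connect.cd
kubectl config use-context $NAME   # switch back to this cluster if needed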
You can try logging into your bastion using ssh and the keypair that you created earlier. As the bastion runs CoreOS, the user name is ‘core’:
$ chmod 600 $NAME.key
$ ssh -i $NAME.key core@bastion.$NAME
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'bastion.kops-cluster-a.connect.cd,52.21.84.129' (ECDSA) to the list of known hosts.
Container Linux by CoreOS stable (1465.8.0)
core@ip-172-20-10-163 ~ $
As we chose --topology private when we created our cluster, none of our CoreOS instances will have a public IP. When you choose a private topology along with the --bastion option, kops doesn’t assign a public IP to your bastion server; instead it creates an ELB that passes traffic through to your bastion on port 22, as well as a DNS alias which points to that ELB. This means that your ssh sessions will be at the mercy of the ELB’s idle timeout, so you may need to adjust this to suit your needs. You’ll also need the same private key available on the bastion so you can jump onto any of the other CoreOS instances within your cluster; either copy it across or use agent forwarding, as sketched below.
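One way to avoid copying the private key around is SSH agent forwarding. This is a sketch using standard OpenSSH (it assumes ssh-agent is running locally; the internal IP is one of the node addresses from the kubectl output above):
ssh-add $NAME.key            # load the key into your local agent
ssh -A core@bastion.$NAME    # -A forwards your agent to the bastion
# then, from the bastion:
ssh core@172.20.36.145       # jump to a node via its internal IP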
Controlling the cluster
You can edit the values in your cluster at any stage after the initial creation by using the edit cluster command. This will download the current cluster state from the S3 bucket that you defined earlier and open it in a vim editor session for you to edit. Let’s try that out so we can update our Kubernetes version to 1.7.6 and also add idleTimeoutSeconds: 1200 under the bastion section:
$ kops edit cluster $NAME

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-10-01T18:45:34Z
  name: kops-cluster-a.connect.cd
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://kops-cluster-a.connect.cd-state-store/kops-cluster-a.connect.cd
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    - instanceGroup: master-us-east-1b
      name: b
    - instanceGroup: master-us-east-1c
      name: c
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    - instanceGroup: master-us-east-1b
      name: b
    - instanceGroup: master-us-east-1c
      name: c
    name: events
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.7.6
  masterInternalName: api.internal.kops-cluster-a.connect.cd
  masterPublicName: api.kops-cluster-a.connect.cd
  networkCIDR: 172.20.0.0/16
  networking:
    flannel: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-east-1a
    type: Private
    zone: us-east-1a
  - cidr: 172.20.64.0/19
    name: us-east-1b
    type: Private
    zone: us-east-1b
  - cidr: 172.20.96.0/19
    name: us-east-1c
    type: Private
    zone: us-east-1c
  - cidr: 172.20.0.0/22
    name: utility-us-east-1a
    type: Utility
    zone: us-east-1a
  - cidr: 172.20.4.0/22
    name: utility-us-east-1b
    type: Utility
    zone: us-east-1b
  - cidr: 172.20.8.0/22
    name: utility-us-east-1c
    type: Utility
    zone: us-east-1c
  topology:
    bastion:
      bastionPublicName: bastion.kops-cluster-a.connect.cd
      idleTimeoutSeconds: 1200
    dns:
      type: Public
    masters: private
    nodes: private
Save and exit with :wq and you can now update your cluster. Note that if you make a mistake with your syntax, kops will usually show an error when you exit the edit session. Running the update command shows you what’s going to be changed, and you can see references to your version change as well as the IdleTimeout changing from the default 300 seconds to 1200:
$ kops update cluster $NAME
.......
+     - ee007f4d30a9f5002a7e4e7ea4ae446b34a174cf@https://storage.googleapis.com/kubernetes-release/release/v1.7.6/bin/linux/amd64/kubelet
-     - bad424eee321f4c9b2b800d44de2e1789843da19@https://storage.googleapis.com/kubernetes-release/release/v1.7.2/bin/linux/amd64/kubelet
......
  LoadBalancer/bastion.kops-cluster-a.connect.cd
    Lifecycle           <nil> -> Sync
    ConnectionSettings  {"IdleTimeout":300} -> {"IdleTimeout":1200}

Must specify --yes to apply changes
You must specify the --yes argument to the update command for the changes to be implemented:
$ kops update cluster $NAME --yes
I1002 08:12:50.709868   34486 executor.go:91] Tasks: 0 done / 119 total; 42 can run
I1002 08:12:52.354857   34486 executor.go:91] Tasks: 42 done / 119 total; 26 can run
I1002 08:12:53.528533   34486 executor.go:91] Tasks: 68 done / 119 total; 34 can run
I1002 08:12:57.824783   34486 executor.go:91] Tasks: 102 done / 119 total; 10 can run
I1002 08:12:58.706678   34486 dnsname.go:110] AliasTarget for "bastion.kops-cluster-a.connect.cd." is "bastion-kops-cluster-a-cq4ep0-326636151.us-east-1.elb.amazonaws.com."
I1002 08:12:58.707416   34486 dnsname.go:110] AliasTarget for "api.kops-cluster-a.connect.cd." is "api-kops-cluster-a-connect-cd-4mnc52-1446777117.us-east-1.elb.amazonaws.com."
I1002 08:12:59.544597   34486 executor.go:91] Tasks: 112 done / 119 total; 7 can run
I1002 08:13:00.568075   34486 executor.go:91] Tasks: 119 done / 119 total; 0 can run
I1002 08:13:00.568141   34486 dns.go:152] Pre-creating DNS records
I1002 08:13:02.447421   34486 update_cluster.go:247] Exporting kubecfg for cluster
Kops has set your kubectl context to kops-cluster-a.connect.cd

Cluster changes have been applied to the cloud.

Changes may require instances to restart: kops rolling-update cluster
As you’ll see from the output above, you’ll need to run the kops rolling-update cluster command to update your cluster. If you run this without the --yes argument it will show you which components will be updated:
$ kops rolling-update cluster
Using cluster from kubectl context: kops-cluster-a.connect.cd

NAME               STATUS       NEEDUPDATE  READY  MIN  MAX  NODES
bastions           Ready        0           1      1    1    0
master-us-east-1a  NeedsUpdate  1           0      1    1    1
master-us-east-1b  NeedsUpdate  1           0      1    1    1
master-us-east-1c  NeedsUpdate  1           0      1    1    1
nodes              NeedsUpdate  2           0      2    2    2

Must specify --yes to rolling-update.
This looks good! Now add the --yes argument to update the cluster. Kops will roll out the update one master or node at a time, ensuring that the cluster is always available during this upgrade, so you won’t have any downtime:
$ kops rolling-update cluster --yes
Using cluster from kubectl context: kops-cluster-a.connect.cd

NAME               STATUS       NEEDUPDATE  READY  MIN  MAX  NODES
bastions           Ready        0           1      1    1    0
master-us-east-1a  NeedsUpdate  1           0      1    1    1
master-us-east-1b  NeedsUpdate  1           0      1    1    1
master-us-east-1c  NeedsUpdate  1           0      1    1    1
nodes              NeedsUpdate  2           0      2    2    2

I1002 08:18:08.382068   35110 instancegroups.go:350] Stopping instance "i-0e1f0494b9a6b8f96", node "ip-172-20-58-2.ec2.internal", in AWS ASG "master-us-east-1a.masters.kops-cluster-a.connect.cd".
I1002 08:23:08.735138   35110 instancegroups.go:350] Stopping instance "i-06ccd8ab7b738c46d", node "ip-172-20-95-238.ec2.internal", in AWS ASG "master-us-east-1b.masters.kops-cluster-a.connect.cd".
I1002 08:28:09.957542   35110 instancegroups.go:350] Stopping instance "i-0188c66cb462b9d5b", node "ip-172-20-121-101.ec2.internal", in AWS ASG "master-us-east-1c.masters.kops-cluster-a.connect.cd".
I1002 08:33:11.424412   35110 instancegroups.go:350] Stopping instance "i-0566513d1cea62aa1", node "ip-172-20-86-219.ec2.internal", in AWS ASG "nodes.kops-cluster-a.connect.cd".
I1002 08:35:12.577605   35110 instancegroups.go:350] Stopping instance "i-0584346a492c3364a", node "ip-172-20-50-34.ec2.internal", in AWS ASG "nodes.kops-cluster-a.connect.cd".
I1002 08:37:13.786625   35110 rollingupdate.go:174] Rolling update completed!
Your new Kubernetes version should have been implemented – great stuff!
$ kubectl get nodes -o wide
NAME                             STATUS  AGE  VERSION  EXTERNAL-IP  OS-IMAGE                                      KERNEL-VERSION
ip-172-20-105-199.ec2.internal   Ready   7m   v1.7.6   <none>       Container Linux by CoreOS 1465.8.0 (Ladybug)  4.12.14-coreos
ip-172-20-123-192.ec2.internal   Ready   3m   v1.7.6   <none>       Container Linux by CoreOS 1465.8.0 (Ladybug)  4.12.14-coreos
ip-172-20-55-106.ec2.internal    Ready   18m  v1.7.6   <none>       Container Linux by CoreOS 1465.8.0 (Ladybug)  4.12.14-coreos
ip-172-20-80-30.ec2.internal     Ready   1m   v1.7.6   <none>       Container Linux by CoreOS 1465.8.0 (Ladybug)  4.12.14-coreos
ip-172-20-94-168.ec2.internal    Ready   12m  v1.7.6   <none>       Container Linux by CoreOS 1465.8.0 (Ladybug)  4.12.14-coreos
Controlling instance groups
Kops also has the concept of instance groups, which are groups of machines that have similar functions. When using kops on AWS, these instance groups map to autoscaling groups. You can view these groups as follows:
$ kops get ig
Using cluster from kubectl context: kops-cluster-a.connect.cd

NAME               ROLE     MACHINETYPE  MIN  MAX  SUBNETS
bastions           Bastion  t2.micro     1    1    utility-us-east-1a,utility-us-east-1b,utility-us-east-1c
master-us-east-1a  Master   t2.small     1    1    us-east-1a
master-us-east-1b  Master   t2.small     1    1    us-east-1b
master-us-east-1c  Master   t2.small     1    1    us-east-1c
nodes              Node     t2.medium    2    2    us-east-1a,us-east-1b,us-east-1c
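To see the full specification of a single group as YAML, something like the following should work, depending on your kops version:
kops get ig nodes -o yaml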
You can also change the details of each instance group using the edit command. For example, to add a new node we would edit the node group and increase the minSize of the group from 2 to 3 (we’ve also bumped the maxSize to 4, as you’ll see below). Once again you’ll be presented with a vim session for editing:
$ kops edit ig nodes
Using cluster from kubectl context: kops-cluster-a.connect.cd

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-10-01T18:45:36Z
  labels:
    kops.k8s.io/cluster: kops-cluster-a.connect.cd
  name: nodes
spec:
  image: ami-e2d33d98
  machineType: t2.medium
  maxSize: 4
  minSize: 3
  role: Node
  subnets:
  - us-east-1a
  - us-east-1b
  - us-east-1c
You’ll need to update the cluster again using the --yes argument to add your new node:
$ kops update cluster --yes
Using cluster from kubectl context: kops-cluster-a.connect.cd

I1002 08:49:28.910484   37911 executor.go:91] Tasks: 0 done / 119 total; 42 can run
I1002 08:49:30.608112   37911 executor.go:91] Tasks: 42 done / 119 total; 26 can run
I1002 08:49:31.992969   37911 executor.go:91] Tasks: 68 done / 119 total; 34 can run
I1002 08:49:34.362580   37911 executor.go:91] Tasks: 102 done / 119 total; 10 can run
I1002 08:49:34.609685   37911 dnsname.go:110] AliasTarget for "api.kops-cluster-a.connect.cd." is "api-kops-cluster-a-connect-cd-4mnc52-1446777117.us-east-1.elb.amazonaws.com."
I1002 08:49:34.849999   37911 dnsname.go:110] AliasTarget for "bastion.kops-cluster-a.connect.cd." is "bastion-kops-cluster-a-cq4ep0-326636151.us-east-1.elb.amazonaws.com."
I1002 08:49:35.804106   37911 executor.go:91] Tasks: 112 done / 119 total; 7 can run
I1002 08:49:36.915647   37911 executor.go:91] Tasks: 119 done / 119 total; 0 can run
I1002 08:49:36.915789   37911 dns.go:152] Pre-creating DNS records
I1002 08:49:38.802101   37911 update_cluster.go:247] Exporting kubecfg for cluster
Kops has set your kubectl context to kops-cluster-a.connect.cd

Cluster changes have been applied to the cloud.

Changes may require instances to restart: kops rolling-update cluster
Even though the output says you may need a rolling update, resizing an instance group doesn’t require one. Give it a couple of minutes, then check your cluster again and you’ll see your third node:
$ kops validate cluster
Using cluster from kubectl context: kops-cluster-a.connect.cd
Validating cluster kops-cluster-a.connect.cd

INSTANCE GROUPS
NAME               ROLE     MACHINETYPE  MIN  MAX  SUBNETS
bastions           Bastion  t2.micro     1    1    utility-us-east-1a,utility-us-east-1b,utility-us-east-1c
master-us-east-1a  Master   t2.small     1    1    us-east-1a
master-us-east-1b  Master   t2.small     1    1    us-east-1b
master-us-east-1c  Master   t2.small     1    1    us-east-1c
nodes              Node     t2.medium    3    4    us-east-1a,us-east-1b,us-east-1c

NODE STATUS
NAME                             ROLE    READY
ip-172-20-105-199.ec2.internal   master  True
ip-172-20-123-192.ec2.internal   node    True
ip-172-20-33-127.ec2.internal    node    True
ip-172-20-55-106.ec2.internal    master  True
ip-172-20-80-30.ec2.internal     node    True
ip-172-20-94-168.ec2.internal    master  True
Deleting the cluster
Once you’ve finished with your cluster, deleting it is very simple, so take care not to do this by mistake!
$ kops delete cluster $NAME --yes
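If you’d like to double-check exactly what will be removed before committing, run the same command without --yes first; kops will list the resources it intends to delete without making any changes:
kops delete cluster $NAME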
Other Options
Kops is quite configurable and there are many different options and architectures you can choose. The command line help is very useful, e.g. kops create cluster --help gives you a good description of the various options. If you need more documentation on using kops it can be found here.
More Resources
There are plenty of additional resources and reading available online so that you can better familiarise yourself with kops. Here’s a small collection of official resources you might find useful:
- The official kops repository
- The official kops ‘getting started’ guide
- AWS Compute Blog article on kops
We hope you’ve found this useful! Stay tuned for our next post…