Despite coming a long way since its initial public release in 2014, managing Kubernetes clusters is still no easy feat. Different organizations have different workload needs and considerations, and one of the most challenging facets of managing a cluster is optimizing cost without compromising performance or reliability. One rising star in this regard is Karpenter, an open-source project aiming for the (GitHub) stars, bringing node provisioning automation to the masses!
Note: The following blog post pertains to Karpenter v0.33.0, which implements a complete overhaul of Karpenter's APIs.
First seeing the light of day in 2021, and graduating from alpha towards the end of 2023, Karpenter was (and still is) meticulously crafted over at AWS, aiming to let organizations dynamically provision and de-provision Kubernetes nodes for their clusters.
Currently, Karpenter only supports EKS clusters running on AWS, but there are plans to have it support AKS and GKE clusters as well, making it a viable multi-cloud solution.
How Karpenter Works
Karpenter works as an operator in the cluster, periodically checking the cluster's API for unschedulable pods. When it finds such pods, it checks if it can pair them with a NodePool; a NodePool is a custom resource you can create, which outlines a set of rules and conditions under which to create additional nodes for the cluster. If Karpenter finds a match, it creates a NodeClaim and tries to provision a new EC2 instance to be used as the new node. Karpenter is always on the lookout for discrepancies, and once the new node is no longer needed, Karpenter terminates it.
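You can see the same signal Karpenter keys off of yourself; unschedulable pods are the ones stuck in the Pending phase:

kubectl get pods --all-namespaces --field-selector=status.phase=Pending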
Where to start?
Well, that depends. If you’re starting from scratch, obviously the first thing you’d need is a Kubernetes cluster. You can follow Karpenter’s own guide here for basic instructions on how to easily provision an EKS cluster and prepare it for Karpenter installation. The method in their guide leverages CloudFormation and eksctl to make the magic happen.
If you already have an EKS cluster up and running, what you need to do is create two AWS IAM roles (one for the Karpenter controller and one for the instances that it will create), and add the relevant entries to the aws-auth configmap under the kube-system namespace; you can read more about that here.
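To give you a feel for it, the entry you add under mapRoles in the aws-auth configmap looks roughly like this (KarpenterNodeRole-<clusterName> is just the naming convention from Karpenter's docs; use whatever you named your instance role):

- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::<accountID>:role/KarpenterNodeRole-<clusterName>
  username: system:node:{{EC2PrivateDNSName}}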
Also, since Karpenter is still somewhat of a work-in-progress and new functionality is being added all the time, I will not include hard-and-fast permissions that should be given to the roles you create. Instead, I advise you to take a look at the cloudformation.yaml of the version you want to deploy, over at this link:
https://karpenter.sh/<KarpenterVersion>/getting-started/getting-started-with-karpenter/cloudformation.yaml
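For example, for v0.33.0 you could grab the template and create the roles with something along these lines (the stack name is arbitrary):

curl -fsSL "https://karpenter.sh/v0.33.0/getting-started/getting-started-with-karpenter/cloudformation.yaml" -o cloudformation.yaml
aws cloudformation deploy \
  --stack-name "Karpenter-<clusterName>" \
  --template-file cloudformation.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides "ClusterName=<clusterName>"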
Once our cluster is up and running, we will need to install the Karpenter operator, which can be easily done using Helm. Installing the chart deploys Karpenter's CRDs to the cluster, as well as the Karpenter controller pods. Let's look at a basic values.yaml file for the Helm chart.
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<accountID>:role/<karpenterRoleName>
settings:
  clusterName: <clusterName>
  clusterEndpoint: https://<clusterEndpoint>.eks.amazonaws.com
Replace the placeholders with your values, obviously. Karpenter assumes the <karpenterRoleName> IAM role through its service account, and uses it to create EC2 instances in AWS. It uses the clusterName and clusterEndpoint values to join those EC2 instances to the cluster as Kubernetes nodes.
You can get the clusterEndpoint value by running the command:
aws eks describe-cluster --name <CLUSTER_NAME> --query "cluster.endpoint" --output text
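With the values file ready, the chart itself is published as an OCI artifact in Amazon's public ECR, so the installation boils down to something like:

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version v0.33.0 \
  --namespace karpenter --create-namespace \
  -f values.yaml \
  --wait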
Getting down to business
Once our Karpenter Helm chart is deployed, it's time to get familiar with the NodePool and EC2NodeClass CRs. If you've used Karpenter in the past, you might be familiar with a CR called Provisioner; it's been superseded by NodePool and deprecated. The same goes for AWSNodeTemplate, which has been replaced by EC2NodeClass. Also, congratulations to Karpenter for graduating to beta!
Let's get back to talking about the NodePool and EC2NodeClass CRs, which are, in my opinion, two sides of the same coin.
The EC2NodeClass CR lets you fine-tune AWS-specific settings, such as which subnets the nodes will be created in, any mapped block devices, security groups, AMI families, and many more options you can control. An EC2NodeClass is, as we said, AWS-specific; once Karpenter goes multi-cloud, there will probably be GCP and Azure CRs as well.
A NodePool is a more Kubernetes-centered representation of the nodes that should be created. For example: should the nodes have any taints? Labels? What VM sizes are allowed to be provisioned? Are we OK with spot instances or not? Stuff like that.
Let's look at an example of a very basic pair of these two buddies:
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: <karpenterProfileInstanceRole>
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <clusterName>
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: <clusterName>
  tags:
    karpenter.sh/discovery: <clusterName>
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default-nodepool
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["medium", "large", "xlarge"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        name: default
  limits:
    cpu: "80"
    memory: "320Gi"
This is a very stripped-down version of what you can do with these, but even so, we can get some insight into how they complement each other.
In our EC2NodeClass, we're declaring that NodePools assigned to this class will run an AL2 image, use the role we have created, and be placed in the subnets and security groups that are tagged with karpenter.sh/discovery: <clusterName> (this tag is a convention; you can use whatever custom tag you'd like, and even specify multiple tags to be as specific as possible with your class definitions).
In our NodePool, we specify that its class is the default class we have created, as well as impose further fine-tuned controls on the nodes that will be created: they should be linux/amd64, be on-demand instances (rather than spot), be of specific sizes and machine families, and finally, no matter how many nodes are spun up through this NodePool, they should not collectively exceed 80 CPUs or 320GiB of RAM.
We can create as many NodePools and EC2NodeClasses as we'd like, and give them different parameters, labels, and taints, to accommodate our organization's specific needs and wants.
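Before we move on, a quick way to sanity-check that a NodePool actually works is the pause-pod trick from Karpenter's getting-started guide: scale up a dummy deployment that requests real CPU, and watch NodeClaims appear (the inflate name is just a placeholder):

kubectl create deployment inflate --image=public.ecr.aws/eks-distro/kubernetes/pause:3.7 --replicas=0
kubectl set resources deployment inflate --requests=cpu=1
kubectl scale deployment inflate --replicas=10
kubectl get nodeclaims -w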
Let’s see some examples from a client of mine:
The client in question uses self-hosted GitLab runners for their build process. Up until now, they had some managed node pools just idling about and waiting for pipelines to be scheduled on them. That presented two issues that Karpenter was perfect for solving:
- Idle compute was constantly burning money in the cloud
- Compute sizing was uniform and inflexible
By leveraging Karpenter, and creating a bunch of different NodePools for different pipelines, we were able to cut costs by more than half! Here's an example of one such NodePool, where we even have some taints and a label that correspond to the build pods' tolerations and nodeSelector.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gitlab-auto-nodepool
spec:
  template:
    metadata:
      labels:
        workload-type: gitlab-runners-auto
    spec:
      taints:
        - effect: NoSchedule
          key: dedicated
          value: gitlab-runner-auto
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["2xlarge"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        name: default
  limits:
    cpu: "32"
    memory: "64Gi"
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
And here's the GitLab runner configuration:
[[runners]]
  [runners.kubernetes]
    namespace = "gitlab"
    image = "ubuntu:20.04"
    [runners.kubernetes.node_selector]
      "workload-type" = "gitlab-runners-auto"
    [runners.kubernetes.node_tolerations]
      "dedicated=gitlab-runner-auto" = "NoSchedule"
When a build pipeline is run, the GitLab runner tries to spin up a new pod for the pipeline. The pod will be in the Pending state. Karpenter will look at this pod and go: "Oh, there's a pending pod, and it's tolerant of the taint dedicated=gitlab-runner-auto, and must be scheduled on a node with the label workload-type=gitlab-runners-auto." Karpenter will then look at its NodePool definitions and try to find one that matches the criteria. It will then launch a new NodeClaim to provision the new node according to the spec, and the pod will be scheduled on it. Once the node is no longer needed, it will be terminated and deregistered from the cluster.
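If you want to watch this flow as it happens, a few commands go a long way (assuming the chart was installed into a karpenter namespace with the default release name):

# Build pods waiting for a node
kubectl get pods -n gitlab --field-selector=status.phase=Pending
# NodeClaims Karpenter creates (and later removes) for them
kubectl get nodeclaims -w
# Karpenter's own reasoning, in its controller logs
kubectl logs -n karpenter deploy/karpenter -f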
I hope you found Karpenter as fascinating as I have. It's been fun to implement on my clients' infrastructure, and it has saved them large sums of money that were essentially being wasted on overprovisioning.