Kubernetes has come a long way since its initial public release in 2014, but managing clusters is still no easy feat. Different organizations have different workload needs and considerations, and one of the most challenging facets of managing a cluster is optimizing cost without compromising performance and reliability. One rising star in this regard is Karpenter, an open-source project aiming for the (GitHub) stars and bringing node provisioning automation to the masses!

Note: The following blog post pertains to Karpenter v0.33.0, which introduced a complete overhaul of Karpenter’s APIs.

First seeing the light of day in 2021 and graduating from alpha towards the end of 2023, Karpenter was (and still is) meticulously crafted over at AWS, aiming to let organizations dynamically provision and de-provision Kubernetes nodes for their clusters.
Currently, Karpenter only supports EKS clusters running on AWS, but there are plans to have it support AKS and GKE clusters as well, making it a viable multi-cloud solution.

How Karpenter Works

Karpenter works as an operator in the cluster, continuously watching the cluster’s API for unschedulable pods. When it finds such pods, it checks whether it can pair them with a NodePool; a NodePool is a custom resource you can create, which outlines a set of rules and conditions under which to create additional nodes for the cluster. If Karpenter finds a match, it creates a NodeClaim and tries to provision a new EC2 instance to be used as the new node. Karpenter also keeps watching for underutilized and empty nodes, and once a node is no longer needed, Karpenter terminates it.
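If Karpenter is already running in your cluster, you can watch this loop with nothing more than kubectl; a quick sketch:

# Pods that Karpenter cares about: Pending and waiting for a node
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# The NodeClaims Karpenter creates (and later removes) on their behalf
kubectl get nodeclaims

# Nodes, with the NodePool that produced each Karpenter-provisioned one
kubectl get nodes -L karpenter.sh/nodepool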

Where to start?

Well, that depends. If you’re starting from scratch, the first thing you’ll need is a Kubernetes cluster. You can follow Karpenter’s own getting-started guide for basic instructions on how to easily provision an EKS cluster and prepare it for the Karpenter installation. The method in their guide leverages CloudFormation and eksctl to make the magic happen.
If you already have an EKS cluster up and running, what you need to do is create two AWS IAM roles (one for the Karpenter controller and one for the instances it will create) and add the relevant entries to the aws-auth ConfigMap in the kube-system namespace; you can read more about that in Karpenter’s documentation.
Also, since Karpenter is still somewhat of a work in progress and new functionality is being added all the time, I will not include hard-and-fast permissions for the roles you create. Instead, I advise you to take a look at the cloudformation.yaml of the version you want to deploy, over at this link:

https://karpenter.sh/<KarpenterVersion>/getting-started/getting-started-with-karpenter/cloudformation.yaml
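For reference, here’s a rough sketch of deploying that template with the AWS CLI and then mapping the node role into aws-auth with eksctl. The stack name and the KarpenterNodeRole-<clusterName> role name below follow the getting-started guide’s conventions and are assumptions; adjust them to whatever the template you downloaded actually creates:

export CLUSTER_NAME="<clusterName>"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"

# Download the CloudFormation template for the version you're deploying
curl -fsSL "https://karpenter.sh/<KarpenterVersion>/getting-started/getting-started-with-karpenter/cloudformation.yaml" -o cloudformation.yaml

# Create the IAM resources Karpenter needs (node role, controller policy, etc.)
aws cloudformation deploy \
  --stack-name "Karpenter-${CLUSTER_NAME}" \
  --template-file cloudformation.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides "ClusterName=${CLUSTER_NAME}"

# Let instances using the node role join the cluster (adds an aws-auth entry)
eksctl create iamidentitymapping \
  --cluster "${CLUSTER_NAME}" \
  --arn "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
  --username "system:node:{{EC2PrivateDNSName}}" \
  --group system:bootstrappers \
  --group system:nodes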

Once our cluster is up and running, we need to install the Karpenter operator, which is easily done with Helm. When installing the chart, Karpenter’s CRDs are deployed to the cluster, as well as the Karpenter controller pods. Let’s look at a basic values.yaml file for the Helm chart.

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<accountID>:role/<karpenterRoleName>
settings:
  clusterName: <clusterName>
  clusterEndpoint: https://<clusterEndpoint>.eks.amazonaws.com

Replace the placeholders with your own values, obviously. Karpenter’s service account assumes the <karpenterRoleName> IAM role (via the annotation above) and uses it to create EC2 instances in AWS. It uses the clusterName and clusterEndpoint values to join those EC2 instances to the cluster as Kubernetes nodes.

You can get the clusterEndpoint value by running the following command:

aws eks describe-cluster --name <CLUSTER_NAME> --query "cluster.endpoint" --output text
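With these values in hand, installing the chart itself is a one-liner. Here’s a minimal sketch, assuming the chart is pulled from Karpenter’s public ECR registry and installed into a dedicated karpenter namespace, as the official guide does; adjust the version and values file path to your own setup:

# Install (or upgrade) Karpenter from the OCI chart
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "<KarpenterVersion>" \
  --namespace karpenter --create-namespace \
  --values values.yaml \
  --wait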

Getting down to business

Once our Karpenter Helm chart is deployed, it’s time to get familiar with the NodePool and EC2NodeClass CRs. If you’ve used Karpenter in the past, you might be familiar with a CR called Provisioner; it has been deprecated and superseded by NodePool, and likewise AWSNodeTemplate has been replaced by EC2NodeClass. Also, congratulations to Karpenter for graduating to beta!

Let’s get back to talking about the NodePool and EC2NodeClass CRs, which are, in my opinion, two sides of the same coin.
The EC2NodeClass CR lets you fine-tune AWS-specific settings, such as which subnets the nodes will be created in, any mapped block devices, security groups, AMI families, and many more options you can control. An EC2NodeClass is, as we said, AWS-specific; once Karpenter goes multi-cloud, there will probably be GCP and Azure equivalents as well.

A NodePool is a more Kubernetes-centered representation of the nodes that should be created.
For example – should the nodes have any taints or labels? Which instance sizes are allowed to be provisioned? Are we OK with spot instances or not? Stuff like that.

Let’s look at an example for a very basic pair of these two buddies:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: <karpenterProfileInstanceRole>
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: <clusterName>
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: <clusterName>
  tags:
    karpenter.sh/discovery: <clusterName>
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default-nodepool
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["medium","large", "xlarge"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        name: default
  limits:
    cpu: "80"
    memory: "320Gi"

This is a very stripped-down version of what you can do with these, but even so, we can get some insight into how they complement each other.
In our EC2NodeClass, we’re declaring that NodePools assigned to this class will run an AL2 image, use the role we have created, and be placed in the subnets and security groups tagged with karpenter.sh/discovery: <clusterName> (this tag is just a convention; you can use whatever custom tags you’d like, and even specify multiple tags to make your class definitions as specific as possible).

In our NodePool, we specify that its class is the default class we have created, and we impose further fine-grained controls on the nodes that will be created – they should be linux/amd64, be on-demand instances (rather than spot), and be of specific sizes and instance families; and finally, no matter how many nodes are spun up through this NodePool, they should not collectively have more than 80 CPUs or 320GiB of RAM.

We can create as many NodePools and EC2NodeClasses as we’d like, and give them different parameters, labels, and taints, to accommodate our organization’s specific needs and wants.
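A quick way to sanity-check a new NodePool is to deploy a throwaway workload that requests more resources than the current nodes can accommodate and watch Karpenter react. Here’s a minimal sketch; the inflate deployment and the pause image are just placeholders borrowed from Karpenter’s docs, not anything specific to this setup:

# A do-nothing deployment that simply requests CPU
kubectl create deployment inflate --image=public.ecr.aws/eks-distro/kubernetes/pause:3.7 --replicas=0
kubectl set resources deployment inflate --requests=cpu=1

# Scale it past the current capacity and watch a NodeClaim appear
kubectl scale deployment inflate --replicas=5
kubectl get nodeclaims -w

# Clean up; the extra node should be consolidated away shortly after
kubectl delete deployment inflate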


Let’s see some examples from a client of mine:

The client in question uses self-hosted GitLab runners for their build process. Up until now, they had some managed node groups just idling about, waiting for pipelines to be scheduled on them. That presented two issues that Karpenter was perfect for solving:

  • Idle compute was constantly burning money in the cloud
  • Compute sizing was uniform and inflexible

By leveraging Karpenter and creating a bunch of different NodePools for different pipelines, we were able to cut costs by more than half! Here’s an example of one such NodePool, where we even have a taint and a label that correspond to the build pods’ tolerations and nodeSelector.

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gitlab-auto-nodepool
spec:
  template:
    metadata:
      labels:
        workload-type: gitlab-runners-auto
    spec:
      taints:
        - effect: NoSchedule
          key: dedicated
          value: gitlab-runner-auto
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["2xlarge"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]

      nodeClassRef:
        name: default
  limits:
    cpu: "32"
    memory: "64Gi"
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h

And here’s the GitLab runner configuration:

[[runners]]
  [runners.kubernetes]
    namespace = "gitlab"
    image = "ubuntu:20.04"
    [runners.kubernetes.node_selector]
        "workload-type" = "gitlab-runners-auto"
    [runners.kubernetes.node_tolerations]
        "dedicated=gitlab-runner-auto" = "NoSchedule"

When a build pipeline is run, the GitLab runner tries to spin up a new pod for the pipeline, and that pod will initially be in the Pending state. Karpenter will look at this pod and go: “Oh, there’s a pending pod, it tolerates the taint dedicated=gitlab-runner-auto, and it must be scheduled on a node with the label workload-type=gitlab-runners-auto.” Karpenter will then look at its NodePool definitions and try to find one that matches these criteria. It will then create a new NodeClaim to provision a node according to the spec, and the pod will be scheduled on it. Once the node is no longer needed, it will be terminated and deregistered from the cluster.
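If you want to follow along while all of this happens, tailing the controller’s logs next to a node watch makes the whole decision process visible. A rough sketch, assuming Karpenter was installed into the karpenter namespace with the chart’s default labels:

# Stream Karpenter's reasoning: pod evaluation, NodePool matching, NodeClaim launches, consolidation
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f

# In a second terminal, watch runner nodes appear and disappear as pipelines run
kubectl get nodes -l workload-type=gitlab-runners-auto -w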

I hope you found Karpenter as fascinating as I have; it’s been fun to implement on my clients’ infrastructure, and it has saved them large sums of money that were essentially being wasted on overprovisioning.

About the Author

Orel Fichman

Tech Blogger, DevOps Engineer, and Microsoft Certified Trainer
