
Cloud Bursting

Building on our previous deployment, we'll now explore a scenario that simulates a "cloud bursting" use case. This will demonstrate how EKS Hybrid Nodes can be used to handle overflow workloads or specific computational needs. We'll deploy a new workload that, like our previous example, uses nodeAffinity to prefer our hybrid nodes.

The preferredDuringSchedulingIgnoredDuringExecution strategy tells Kubernetes to prefer our hybrid node at scheduling time, but to ignore the preference once pods are running. The "preferred" part means that when there is no more room on our single hybrid node, these pods are free to schedule elsewhere in the cluster, meaning our EC2 instances. Which is great! That gives us the cloud bursting we wanted. However, node affinity has no say in which pods are removed when we scale back down: the ReplicaSet controller picks victims using its own heuristics, and among otherwise equal pods it prefers to delete those on the node running the most replicas, which in our case is the hybrid node. We don't want that!

We're going to deploy Kyverno, a policy engine for Kubernetes. Kyverno will be set up with a policy that watches for pods being scheduled to our hybrid node and adds an annotation to each of those running pods. The controller.kubernetes.io/pod-deletion-cost annotation tells Kubernetes to delete less expensive pods first when scaling down.
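
As an illustration, the fragment below (a minimal sketch, not a full Pod manifest) shows what the mutated metadata looks like. Pods without the annotation default to a cost of 0, so they are the first candidates for removal on scale-down.

# Illustrative fragment: pod metadata after Kyverno's mutation
metadata:
  annotations:
    controller.kubernetes.io/pod-deletion-cost: "1"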

Let's get to work. We'll use Helm to install Kyverno and we'll deploy the policy included below.

~$helm repo add kyverno https://kyverno.github.io/kyverno/
~$helm install kyverno kyverno/kyverno --version 3.3.7 -n kyverno --create-namespace -f ~/environment/eks-workshop/modules/networking/eks-hybrid-nodes/kyverno/values.yaml
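
If you want a quick sanity check that the chart installed (an optional step; the workshop waits for Kyverno readiness a bit further below), list the pods in the kyverno namespace:

~$kubectl -n kyverno get pods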
 

The ClusterPolicy manifest below tells Kyverno to watch for pods that land on our EKS hybrid node and add the pod-deletion-cost annotation to them.

~/environment/eks-workshop/modules/networking/eks-hybrid-nodes/kyverno/policy.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: set-pod-deletion-cost
  annotations:
    policies.kyverno.io/title: Set Pod Deletion Cost
    policies.kyverno.io/category: Pod Management
    policies.kyverno.io/severity: medium
    policies.kyverno.io/description: >-
      Sets pod-deletion-cost label on nginx pods scheduled to hybrid compute nodes.
spec:
  rules:
    - name: set-deletion-cost-for-nginx-on-hybrid
      match:
        any:
          - resources:
              kinds:
                # Listen specifically for Pod/binding, which is when a Pod is scheduled to a Node
                - Pod/binding
      # Create some variables
      context:
        # node variable is populated by extracting the info from the binding request object
        - name: node
          variable:
            jmesPath: request.object.target.name
            default: ""
        # computeType variable is populated by asking the API for details about the Node the Pod was scheduled on
        - name: computeType
          apiCall:
            urlPath: "/api/v1/nodes/{{node}}"
            jmesPath: 'metadata.labels."eks.amazonaws.com/compute-type" || ''empty'''
      # preconditions allow us to use the variables we've defined and check computeType for 'hybrid'
      preconditions:
        all:
          - key: "{{ computeType }}"
            operator: Equals
            value: hybrid
      # finally we're going to modify the Pod itself and add the pod-deletion-cost annotation
      mutate:
        targets:
          - apiVersion: v1
            kind: Pod
            name: "{{ request.object.metadata.name }}"
            namespace: "{{ request.object.metadata.namespace }}"
        patchStrategicMerge:
          metadata:
            annotations:
              controller.kubernetes.io/pod-deletion-cost: "1"
    # This rule labels all the pods we've evaluated for example purposes
    - name: do-anything
      match:
        any:
          - resources:
              kinds:
                - Pod/binding
      mutate:
        targets:
          - apiVersion: v1
            kind: Pod
            name: "{{ request.object.metadata.name }}"
            namespace: "{{ request.object.metadata.namespace }}"
        patchStrategicMerge:
          metadata:
            labels:
              touched-by-kyverno: "true"

Let's apply that now.

~$kubectl apply -f ~/environment/eks-workshop/modules/networking/eks-hybrid-nodes/kyverno/policy.yaml
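
As an optional verification, you can check that the policy has been registered and reports as ready (the exact columns shown vary between Kyverno versions):

~$kubectl get clusterpolicy set-pod-deletion-cost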

Before we can test our workload, we need to wait for Kyverno to be up and running so it can enforce the policy we just set up.

~$kubectl wait --for=condition=Ready pods --all -n kyverno --timeout=2m

Now we'll deploy our sample workload. This will use the nodeAffinity rules discussed earlier to land 3 nginx pods on our hybrid node.

~$kubectl apply -f ~/environment/eks-workshop/modules/networking/eks-hybrid-nodes/deployment.yaml
~/environment/eks-workshop/modules/networking/eks-hybrid-nodes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: eks.amazonaws.com/compute-type
                    operator: In
                    values:
                      - hybrid
      containers:
        - name: nginx
          image: public.ecr.aws/nginx/nginx:1.26
          resources:
            requests:
              cpu: 200m
            limits:
              cpu: 200m
          ports:
            - containerPort: 80
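
If you'd like to block until the rollout completes before inspecting anything, kubectl can wait on the Deployment for you:

~$kubectl rollout status deployment/nginx-deployment --timeout=2m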

After that deployment rolls out, we see three nginx-deployment pods, all scheduled onto our hybrid node. We're using custom output columns with kubectl so we can see the pod name, node, and annotations in one view, and we can see that Kyverno has applied our pod-deletion-cost annotation!

~$kubectl get pods -o=custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,ANNOTATIONS:.metadata.annotations'
NAME                                NODE                   ANNOTATIONS
nginx-deployment-7474978d4f-9wbgw   mi-0ebe45e33a53e04f2   map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-fjswp   mi-0ebe45e33a53e04f2   map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-k2sjd   mi-0ebe45e33a53e04f2   map[controller.kubernetes.io/pod-deletion-cost:1]
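
If the full annotation map is too noisy, you can pull out just the deletion cost by escaping the dots in the annotation key in a custom column (the DELETION-COST column name here is arbitrary):

~$kubectl get pods -o=custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,DELETION-COST:.metadata.annotations.controller\.kubernetes\.io/pod-deletion-cost'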

Let's scale up and burst into the cloud! For demonstration purposes, the nginx deployment requests an unreasonably large amount of CPU (200m per replica), which means only about 8 replicas fit on our hybrid node. When we scale up to 15 replicas, there is no room for the rest. Because we are using the preferredDuringSchedulingIgnoredDuringExecution affinity, the scheduler fills our hybrid node first, and anything that doesn't fit there is free to schedule elsewhere (our cloud instances).
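
You can check the capacity math yourself by looking at the allocatable CPU on the hybrid node (substitute your own node name; mi-0ebe45e33a53e04f2 comes from the example output above):

~$kubectl get node mi-0ebe45e33a53e04f2 -o jsonpath='{.status.allocatable.cpu}'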

Usually scaling would be automatic, driven by CPU, memory, or GPU availability, or by external factors like queue depth. Here, we're just going to force the scale-up.

~$kubectl scale deployment nginx-deployment --replicas 15

Now when we run kubectl get pods with our custom columns, we see that the extra replicas have been deployed onto the EC2 instances attached to our workshop EKS cluster. Kyverno has applied the pod-deletion-cost annotation to every pod that landed on our hybrid node, and left it off all of the pods that landed on EC2. When we scale back down, Kubernetes deletes the cheapest pods first: pods without the annotation default to a cost of 0, so they go before our annotated pods. Pods with equal cost are then subject to the normal deletion logic. Let's see that in action now.

~$kubectl get pods -o=custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,ANNOTATIONS:.metadata.annotations'
NAME                                NODE                                          ANNOTATIONS
nginx-deployment-7474978d4f-8269p   ip-10-42-108-174.us-west-2.compute.internal   <none>
nginx-deployment-7474978d4f-8f6cg   ip-10-42-163-36.us-west-2.compute.internal    <none>
nginx-deployment-7474978d4f-9wbgw   mi-0ebe45e33a53e04f2                          map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-bjbvx   ip-10-42-154-155.us-west-2.compute.internal   <none>
nginx-deployment-7474978d4f-f55rj   ip-10-42-108-174.us-west-2.compute.internal   <none>
nginx-deployment-7474978d4f-fjswp   mi-0ebe45e33a53e04f2                          map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-jrcsl   mi-0ebe45e33a53e04f2                          map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-k2sjd   mi-0ebe45e33a53e04f2                          map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-mstwv   ip-10-42-154-155.us-west-2.compute.internal   <none>
nginx-deployment-7474978d4f-q8nkj   mi-0ebe45e33a53e04f2                          map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-smc9f   ip-10-42-163-36.us-west-2.compute.internal    <none>
nginx-deployment-7474978d4f-ss76l   mi-0ebe45e33a53e04f2                          map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-tbzf2   mi-0ebe45e33a53e04f2                          map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-txxlw   mi-0ebe45e33a53e04f2                          map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-wqbsd   ip-10-42-154-155.us-west-2.compute.internal   <none>
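
To see the spread at a glance, a small pipeline that counts pods per node works well (a convenience sketch using kubectl's JSONPath output plus sort and uniq):

~$kubectl get pods -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -c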

Let's scale our sample deployment back down to 3 replicas. We'll be left with three pods running on our hybrid node, which brings us back to our original state.

~$kubectl scale deployment nginx-deployment --replicas 3

Finally, let's confirm that we're back down to 3 replicas, all running on our hybrid node.

~$kubectl get pods -o=custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,ANNOTATIONS:.metadata.annotations'
NAME                                NODE                   ANNOTATIONS
nginx-deployment-7474978d4f-9wbgw   mi-0ebe45e33a53e04f2   map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-fjswp   mi-0ebe45e33a53e04f2   map[controller.kubernetes.io/pod-deletion-cost:1]
nginx-deployment-7474978d4f-k2sjd   mi-0ebe45e33a53e04f2   map[controller.kubernetes.io/pod-deletion-cost:1]