How we manage clusters by extending the Kubernetes API


At BESTSELLER we run multiple Kubernetes clusters in multiple clouds, which gives us some assurance that even if one provider or one region is degraded we are still able to serve our customers. But multiple clusters, multiple clouds and multiple teams can be difficult to keep track of as an engineer. That is why we decided to use the extensibility of the Kubernetes API to create a cluster registry. In this post, we will cover how to combine Custom Resource Definitions (CRDs) with admission controllers to gain control of your custom Kubernetes resources.

What are CRDs and admission controllers?

To understand CRDs, we need to understand the basic concept of resources in Kubernetes.

  • A resource is an API endpoint that stores a collection of API objects of a certain kind.
  • A custom resource allows you to define your own API objects, and thereby create your own Kubernetes kind, just like Deployments or StatefulSets.

In short, the Custom Resource Definition is where you define your Custom Resource that extends Kubernetes' default capabilities.

While the CRDs extend the Kubernetes functionality, Admission controllers govern and enforce how the cluster is used. They can be thought of as a gatekeeper that intercepts (authenticated) API requests and may change the request object or deny the request altogether.

As described in the Kubernetes Blog

Admission Controller Phases

There are two types of admission controllers: validating and mutating. Mutating admission webhooks are invoked first and can modify objects sent to the API server to enforce custom defaults. After all object modifications are complete, validating admission webhooks are invoked, which run logic to validate the incoming resource. If a validating webhook rejects the request, the Kubernetes API returns a failed HTTP response to the user.
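To make the ordering concrete, here is a minimal Go sketch of the two phases. It is a toy model, not how the API server is implemented: the `Cluster` type, the `admit` function and the default contact "platform-team" are all made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Cluster is a hypothetical, trimmed-down stand-in for an API object.
type Cluster struct {
	ContactPerson string
	NodeCount     int
}

type mutator func(*Cluster)
type validator func(Cluster) error

// admit mimics the two admission phases: every mutator runs first,
// then every validator sees the final, mutated object.
func admit(c Cluster, ms []mutator, vs []validator) (Cluster, error) {
	for _, m := range ms {
		m(&c)
	}
	for _, v := range vs {
		if err := v(c); err != nil {
			return Cluster{}, err // rejected: nothing is stored
		}
	}
	return c, nil
}

// defaultContact is a mutator that fills in a missing contact.
func defaultContact(c *Cluster) {
	if c.ContactPerson == "" {
		c.ContactPerson = "platform-team" // made-up default
	}
}

// requireNodes is a validator that rejects clusters without nodes.
func requireNodes(c Cluster) error {
	if c.NodeCount < 1 {
		return errors.New("NodeCount must be at least 1")
	}
	return nil
}

func main() {
	ms := []mutator{defaultContact}
	vs := []validator{requireNodes}

	c, err := admit(Cluster{NodeCount: 3}, ms, vs)
	fmt.Println(c, err) // accepted, with the default contact filled in

	_, err = admit(Cluster{}, ms, vs)
	fmt.Println(err) // rejected by the validator
}
```

Because validation runs last, a validator always judges the object as it will actually be stored, including any defaults a mutator added.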

Creating our cluster specification

Let's start with the easiest part, creating our custom resource definition, in this case, our cluster specification template.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # must be <plural>.<group>
  name:
spec:
  group:
  scope: Namespaced
  names:
    kind: Cluster
    plural: clusters
    singular: cluster
  # list of versions supported by our CustomResourceDefinition
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required: ["spec"]
          properties:
            LastRun:
              type: string
            status:
              enum: ["", "Active", "Deploying", "Rerun", "Upgrading", "Delete", "Deleting", "Deleted"]
              type: string
            # our custom fields in the resources
            spec:
              type: object
              required: ["NodeCount", "Cloud"]
              properties:
                NodeCount:
                  type: integer
                ContactPerson:
                  type: string
                Cloud:
                  type: string
      # a list of additional fields to print when doing e.g. a GET operation
      additionalPrinterColumns:
        - jsonPath: .spec.ContactPerson
          description: Contact Person
          name: ContactPerson
          type: string

For in-depth details on CRDs, see the Kubernetes documentation.

In the simplified example above, we have defined a new API group, and our CRD is stored in that group.

We have defined three fields in our cluster spec: NodeCount, Cloud and ContactPerson, of which the first two are required.
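As a rough illustration of what "required" means for an incoming object, here is a small Go sketch that checks a spec for the two required fields. The API server does this for us based on the schema; the `Spec` type and `missingRequired` function below are hypothetical and only mirror the field names from the CRD.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Spec mirrors the spec block of the CRD schema. Pointers let us tell
// "field missing" apart from "field set to its zero value".
type Spec struct {
	NodeCount     *int64  `json:"NodeCount"`
	Cloud         *string `json:"Cloud"`
	ContactPerson *string `json:"ContactPerson"`
}

// missingRequired returns the names of required fields absent from raw.
func missingRequired(raw []byte) ([]string, error) {
	var s Spec
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	var missing []string
	if s.NodeCount == nil {
		missing = append(missing, "NodeCount")
	}
	if s.Cloud == nil {
		missing = append(missing, "Cloud")
	}
	return missing, nil
}

func main() {
	// ContactPerson is optional, so this spec is complete.
	fmt.Println(missingRequired([]byte(`{"Cloud":"GCP","NodeCount":3}`)))
	// Both required fields are missing here.
	fmt.Println(missingRequired([]byte(`{"ContactPerson":"Peter Brøndum"}`)))
}
```

The fact that ContactPerson is optional is what leaves room for the mutating webhook later in this post to fill it in.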

Installing the actual CRD is as easy as:

kubectl apply -f ourcrd.yaml

Time to create our first cluster object! More YAML coming up.

apiVersion: ""
kind: Cluster
metadata:
  name: destinationaarhus-techblog
spec:
  ContactPerson: Peter Brøndum
  Cloud: GCP
  NodeCount: 3

Apply it to our cluster:

kubectl apply -f firstcluster.yaml

Now we can get our clusters with kubectl just as any other Kubernetes kind:

> kubectl get clusters
NAME                         CONTACTPERSON
destinationaarhus-techblog   Peter Brøndum

With our cluster spec and storage in place, it is time for the fun part.

The Admission Controller

The admission controller, in this case a mutating webhook, consists of two elements.

  1. A MutatingWebhookConfiguration, which defines which resources are subject to mutation and which mutating service to call.
  2. An admission webhook server, which does the mutation.

First up is the MutatingWebhookConfiguration. We can divide this into two blocks. The first is clientConfig, where we configure which admission webhook service to call (this can be an external service as well). Next are the rules, where we specify that mutation only happens on CREATE and UPDATE requests to the Kubernetes API, and only on our Cluster resources.
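The effect of the rules can be sketched in a few lines of Go. This is a simplified model of the matching, not the API server's implementation: wildcards and scope are left out, the `rule` type is hypothetical, and "example.com" is a placeholder API group.

```go
package main

import "fmt"

// rule is a simplified stand-in for one entry under `rules`.
type rule struct {
	apiGroups, apiVersions, operations, resources []string
}

func contains(list []string, want string) bool {
	for _, item := range list {
		if item == want {
			return true
		}
	}
	return false
}

// matches reports whether a request with these attributes would be
// sent to the webhook. Every dimension has to match.
func (r rule) matches(group, version, operation, resource string) bool {
	return contains(r.apiGroups, group) &&
		contains(r.apiVersions, version) &&
		contains(r.operations, operation) &&
		contains(r.resources, resource)
}

func main() {
	// "example.com" is a placeholder group for this sketch.
	r := rule{
		apiGroups:   []string{"example.com"},
		apiVersions: []string{"v1alpha1"},
		operations:  []string{"CREATE", "UPDATE"},
		resources:   []string{"clusters"},
	}
	fmt.Println(r.matches("example.com", "v1alpha1", "CREATE", "clusters")) // true
	fmt.Println(r.matches("example.com", "v1alpha1", "DELETE", "clusters")) // false
	fmt.Println(r.matches("apps", "v1", "CREATE", "deployments"))           // false
}
```

Requests that do not match any rule never reach the webhook, so deleting a cluster or creating a Deployment is unaffected by it.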

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: cluster-mutate
webhooks:
- admissionReviewVersions:
  - v1beta1
  clientConfig:
    # As webhooks can only be called over HTTPS this should be the actual caBundle
    caBundle: "Ci0tLS0tQk...<`caBundle` is a PEM encoded CA bundle which will be used to validate the webhook's server certificate.>...tLS0K"
    # the internal k8s service to call
    service:
      name: cluster-mutate
      namespace: default
      path: "/mutate"
  failurePolicy: Fail
  name: cluster-mutate.default.svc
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1alpha1
    operations:
    - "CREATE"
    - "UPDATE"
    resources:
    - "clusters"
  sideEffects: None
  timeoutSeconds: 30

Kubernetes will only call a TLS-encrypted endpoint. I will not cover the certificate setup in this article, but we are in luck: other people have made simple scripts that can help us, e.g. the one from Giant Swarm.

The webhook

With this in place, we need to create the actual mutation logic.

Before we dive into the code, a note on the language: I have chosen to write this in Go, as it has a native client for Kubernetes. That being said, you could do this in the language of your choosing. The only requirement is to create a web server that serves a TLS endpoint and accepts and responds with JSON.
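To show how little is strictly required, here is a minimal sketch that uses only the Go standard library and no Kubernetes client. It accepts every request and echoes the request UID back. The `review`, `request` and `reply` types are hand-rolled and hold only the AdmissionReview fields this sketch needs, and the handler is called in-process with `httptest`, so TLS is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// review holds only the AdmissionReview fields this sketch needs.
type review struct {
	APIVersion string   `json:"apiVersion"`
	Kind       string   `json:"kind"`
	Request    *request `json:"request,omitempty"`
	Response   *reply   `json:"response,omitempty"`
}

type request struct {
	UID string `json:"uid"`
}

type reply struct {
	UID     string `json:"uid"`
	Allowed bool   `json:"allowed"`
}

// allowAll accepts every request and echoes the request UID back.
func allowAll(w http.ResponseWriter, r *http.Request) {
	var in review
	if err := json.NewDecoder(r.Body).Decode(&in); err != nil || in.Request == nil {
		http.Error(w, "expected an AdmissionReview with a request", http.StatusBadRequest)
		return
	}
	out := review{
		APIVersion: in.APIVersion,
		Kind:       in.Kind,
		Response:   &reply{UID: in.Request.UID, Allowed: true},
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(out)
}

// call posts body to the handler in-process, without TLS or a cluster.
func call(body string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/mutate", strings.NewReader(body))
	allowAll(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(call(`{"apiVersion":"admission.k8s.io/v1beta1","kind":"AdmissionReview","request":{"uid":"1234"}}`))
}
```

Calling the handler this way is also a handy way to test a webhook before deploying it. The webhook in this post follows the same shape, with the mutation logic added.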

This will be a simplified example, and I have tried to squeeze everything into one file. In essence what we are aiming at is to:

  1. Receive a JSON request, in Kubernetes terms an AdmissionReview.
  2. Do our mutation logic.
  3. Return a JSON response, again in the format of an AdmissionReview, which tells Kubernetes what to mutate.

Yes, you are correct: it is actually Kubernetes that does the mutation.

Simplified flow of our webhook.

To the code!

package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"time"

	// import paths inferred from the identifiers used below
	"github.com/gorilla/handlers"
	"github.com/gorilla/mux"
	"k8s.io/api/admission/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ClusterSpec the crd spec
type ClusterSpec struct {
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	APIVersion string    `json:"apiVersion"`
	Kind       string    `json:"kind"`
	Status     string    `json:"status"`
	LastRun    time.Time `json:"LastRun,omitempty"`
	Spec       struct {
		ContactPerson string `json:"ContactPerson"`
		NodeCount     int64  `json:"NodeCount"`
		Cloud         string `json:"Cloud"`
	} `json:"spec"`
}

func mutate(w http.ResponseWriter, r *http.Request) {
	defer r.Body.Close()

	// extract body
	body, err := io.ReadAll(r.Body)
	if err != nil {
		log.Printf("could not read body: %v", err)
		http.Error(w, fmt.Sprintf("could not read body: %v", err), http.StatusInternalServerError)
		return
	}

	// unmarshal body into AdmissionReview struct
	arRequest := v1beta1.AdmissionReview{}
	if err := json.Unmarshal(body, &arRequest); err != nil {
		log.Printf("incorrect body: %v", err)
		http.Error(w, fmt.Sprintf("incorrect body: %v", err), http.StatusBadRequest)
		return
	}
	// an empty or malformed review has no request to work on
	if arRequest.Request == nil {
		log.Println("admission review without request")
		http.Error(w, "admission review without request", http.StatusBadRequest)
		return
	}

	// unmarshal cluster
	cluster := ClusterSpec{}
	if err := json.Unmarshal(arRequest.Request.Object.Raw, &cluster); err != nil {
		log.Printf("error deserializing cluster: %v", err)
		http.Error(w, fmt.Sprintf("error deserializing cluster: %v", err), http.StatusBadRequest)
		return
	}

	// Let's mutate! If no contact is defined, I will be the contact,
	// which in real life I would quickly regret.
	if cluster.Spec.ContactPerson == "" {
		cluster.Spec.ContactPerson = "Peter Brøndum"
		log.Println("No contact, Mutate me!")
	}

	// response options; the UID must be the UID of the admission request,
	// not the UID of the cluster object (which is empty on CREATE)
	pT := v1beta1.PatchTypeJSONPatch
	arResponse := v1beta1.AdmissionReview{
		Response: &v1beta1.AdmissionResponse{
			Allowed:   true,
			UID:       arRequest.Request.UID,
			PatchType: &pT,
			Result: &metav1.Status{
				Message: "success",
			},
		},
	}

	// This is in truth the actual mutation. As you can see, it is Kubernetes
	// that does the mutation, we just tell it what it should do for us!
	// This is why we use JSONPatch. The "add" operation creates the field
	// when it is missing and overwrites it when it already exists.
	p := []map[string]string{{
		"op":    "add",
		"path":  "/spec/ContactPerson",
		"value": cluster.Spec.ContactPerson,
	}}

	arResponse.Response.Patch, err = json.Marshal(p)
	if err != nil {
		log.Printf("can't encode patch: %v", err)
		http.Error(w, fmt.Sprintf("can't encode patch: %v", err), http.StatusInternalServerError)
		return
	}

	responseBody, err := json.Marshal(arResponse)
	if err != nil {
		log.Printf("can't encode response: %v", err)
		http.Error(w, fmt.Sprintf("can't encode response: %v", err), http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	w.Write(responseBody)
}

func main() {
	fmt.Println("Cluster Contact Mutater has started")

	// define http endpoints and start
	router := mux.NewRouter()
	router.HandleFunc("/mutate", mutate)

	loggedRouter := handlers.LoggingHandler(os.Stdout, router)
	log.Fatal(http.ListenAndServeTLS(":443", "./certs/crt.pem", "./certs/key.pem", loggedRouter))
}

In short, the example creates a single HTTP endpoint. When called, it will unmarshal the body into our cluster specification (along with default Kubernetes stuff) and check if a Contact Person is present. If not, I, Peter Brøndum, will be set as a contact. Then it will marshal it back to JSON and send the response to Kubernetes. This response is used by Kubernetes to do the actual mutation.
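To see what actually travels back to Kubernetes, here is a small standard-library sketch that builds the patch and the surrounding response. The `patchOp` and `admissionResponse` types are hand-rolled stand-ins for the client library types, reduced to the fields used here, and the UID "1234" is made up. The sketch uses the JSON Patch `add` operation, which per RFC 6902 creates the member or overwrites it if it already exists.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one JSON Patch (RFC 6902) operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value"`
}

// admissionResponse is a hand-rolled stand-in for the response part of
// an AdmissionReview, reduced to the fields used in this sketch.
type admissionResponse struct {
	UID       string `json:"uid"`
	Allowed   bool   `json:"allowed"`
	PatchType string `json:"patchType"`
	Patch     []byte `json:"patch"` // encoding/json emits []byte as base64
}

// contactPatch returns the JSON Patch that sets the contact person.
func contactPatch(contact string) ([]byte, error) {
	return json.Marshal([]patchOp{
		{Op: "add", Path: "/spec/ContactPerson", Value: contact},
	})
}

// buildResponse wraps the patch in the response sent back to Kubernetes.
func buildResponse(uid, contact string) ([]byte, error) {
	patch, err := contactPatch(contact)
	if err != nil {
		return nil, err
	}
	return json.Marshal(map[string]admissionResponse{
		"response": {UID: uid, Allowed: true, PatchType: "JSONPatch", Patch: patch},
	})
}

func main() {
	patch, err := contactPatch("Peter Brøndum")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch)) // the instructions Kubernetes will apply

	body, err := buildResponse("1234", "Peter Brøndum")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // the patch shows up base64-encoded
}
```

Note that the patch is a byte slice, so it is base64-encoded inside the JSON response. This is easy to miss when writing a webhook in a language without a Kubernetes client.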

Let's see it in action

I have deployed the Webhook and the MutatingWebhookConfiguration. Let's prepare a new cluster spec. Notice that we do not add a contact to this cluster!

apiVersion: ""
kind: Cluster
metadata:
  name: destinationaarhus-techblog02
spec:
  Cloud: GCP
  NodeCount: 3

When we apply this spec, we don't see any difference, as long as the webhook responds with status 200.

> kubectl apply -f manifests/cluster02. created

But when we list the clusters, I am the contact.

> kubectl get clusters
NAME                           CONTACTPERSON
destinationaarhus-techblog     Peter Brøndum
destinationaarhus-techblog02   Peter Brøndum

It worked! (surprise) But let's check the logs of our webhook.

2020/10/28 12:54:31 No contact, Mutate me!
- - [28/Oct/2020:12:54:31 +0000] "POST /mutate?timeout=30s HTTP/1.1" 200 214

In the above, we see that our webhook was reached when we applied the cluster and that no Contact Person was set. From there, it responded with an AdmissionReview telling Kubernetes to mutate.

Final Words

As you can see from the above examples, it is quite easy to extend Kubernetes' functionality by creating your own custom resources. Even creating custom logic and behaviour for the resources is doable. And this is not limited to custom resources: admission webhooks can also be used to influence other key components in the cluster.

How we use this at BESTSELLER

To be fair, there is quite a lot of our setup at BESTSELLER that I did not cover, but the basics of how we keep track of our clusters are there. Instead of assigning me as a contact on each and every cluster, which would be a pain, we call our CI/CD pipeline and mutate the status, amongst other things, on the cluster resources in Kubernetes. This way, when a cluster is changed, our CI/CD will run a bunch of jobs to set up and configure the specific cluster. When finished, the pipeline updates our custom cluster resource in Kubernetes once again and mutates the status.

About the author

Peter Brøndum

My name is Peter Brøndum. I work as a Tech Lead and Scrum Master in a platform engineering team at BESTSELLER. Our main priority is building a development highway, with paved roads, lights and signs, so our colleagues can deliver value even faster. Besides working at BESTSELLER I am, amongst other things, automating my own home, and yes, that is, of course, running on Kubernetes as well.