
Writing BPMN Let’s Encrypt Kubernetes Operators in Python I

As I already mentioned, I wasn’t too happy with the currently available DNS operators out there that integrate with Let’s Encrypt. What I wanted was one that works right out of the box, allows me to add new Ingress objects, and magically generates their TLS certificates. In this small mini-series of articles, I’m going to break down how I ended up implementing it.

These are the articles:

  1. What the operator architecture looks like (this article),

  2. How the registration of a new certificate works (the Job mentioned later in this article),

  3. Lessons learned.

THIS IS DEPRECATED. While some things are still true (we still listen for Ingress events and on a timer), the core implementation is no longer done manually. See Lessons learned for the updated architecture using event deduplication via Adhesive’s @deduplicate.

The operator itself is a variation of the deduplication sample I covered before. Namely, it listens for events on all the Ingress objects, and whenever it detects that a change happened on an Ingress, it starts the reconciliation:

Deduplication

Listening for Ingress is just calling the regular Kubernetes watch API:

@adhesive.message('Listen for Ingress Objects')
def message_start_event(context: adhesive.Token[Data]):
    w = watch.Watch()
    beta = client.ExtensionsV1beta1Api()

    while True:
        try:
            for event in w.stream(beta.list_ingress_for_all_namespaces):
                obj = event["object"]

                if not obj.metadata.name or not obj.metadata.namespace:
                    LOG.warning(f"Don't know how to process: {event}")
                    continue

                yield addict.Dict({
                    "event": event,
                    "id": event["object"].metadata.name,
                    "namespace": event["object"].metadata.namespace,
                    "state": "new"
                })
        except Exception as e:
            # ignore exceptions on purpose
            LOG.info(f"Failure in listen for ingress objects: {e}")
            time.sleep(1)

There is a catch, though. Certificates can expire simply because time passes. That means the Ingress state alone is not enough: the Ingress might have a certificate, but that certificate might be expired, so we need to run the reconciliation process again.

Every hour we collect a list of all the Ingresses in the system, then immediately filter the tokens using a complex gateway:

Validate Certificate

That’s why only the execution tokens that correspond to invalid certificates pass into the deduplication:

@adhesive.message('Scan current certificates every hour.')
def message_scan_current_certificates_every_hour_(context):
    """
    We just compile a list of all the ingresses.
    """
    kubeapi = KubeApi(context.workspace)

    while True:
        try:
            time.sleep(3600)
            ingresses = kubeapi.getall(
                kind="ingress",
                namespace=KubeApi.ALL,
                filter="")

            for ingress in ingresses:
                yield addict.Dict({
                    "event": ingress,
                    "id": ingress.metadata.name,
                    "namespace": ingress.metadata.namespace,
                    "state": "new"
                })
        except Exception as e:
            LOG.error(f"Failure in scan certs every hour: {e}")
            time.sleep(1)

@adhesive.gateway('Is certificate {event.id} in valid range?')
def is_certificate_event_id_in_valid_range_(context: Token[Data]):
    namespace = context.data.event.namespace
    name = context.data.event.id

    kubeapi = KubeApi(context.workspace)

    # ... a bunch of checks

    if cert.not_valid_before + delta < now:
        LOG.info(f"Certificate for {namespace}/{name} not valid. Delta expired. "
                 f"Now is {now}, certificate is between {cert.not_valid_before} and "
                 f"{cert.not_valid_after}.")
        context.data.valid_certificate = False
        return

    LOG.info(f"Certificate for {namespace}/{name} is valid. Now is {now}, "
             f"certificate is between {cert.not_valid_before} and "
             f"{cert.not_valid_after}.")
    context.data.valid_certificate = True
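The renewal rule from the gateway above can be expressed as a small pure function. This is only a sketch: the helper name, the default delta, and the datetime parameters are my illustrative assumptions (the gateway's other checks are elided in the original), but the `not_valid_before + delta < now` rule mirrors the code shown.

```python
from datetime import datetime, timedelta

# Hypothetical helper mirroring the gateway's renewal rule: a certificate
# is considered due for renewal once `delta` has passed since its
# not_valid_before timestamp, or when it is already past not_valid_after.
def is_certificate_valid(not_valid_before: datetime,
                         not_valid_after: datetime,
                         now: datetime,
                         delta: timedelta = timedelta(days=60)) -> bool:
    if now > not_valid_after:
        return False  # already expired
    if not_valid_before + delta < now:
        return False  # renewal window reached; regenerate early
    return True
```

Returning a plain boolean keeps the decision testable outside the BPMN process; the gateway then only has to store the result on `context.data.valid_certificate`.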

Whew, that was quite a lot, but now we get to the fun part: the processing. The processing itself is easier to follow:

Certificate Processing

If we need to delete it, there’s nothing more to do. The created Secret is bound to the Ingress object, so deleting the Ingress automatically triggers the removal of the Secret.
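That binding works through Kubernetes owner references: the Secret's metadata lists the Ingress as its owner, so the garbage collector cascades the delete. A minimal sketch of such metadata, built as a plain dict (the field names follow the Kubernetes API; the helper and the concrete values are my illustrative assumptions, not the operator's actual code):

```python
# Sketch: metadata for a TLS Secret owned by an Ingress, so that
# deleting the Ingress garbage-collects the Secret automatically.
def owned_secret_metadata(ingress_name: str,
                          ingress_namespace: str,
                          ingress_uid: str) -> dict:
    return {
        "name": f"{ingress_name}-tls",
        "namespace": ingress_namespace,
        "ownerReferences": [{
            # apiVersion/kind must match the owning Ingress object
            "apiVersion": "extensions/v1beta1",
            "kind": "Ingress",
            "name": ingress_name,
            "uid": ingress_uid,
            "blockOwnerDeletion": True,
        }],
    }
```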

We then recheck whether we have a valid certificate. The reason is that, on startup, the operator initially receives all the Ingress objects in the system as new objects. Of course, we don’t want to generate new certificates just because we restarted the operator. Note that we’re reusing the same complex gateway definition.

Finally, if we don’t have a valid certificate, we trigger the actual certificate generation, which is a simple Job. (I’ll explain what the Job does in the next blog post.)

Now everything is fantastic:

  1. The operator registers each Ingress that appears with Let’s Encrypt and manages its TLS certificate.

  2. Deleting the Ingress deletes its certificate automatically.

To deploy the whole thing, I first containerized the program (built the image from the Dockerfile, as already mentioned in a previous article).

Then I created a simple deployment:

kubectl apply -f https://raw.githubusercontent.com/bmustiata/letsencrypt-operator/master/microk8s/install.yml

The actual YAML code is:

apiVersion: v1
kind: Namespace
metadata:
  name: letsencrypt-operator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: letsencrypt-operator
  namespace: letsencrypt-operator
  labels:
    app: letsencrypt-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: letsencrypt-operator
  template:
    metadata:
      labels:
        app: letsencrypt-operator
    spec:
      containers:
      - name: letsencrypt-operator
        image: germaniumhq/certbot