
Operator Best Practices

Development

Considerations for operator developers

  • An Operator should manage a single type of application, essentially following the UNIX principle: do one thing and do it well.

  • If an application consists of multiple different tiers or components, a separate Operator should be written for each of them. For example, if the application consists of Redis, AMQ and MySQL, there should be three Operators, not one.

  • If significant orchestration and sequencing is involved, an Operator should be written that represents the entire stack and, in turn, delegates to the other Operators to orchestrate their parts of it.

  • Operators should own a CRD, and only one Operator should control a given CRD on a cluster. Two Operators managing the same CRD is asking for trouble. Where an API has multiple implementations, we typically have a no-op Operator (without any deployment or reconciliation loop) that defines the shared API, and other Operators depend on this Operator to provide one implementation of that API, similar to how PVCs or Ingress work.

  • Inside an Operator, multiple controllers should be used if multiple CRDs are managed. This helps with separation of concerns and code readability. Note that this doesn't necessarily mean one container image per controller, but rather one reconciliation loop (which could be running as part of the same operator binary) per CRD.

  • An operator shouldn't deploy or manage other operators (such patterns are known as meta or super operators). It's the Operator Lifecycle Manager's job to manage the deployment and lifecycle of operators.

  • If multiple operators must be packaged and shipped as a single entity (via the same ClusterServiceVersion, or CSV), add all owned and required CRDs, as well as all deployments for the operators that manage the owned CRDs, to the same CSV.

  • Writing an Operator involves using the Kubernetes API, which always requires the same boilerplate code to connect to and interact with it. Use a framework like Operator SDK to save time on this and to get a suite of tooling that eases development and testing.

  • Operators shouldn’t make any assumptions about the namespace they are deployed in or hard-code names of resources that they expect to already exist.

  • Operators shouldn’t hard-code the namespaces they are watching. This should be configurable; supplying no namespace is interpreted as watching all namespaces.

  • Semantic versioning (aka semver) should be used to version an Operator. Operators are long-running workloads on the cluster, and their APIs may need to be supported over a long period of time. Use the semver.org guidelines to determine when and how to bump versions for breaking and non-breaking changes.

  • Kubernetes API versioning guidelines should be used to version Operator CRDs. Use the Kubernetes sig-architecture guidelines to get best practices on when to bump versions and when breaking changes are acceptable.

  • When defining CRDs, use the OpenAPI spec to create a structural schema for them.

  • Operators should be instrumented to provide useful, actionable metrics to external systems (e.g. monitoring and alerting platforms). At a minimum, metrics should represent the software's health and key performance indicators, and support the creation of service level indicators such as throughput, latency, availability, errors and capacity.
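The semver bump rules can be made concrete with a small Go sketch. It assumes each release is classified by hand as a fix, a feature or a breaking change; the `Change` type and `nextVersion` function are hypothetical. It only handles plain `MAJOR.MINOR.PATCH` strings and leaves out pre-release tags, build metadata and leading-zero checks.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Change classifies a release by its effect on the Operator's API.
type Change int

const (
	Fix      Change = iota // backwards-compatible bug fix
	Feature                // backwards-compatible new functionality
	Breaking               // incompatible API change
)

// nextVersion returns the version that follows current for the given change.
func nextVersion(current string, change Change) (string, error) {
	parts := strings.Split(current, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("not a MAJOR.MINOR.PATCH version: %q", current)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil || v < 0 {
			return "", fmt.Errorf("invalid component %q in %q", p, current)
		}
		n[i] = v
	}
	switch change {
	case Breaking:
		n[0], n[1], n[2] = n[0]+1, 0, 0 // bump major, reset minor and patch
	case Feature:
		n[1], n[2] = n[1]+1, 0 // bump minor, reset patch
	default:
		n[2]++ // bug fix: bump patch only
	}
	return fmt.Sprintf("%d.%d.%d", n[0], n[1], n[2]), nil
}

func main() {
	for _, c := range []Change{Fix, Feature, Breaking} {
		v, _ := nextVersion("1.4.2", c)
		fmt.Println(v)
	}
}
```

Starting from `1.4.2`, a fix yields `1.4.3`, a backwards-compatible feature yields `1.5.0`, and a breaking API change yields `2.0.0`.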

Summary

  • One Operator per managed application

  • Multiple operators should be used for complex, multi-tier application stacks

  • A CRD can only be owned by a single Operator; shared CRDs should be owned by a separate Operator

  • One controller per custom resource definition

  • Use a framework like Operator SDK

  • Do not hard-code namespaces or resource names

  • Make watch namespace configurable

  • Use semver / observe Kubernetes guidelines on versioning APIs

  • Use OpenAPI spec with structural schema on CRDs

  • Operators expose metrics to external systems

Running On-Cluster

Considerations for on-cluster behavior

  • Like all containers on Kubernetes, Operators shouldn’t need to run as root unless absolutely necessary. Operators should come with their own ServiceAccount and not rely on the default.

  • Operators should not self-register their CRDs. These are global resources, and careful consideration is needed when setting them up. Self-registration also requires the Operator to have global privileges, which is a potentially dangerous trade-off for a little extra convenience.

  • Operators use CRs as the primary interface to the cluster user. As such, they should always write meaningful status information to those objects, unless the objects are used solely to store data in a structured schema.

  • Operator updates should follow semver, and Operators should be updated frequently.

  • Operators need to support updating managed applications (Operands) that were set up by an older version of the Operator. There are multiple models for this:

      • Operator fan-out: the Operator allows the user to specify the version in the custom resource.

      • Single version: the Operator is tied to the version of the Operand.

      • Hybrid approach: the Operator is tied to a range of versions, and the user can select some level of the version.

  • An Operator should not deploy another Operator; an additional component on the cluster (OLM) should take care of this.

  • When Operators change their APIs, CRD conversion (via webhooks) should be used to handle existing instances that still use the previous API version.

  • Operators should make it easy for users to use their APIs. Validating and rejecting malformed requests via an extensive OpenAPI validation schema on CRDs, or via an admission webhook, is good practice.

  • The Operator itself should be modest in its requirements: it should always be deployable simply by deploying its controllers, and no user input should be required to start it up.

  • If user input is required to change the configuration of the Operator itself, a Configuration CRD should be used. Init containers in the Operator's deployment can create a default instance of those CRs, after which the Operator manages their lifecycle.
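The three Operand versioning models differ mainly in who chooses the Operand version. This hypothetical Go sketch resolves the version to deploy from two inputs: the Operator's built-in default and the version requested in the custom resource. It models the hybrid approach as "the Operator fixes the major version, the user picks minor and patch". That is an assumption, since the range an Operator supports is a design decision.

```go
package main

import (
	"fmt"
	"strings"
)

// Model is one of the Operand versioning strategies.
type Model int

const (
	SingleVersion Model = iota // Operator is tied to exactly one Operand version
	FanOut                     // user picks any version in the custom resource
	Hybrid                     // Operator fixes the major version; user picks within it
)

// Operator describes a hypothetical Operator release.
type Operator struct {
	Model   Model
	Default string // Operand version shipped with this Operator release
	Major   string // Hybrid only: the Operand major version this release supports
}

// resolve picks the Operand version to deploy; requested comes from the CR ("" if unset).
func (o Operator) resolve(requested string) (string, error) {
	switch {
	case o.Model == SingleVersion || requested == "":
		// Single version: the CR has no say. Otherwise: fall back to the default.
		return o.Default, nil
	case o.Model == FanOut:
		return requested, nil
	case strings.HasPrefix(requested, o.Major+"."):
		return requested, nil // Hybrid: request is inside the supported major line
	default:
		return "", fmt.Errorf("operand version %q is outside the supported %s.x range", requested, o.Major)
	}
}

func main() {
	releases := []Operator{
		{Model: SingleVersion, Default: "2.4.1"},
		{Model: FanOut, Default: "2.4.1"},
		{Model: Hybrid, Default: "2.4.1", Major: "2"},
	}
	for _, op := range releases {
		v, err := op.resolve("2.3.5") // version requested in the custom resource
		fmt.Println(v, err)
	}
}
```

Requesting `2.3.5` deploys `2.4.1` under the single-version model and `2.3.5` under the other two. The hybrid Operator would reject `3.0.0`.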
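CRD conversion boils down to a pair of functions between an older API version and the current (hub) version. This Go sketch uses two hypothetical spec versions. In the newer one, `size` was renamed to `replicas`, and an `image` string was split into repository and tag. With controller-runtime, the same logic would sit in the `ConvertTo` and `ConvertFrom` methods of the `conversion.Convertible` interface and be served by a conversion webhook. Here they are plain functions so the sketch runs on its own; image digests are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// SpecV1alpha1 is a hypothetical older API version.
type SpecV1alpha1 struct {
	Size  int32
	Image string // e.g. "memcached:1.6"
}

// SpecV1 is the hypothetical current (hub) version with renamed and split fields.
type SpecV1 struct {
	Replicas   int32
	Repository string
	Tag        string
}

// toV1 converts an object stored or submitted as v1alpha1 to the hub version.
func toV1(old SpecV1alpha1) SpecV1 {
	repo, tag := old.Image, ""
	// Split on the last ':' only if it comes after the last '/', so a
	// registry port such as "registry:5000/app" is not mistaken for a tag.
	if i := strings.LastIndex(old.Image, ":"); i > strings.LastIndex(old.Image, "/") {
		repo, tag = old.Image[:i], old.Image[i+1:]
	}
	return SpecV1{Replicas: old.Size, Repository: repo, Tag: tag}
}

// fromV1 converts the hub version back, so clients still using v1alpha1 keep working.
func fromV1(cur SpecV1) SpecV1alpha1 {
	image := cur.Repository
	if cur.Tag != "" {
		image += ":" + cur.Tag
	}
	return SpecV1alpha1{Size: cur.Replicas, Image: image}
}

func main() {
	old := SpecV1alpha1{Size: 3, Image: "registry:5000/memcached:1.6"}
	cur := toV1(old)
	fmt.Printf("%+v\n", cur)
	fmt.Println(fromV1(cur) == old)
}
```

For well-formed image references, converting to `v1` and back yields the original object. This lets clients that still use the old version keep reading and writing it.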

Summary

An Operator...

  • Does not run as root

  • Does not self-register CRDs

  • Does not install other Operators; relies on dependencies via a package manager (OLM)

  • Writes meaningful status information on Custom Resource objects unless they are pure data structures

  • Should be capable of updating from a previous version of the Operator

  • Should be capable of managing an Operand from an older Operator version

  • Uses CRD conversion (webhooks) if API/CRDs change

  • Uses OpenAPI validation / Admission Webhooks to reject invalid CRs

  • Should always be able to deploy and come up without user input

  • Offers (pre)configuration via a “Configuration CR” instantiated by InitContainers