Operators are one of the most powerful patterns in Kubernetes, and also one of the most poorly explained. Every description either assumes you already understand controllers and CRDs, or it explains them with an analogy so vague it tells you nothing. Here is the direct explanation.

The Problem Operators Solve

Running stateless applications on Kubernetes is straightforward. Create a Deployment, expose it as a Service, done. Running stateful applications is different. Databases, message queues, and caches require operational knowledge that goes beyond “run this container.”

When you deploy a PostgreSQL cluster, you need to:

  • Bootstrap the primary and replicas in the right order
  • Configure replication between them
  • Handle failover when the primary goes down
  • Run backups on a schedule
  • Scale replicas without downtime
  • Apply configuration changes without restoring from scratch

A human operator knows how to do all of this. An operator in Kubernetes automates exactly this knowledge.

The Core Concept: Desired State vs Actual State

Kubernetes is built around a reconciliation loop. You tell it the desired state. It continuously checks the actual state. When they diverge, it takes action to close the gap.

A Deployment is a good example. You declare: “I want 3 replicas of this container.” Kubernetes checks. There are 2 running. It creates 1 more. Pod crashes. Kubernetes detects 2 again. Creates 1 more. This happens continuously.
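
A minimal sketch of that loop in Python, assuming an in-memory set of pod names stands in for the cluster. The function name reconcile and the web-N pod names are illustrative only; they are not Kubernetes APIs.

```python
from itertools import count


def reconcile(desired: int, pods: set, ids) -> set:
    """One reconciliation pass: return a pod set whose size matches `desired`."""
    pods = set(pods)                     # work on a copy of the observed state
    while len(pods) < desired:
        pods.add(f"web-{next(ids)}")     # too few: create a pod
    while len(pods) > desired:
        pods.pop()                       # too many: delete one
    return pods


ids = count()                            # hypothetical source of unique pod suffixes
pods = reconcile(3, set(), ids)          # nothing running yet: three pods are created
pods.discard("web-1")                    # simulate a pod crash
pods = reconcile(3, pods, ids)           # the next pass sees 2 pods and creates 1 more
print(sorted(pods))                      # ['web-0', 'web-2', 'web-3']
```

The function never asks what happened. It only compares the count it wants with the count it sees, which is why the same pass handles both the first rollout and the crash.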

Operators extend this same pattern to custom resources.

Custom Resource Definitions: Extending the API

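A Custom Resource Definition (CRD) lets you add your own resource types to the Kubernetes API. Instead of only having Pods, Deployments, and Services, you can add a PostgresCluster resource. The CRD defines the type. Once it is registered, you create objects of that type like any built-in resource: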

apiVersion: postgres.example.com/v1
kind: PostgresCluster
metadata:
  name: my-database
spec:
  replicas: 3
  version: "15"
  storage: 100Gi
  backupSchedule: "0 2 * * *"

Kubernetes stores this in etcd like any other resource. But nothing happens yet - you need a controller to act on it.
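
To make "act on it" concrete, here is a rough sketch of the first thing a controller usually does with such an object: turn the spec into a typed value and reject obviously bad input. The class PostgresClusterSpec and the function parse_spec are hypothetical names, and the manifest is written as a plain dict to keep the example free of dependencies. In a real cluster the API server can also validate custom resources against the schema declared in the CRD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostgresClusterSpec:
    """Typed view of the spec fields shown in the manifest above."""
    replicas: int
    version: str
    storage: str
    backup_schedule: str


def parse_spec(obj: dict) -> PostgresClusterSpec:
    """Validate a PostgresCluster object (as a dict) and return its spec."""
    if obj.get("kind") != "PostgresCluster":
        raise ValueError(f"unexpected kind: {obj.get('kind')!r}")
    spec = obj["spec"]
    replicas = spec["replicas"]
    if not isinstance(replicas, int) or replicas < 1:
        raise ValueError("spec.replicas must be a positive integer")
    if len(spec["backupSchedule"].split()) != 5:
        raise ValueError("spec.backupSchedule must be a five-field cron expression")
    return PostgresClusterSpec(
        replicas=replicas,
        version=str(spec["version"]),
        storage=spec["storage"],
        backup_schedule=spec["backupSchedule"],
    )


manifest = {
    "apiVersion": "postgres.example.com/v1",
    "kind": "PostgresCluster",
    "metadata": {"name": "my-database"},
    "spec": {"replicas": 3, "version": "15", "storage": "100Gi",
             "backupSchedule": "0 2 * * *"},
}
spec = parse_spec(manifest)
print(spec.replicas, spec.version, spec.storage)
```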

The Controller: Your Automation Logic

The operator is a controller that watches your custom resource. When it sees a PostgresCluster object, it runs a reconciliation loop that makes the actual state match the desired state.

Pseudocode:

watch PostgresCluster resources

on every change (or every 30 seconds):
  actual = what is actually running in the cluster
  desired = what the spec says should be running

  if actual.primary is not running:
    create primary pod
    wait for it to be ready

  if actual.replicas < desired.replicas:
    create new replica
    configure replication from primary

  if backup scheduled and not yet run today:
    create backup job

  update status with current state
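
The same logic as a runnable Python sketch. It makes several assumptions that are not in the pseudocode: state is passed in as plain dataclasses (Desired and Actual are hypothetical names), the replica count excludes the primary, and the function returns a list of action descriptions instead of calling the Kubernetes API. It also returns early when the primary is missing rather than blocking, on the assumption that a later pass will pick up where this one left off.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Desired:
    replicas: int            # replicas requested in the spec (primary not counted: assumption)
    backup_enabled: bool     # True when the spec carries a backup schedule


@dataclass
class Actual:
    primary_running: bool
    replicas: int            # replicas currently running
    last_backup: Optional[date]


def reconcile(desired: Desired, actual: Actual, today: date) -> List[str]:
    """Return the actions one reconciliation pass would take, in order."""
    if not actual.primary_running:
        # Nothing else can proceed without a primary; stop here and let
        # the next pass continue once it is ready.
        return ["create primary pod"]

    actions = []
    if actual.replicas < desired.replicas:
        actions.append("create replica and configure replication from primary")
    if desired.backup_enabled and actual.last_backup != today:
        actions.append("create backup job")
    return actions


today = date(2026, 4, 15)
print(reconcile(Desired(3, True), Actual(False, 0, None), today))
print(reconcile(Desired(3, True), Actual(True, 2, None), today))
print(reconcile(Desired(3, True), Actual(True, 3, today), today))
```

The third call returns an empty list: when actual state already matches desired state, a pass does nothing.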

The operator encodes operational expertise. The human who wrote it understood PostgreSQL replication. The people who use it just write YAML.

Why This Is Better Than Scripts

You could write bash scripts that do all of this. Many people do. The problems:

  • Scripts run once and exit. Operators run continuously.
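  • Scripts live outside the Kubernetes API. Their inputs are not resources you can kubectl apply, diff, or protect with RBAC.
  • Scripts do not react to events such as a pod crash or a node failure unless you build the watching yourself.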
  • Scripts do not report status in a standard way.

Operators are self-healing by design. The reconciliation loop runs constantly. Drift is corrected automatically.
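
One way to picture "runs constantly" is a loop that wakes up on an event or, failing that, on a timer. The sketch below uses Python's standard queue module as a stand-in for a watch on the Kubernetes API. The names run_controller, resync_seconds, and max_passes are hypothetical, and max_passes exists only so the example terminates.

```python
import queue
from typing import Callable, List


def run_controller(events: queue.Queue, reconcile: Callable[[], None],
                   resync_seconds: float, max_passes: int) -> List[str]:
    """Run `reconcile` on every event, or on a timer when no event arrives."""
    triggers = []
    for _ in range(max_passes):
        try:
            # Block until something changes...
            trigger = events.get(timeout=resync_seconds)
        except queue.Empty:
            # ...or until the resync interval elapses with no events.
            trigger = "resync"
        triggers.append(trigger)
        reconcile()          # same reconciliation pass regardless of the trigger
    return triggers


events = queue.Queue()
events.put("pod-crashed")
# A short interval keeps the demo fast; the pseudocode earlier used 30 seconds.
print(run_controller(events, lambda: None, resync_seconds=0.01, max_passes=3))
# ['pod-crashed', 'resync', 'resync']
```

The reconcile callback takes no arguments on purpose: the trigger only says when to look, and the pass itself re-reads the current state. A missed event is therefore corrected by the next timed pass.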

Real Operators Worth Knowing

Operator              What it manages
CloudNativePG         PostgreSQL clusters
Strimzi               Apache Kafka
Prometheus Operator   Prometheus + Alertmanager
cert-manager          TLS certificates
External Secrets      Secrets from Vault, AWS SSM
Argo CD               GitOps application deployment

These are production-grade operators with years of development behind them. Before building your own, check if one already exists for your use case.

When to Build Your Own

Build an operator when you have a stateful application with complex lifecycle management that nothing off-the-shelf covers. Good candidates:

  • Internal platforms where you want to give teams a simple API
  • Applications with specific upgrade procedures that are easy to get wrong
  • Services where operational knowledge is siloed with one or two people

The Operator SDK (from Red Hat) and Kubebuilder (from the Kubernetes project itself) both provide scaffolding that handles the boilerplate. Most of the code you write is the reconciliation logic - the actual operational knowledge.

The Status Subresource

A well-designed operator communicates state back through the resource status:

status:
  phase: Running
  primary: my-database-0
  replicas: 3
  readyReplicas: 3
  lastBackup: "2026-04-15T02:00:00Z"
  conditions:
    - type: Ready
      status: "True"

This makes the operator’s view of the world inspectable with kubectl get postgrescluster my-database -o yaml (or kubectl describe). Debugging starts from what the operator itself reports instead of from guesswork.
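
As a rough illustration of where those fields come from, the sketch below derives a status dict from observed counts. The function build_status is hypothetical, the "Pending" phase name is an assumption (the example above only shows "Running"), and lastBackup is left out for brevity. A real operator would write the result back through the Kubernetes API rather than print it.

```python
from typing import Optional


def build_status(desired_replicas: int, replicas: int, ready_replicas: int,
                 primary: Optional[str]) -> dict:
    """Summarize observed state in the shape of the status block above."""
    # Ready only when a primary exists and enough replicas report ready.
    ready = primary is not None and ready_replicas >= desired_replicas
    return {
        "phase": "Running" if ready else "Pending",   # "Pending" is an assumed phase name
        "primary": primary,
        "replicas": replicas,
        "readyReplicas": ready_replicas,
        "conditions": [{"type": "Ready", "status": "True" if ready else "False"}],
    }


status = build_status(desired_replicas=3, replicas=3, ready_replicas=2,
                      primary="my-database-0")
print(status["phase"], status["conditions"])
```

With one replica still not ready, the Ready condition is "False", which tells the person reading the resource why the cluster is not yet considered healthy.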

Bottom Line

An operator is a controller that automates operational knowledge for a custom resource type. It extends Kubernetes’ reconciliation loop to handle stateful, complex applications. Use existing operators for common infrastructure components. Build your own when you have unique operational logic that would otherwise live in documentation and in people’s heads. The pattern is not complex - the implementation effort scales with the complexity of what you are automating.