Operators are one of the most powerful patterns in Kubernetes, and also one of the most poorly explained. Most descriptions either assume you already understand controllers and CRDs, or lean on an analogy so vague it tells you nothing. Here is the direct explanation.
The Problem Operators Solve
Running stateless applications on Kubernetes is straightforward. Deploy a pod, expose it as a service, done. Running stateful applications is different. Databases, message queues, and caches require operational knowledge that goes beyond “run this container.”
When you deploy a PostgreSQL cluster, you need to:
- Bootstrap the primary and replicas in the right order
- Configure replication between them
- Handle failover when the primary goes down
- Run backups on a schedule
- Scale replicas without downtime
- Apply configuration changes without restoring from scratch
A human operator knows how to do all of this. An operator in Kubernetes automates exactly this knowledge.
The Core Concept: Desired State vs Actual State
Kubernetes is built around a reconciliation loop. You tell it the desired state. It continuously checks the actual state. When they diverge, it takes action to close the gap.
A Deployment is a good example. You declare: “I want 3 replicas of this container.” Kubernetes checks. There are 2 running. It creates 1 more. Pod crashes. Kubernetes detects 2 again. Creates 1 more. This happens continuously.
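To make the loop concrete, here is a minimal sketch of the comparison step in plain Python. It is not how Kubernetes implements Deployments; the function name `reconcile_replicas` and the pod names are made up for illustration, and the "cluster" is just a list.

```python
from __future__ import annotations


def reconcile_replicas(desired: int, running: list[str]) -> list[str]:
    """Return the actions needed to move `running` toward `desired` pods."""
    actions: list[str] = []
    if len(running) < desired:
        missing = desired - len(running)
        actions += ["create pod"] * missing
    elif len(running) > desired:
        surplus = len(running) - desired
        # Which pods to remove is an arbitrary choice in this sketch.
        actions += [f"delete {name}" for name in running[-surplus:]]
    return actions


# Two pods are running and three are wanted: the gap is one pod.
print(reconcile_replicas(3, ["web-a", "web-b"]))
# When actual matches desired, there is nothing to do.
print(reconcile_replicas(3, ["web-a", "web-b", "web-c"]))
```

The function never asks what happened, only what is different right now. That is why the same code handles a first rollout, a crash, and a manual deletion.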
Operators extend this same pattern to custom resources.
Custom Resource Definitions: Extending the API
A CRD (Custom Resource Definition) lets you add your own resource types to the Kubernetes API. Instead of only having Pods, Deployments, and Services, you can register a PostgresCluster type and then create objects of that kind:
apiVersion: postgres.example.com/v1
kind: PostgresCluster
metadata:
  name: my-database
spec:
  replicas: 3
  version: "15"
  storage: 100Gi
  backupSchedule: "0 2 * * *"
Kubernetes stores this in etcd like any other resource. But nothing happens yet - you need a controller to act on it.
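A CRD normally also carries a schema, so the API server can reject malformed objects before they are stored. The sketch below imitates that validation step in Python for the spec shown above. The field names come from the manifest; the specific rules (at least one replica, a storage size in Mi/Gi/Ti, a five-field cron string) are assumptions made for this example, not rules from any real operator.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PostgresClusterSpec:
    replicas: int
    version: str
    storage: str
    backup_schedule: str


def parse_spec(raw: dict) -> PostgresClusterSpec:
    """Validate the `spec` mapping of a PostgresCluster manifest (hypothetical rules)."""
    replicas = raw.get("replicas")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(replicas, int) or isinstance(replicas, bool) or replicas < 1:
        raise ValueError("spec.replicas must be an integer >= 1")
    version = raw.get("version")
    if not isinstance(version, str):
        raise ValueError("spec.version must be a string")
    storage = raw.get("storage")
    if not isinstance(storage, str) or not re.fullmatch(r"[0-9]+(Mi|Gi|Ti)", storage):
        raise ValueError("spec.storage must look like 100Gi")
    schedule = raw.get("backupSchedule")
    if not isinstance(schedule, str) or len(schedule.split()) != 5:
        raise ValueError("spec.backupSchedule must be a five-field cron expression")
    return PostgresClusterSpec(replicas, version, storage, schedule)


spec = parse_spec(
    {"replicas": 3, "version": "15", "storage": "100Gi", "backupSchedule": "0 2 * * *"}
)
print(spec)
```

Validation only guards what gets stored. It still does not create a single pod, which is the gap the controller fills.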
The Controller: Your Automation Logic
The operator is a controller that watches your custom resource. When it sees a PostgresCluster object, it runs a reconciliation loop that makes the actual state match the desired state.
Pseudocode:
watch PostgresCluster resources
on every change (or every 30 seconds):
    actual = what is actually running in the cluster
    desired = what the spec says should be running
    if primary not running:
        create primary pod
        wait for it to be ready
    if replicas < desired:
        create new replica
        configure replication from primary
    if backup scheduled and not yet run today:
        create backup job
    update status with current state
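The pseudocode above can be turned into a small runnable sketch. Everything here is a stand-in: `World` is an in-memory substitute for the cluster, creation is treated as instantly ready, and replicas are counted separately from the primary. A real operator would talk to the Kubernetes API instead of mutating a Python object.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class World:
    """In-memory stand-in for the cluster (assumption for this sketch)."""
    primary_ready: bool = False
    replicas: list[str] = field(default_factory=list)
    last_backup: date | None = None


def reconcile(name: str, desired_replicas: int, world: World, today: date) -> list[str]:
    """Run one reconciliation pass and return the actions it took."""
    actions: list[str] = []
    if not world.primary_ready:
        world.primary_ready = True
        actions.append(f"create primary {name}-0")
        # Stop here: replicas need a ready primary, so handle them next pass.
        return actions
    if len(world.replicas) < desired_replicas:
        # Add one replica per pass; the loop will come back for the rest.
        replica = f"{name}-{len(world.replicas) + 1}"
        world.replicas.append(replica)
        actions.append(f"create replica {replica} streaming from {name}-0")
    if world.last_backup != today:
        world.last_backup = today
        actions.append("create backup job")
    return actions


world = World()
today = date(2026, 4, 15)
for attempt in range(1, 6):
    print(attempt, reconcile("my-database", 2, world, today))
```

Each pass does a small amount of work and relies on being called again. After a few passes the returned list is empty, which is the signal that actual state matches desired state.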
The operator encodes operational expertise. The human who wrote it understood PostgreSQL replication. The people who use it just write YAML.
Why This Is Better Than Scripts
You could write bash scripts that do all of this. Many people do. The problems:
- Scripts run once and exit. Operators run continuously.
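- Scripts can call kubectl, but their own state is not a Kubernetes resource, so you cannot inspect or manage it through the Kubernetes API.
- Scripts do not react to events such as a pod crash or a node failure unless something external triggers them.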
- Scripts do not report status in a standard way.
Operators are self-healing by design. The reconciliation loop runs constantly. Drift is corrected automatically.
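The difference from a run-once script shows up in the shape of the main loop. Below is a minimal sketch using only Python's standard `queue` module: it reconciles whenever an event arrives, and also after a period of silence. The names `run_controller`, `resync_seconds`, and `max_passes` are invented for this example, and `max_passes` exists only so the demo terminates; a real controller would loop until shut down.

```python
from __future__ import annotations

import queue
from typing import Callable


def run_controller(
    events: queue.Queue,
    reconcile: Callable[[str], None],
    resync_seconds: float,
    max_passes: int,
) -> None:
    """Reconcile on every event, or after `resync_seconds` with no events."""
    for _ in range(max_passes):
        try:
            trigger = events.get(timeout=resync_seconds)
        except queue.Empty:
            # Nothing happened, so re-check anyway to catch silent drift.
            trigger = "periodic resync"
        reconcile(trigger)


log: list[str] = []
events: queue.Queue = queue.Queue()
events.put("pod my-database-1 crashed")
# A short resync interval keeps the demo fast; the pseudocode above used 30 seconds.
run_controller(events, log.append, resync_seconds=0.05, max_passes=2)
print(log)
```

The first pass is triggered by the crash event, the second by the timeout. Both paths call the same reconcile function, which is what makes drift correction automatic rather than something a person has to remember to run.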
Real Operators Worth Knowing
| Operator | What it manages |
|---|---|
| CloudNativePG | PostgreSQL clusters |
| Strimzi | Apache Kafka |
| Prometheus Operator | Prometheus + Alertmanager |
| cert-manager | TLS certificates |
| External Secrets | Secrets from Vault, AWS SSM |
| Argo CD | GitOps application deployment |
These are production-grade operators with years of development behind them. Before building your own, check if one already exists for your use case.
When to Build Your Own
Build an operator when you have a stateful application with complex lifecycle management that nothing off-the-shelf covers. Good candidates:
- Internal platforms where you want to give teams a simple API
- Applications with specific upgrade procedures that are easy to get wrong
- Services where operational knowledge is siloed with one or two people
The Operator SDK (from Red Hat) and Kubebuilder (from the Kubernetes project itself) both provide scaffolding that handles the boilerplate. Most of the code you write is the reconciliation logic - the actual operational knowledge.
The Status Subresource
A well-designed operator communicates state back through the resource status:
status:
  phase: Running
  primary: my-database-0
  replicas: 3
  readyReplicas: 3
  lastBackup: "2026-04-15T02:00:00Z"
  conditions:
    - type: Ready
      status: "True"
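This makes the operator’s view of the world inspectable with kubectl get postgrescluster my-database -o yaml. (A plain kubectl get lists only the name and age unless the CRD defines additional printer columns.) Debugging becomes straightforward.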
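On the operator side, writing status is mostly bookkeeping. The sketch below builds a status mapping shaped like the one above from plain Python values. The helper names and the "Reconciling" phase are assumptions for illustration; the one convention taken from Kubernetes itself is that a condition's status is the string "True" or "False", not a boolean.

```python
from __future__ import annotations


def set_condition(status: dict, cond_type: str, is_true: bool) -> None:
    """Insert or update the condition of the given type in status['conditions']."""
    value = "True" if is_true else "False"
    conditions = status.setdefault("conditions", [])
    for cond in conditions:
        if cond["type"] == cond_type:
            cond["status"] = value
            return
    conditions.append({"type": cond_type, "status": value})


def build_status(primary: str, replicas: int, ready_replicas: int) -> dict:
    """Summarise what the operator observed; 'Reconciling' is a made-up phase name."""
    all_ready = ready_replicas == replicas
    status = {
        "phase": "Running" if all_ready else "Reconciling",
        "primary": primary,
        "replicas": replicas,
        "readyReplicas": ready_replicas,
    }
    set_condition(status, "Ready", all_ready)
    return status


print(build_status("my-database-0", 3, 3))
print(build_status("my-database-0", 3, 2))
```

Updating a condition in place, rather than appending a new one each pass, keeps the list to one entry per condition type no matter how often the loop runs.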
Bottom Line
An operator is a controller that automates operational knowledge for a custom resource type. It extends Kubernetes’ reconciliation loop to handle stateful, complex applications. Use existing operators for common infrastructure components. Build your own when you have unique operational logic that would otherwise live in documentation and in people’s heads. The pattern is not complex - the implementation effort scales with the complexity of what you are automating.