Most Kubernetes tutorials cover the same ground: create a Deployment, expose it with a Service, maybe add an Ingress. This gets you running. It doesn’t get you to production without incidents.
The features below are the ones that prevent the incidents. They’re in the documentation but underrepresented in tutorials, and most teams discover them the hard way - after something breaks.
Pod Disruption Budgets
Without a Pod Disruption Budget (PDB), a node drain or rolling upgrade can take down all replicas of a service simultaneously. This is how maintenance windows become outages.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2   # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: api
```
This tells Kubernetes: when voluntarily disrupting pods (node drains, evictions from the cluster autoscaler or kubectl drain), keep at least 2 available at all times. If a drain would violate the budget, the eviction blocks until it can proceed safely. Note that PDBs govern evictions, not Deployment rolling updates - those are controlled by the update strategy's own maxUnavailable setting.
minAvailable and maxUnavailable accept both integers and percentages. For a 3-replica Deployment, minAvailable: 50% ensures at least 2 are always up. For a 10-replica Deployment, maxUnavailable: 20% means at most 2 can be down at once.
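The percentage form of the same PDB, reusing the selector from the example above, so the budget scales automatically if the replica count changes:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 50%   # with 3 replicas: ceil(1.5) = 2 must stay up
  selector:
    matchLabels:
      app: api
```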
Every production service with more than one replica should have a PDB. This takes 2 minutes to add.
Topology Spread Constraints
You have 3 replicas. All 3 landed on the same availability zone because of how the scheduler happened to place them. The AZ has a partial outage. Your service is down.
Topology Spread Constraints prevent this:
```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: api
```
maxSkew: 1 means the difference in matching-pod count between any two zones cannot exceed 1. With 3 replicas across 3 zones, you get 1-1-1. With 4 replicas, 2-1-1. whenUnsatisfiable: DoNotSchedule makes the constraint hard: a pod that cannot be placed within the skew limit stays Pending rather than landing unevenly (use ScheduleAnyway for a soft preference).
You can stack these - constrain by zone and by node simultaneously:
```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: api
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: api
```

Note that each constraint needs its own labelSelector - without one, no pods match and the skew calculation is meaningless.
This spreads across both zones and individual nodes, giving you true redundancy.
Vertical Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) is well-known: add more replicas when CPU is high. The Vertical Pod Autoscaler (VPA) is less used but handles a real problem: your resource requests are wrong.
Setting resource requests correctly requires knowing your application’s actual CPU and memory usage. Most teams guess conservatively (requesting too much, wasting money) or aggressively (requesting too little, getting OOMKilled).
VPA in recommendation mode analyzes actual usage and suggests right-sized requests:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"   # recommendation only, no automatic changes
```
After a few days, check the recommendations:
```shell
kubectl describe vpa api-vpa
```
You’ll see target CPU and memory values based on actual observed usage. Use these numbers to update your Deployment’s resource requests. This typically finds 30-50% overprovisioning.
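Applying a recommendation is just copying the target values into the Deployment's container spec. The numbers below are illustrative, not from a real VPA run:

```yaml
resources:
  requests:
    cpu: 150m      # VPA target, replacing a guessed 500m
    memory: 300Mi  # VPA target, replacing a guessed 1Gi
```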
Resource Quotas Per Namespace
Without resource quotas, a single namespace can consume all cluster resources, starving other services. This is how a runaway workload in staging affects production.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: development
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    count/pods: "20"
    count/services: "10"
```
This limits the development namespace to 4 CPU cores (requested) and 8 GB of memory. No amount of misconfigured workloads in development can exceed these bounds.
Pair this with LimitRange to set default resource requests/limits for pods that don’t specify them:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: development
spec:
  limits:
  - default:
      cpu: 200m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    type: Container
```
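With the LimitRange in place, a container that specifies no resources is admitted as if it had asked for the defaults. A hypothetical pod in the development namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-resources-set
  namespace: development
spec:
  containers:
  - name: app
    image: nginx   # no resources block; the LimitRange injects
                   # requests 100m/128Mi and limits 200m/256Mi
```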
Startup, Liveness, and Readiness Probes (Done Right)
Everyone knows these exist. Few use them correctly.
The common mistake: using a liveness probe that’s too aggressive (frequent checks, low failure threshold) on a slow-starting application. Result: Kubernetes kills the pod before it finishes starting, creating a restart loop.
The correct pattern:
```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # allow up to 5 minutes to start (30 * 10s)
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 30
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```
The startup probe runs first and suppresses the other two probes until it succeeds, protecting slow-starting apps from being killed mid-boot. Once it passes, the liveness probe takes over. The readiness probe is separate: it controls whether the pod receives traffic through Services and never restarts anything, which makes it the right hook for circuit breaker patterns or graceful degradation.
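The health-vs-ready split is straightforward to sketch at the application level. A minimal Python sketch, assuming the /health and /ready paths from the probe config above; the ready class flag is a stand-in for real dependency checks:

```python
from http.server import BaseHTTPRequestHandler


class ProbeHandler(BaseHTTPRequestHandler):
    """Serves /health (startup/liveness) and /ready (readiness) probes."""

    ready = False  # flip to True once initialization/dependency checks pass

    def do_GET(self):
        if self.path == "/health":
            # Liveness: the process is up and able to answer HTTP at all.
            self.send_response(200)
        elif self.path == "/ready":
            # Readiness: refuse traffic (503) until initialization completes.
            self.send_response(200 if ProbeHandler.ready else 503)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep probe traffic out of the logs
```

Flipping the ready flag back to False during graceful shutdown or a dependency outage removes the pod from Service endpoints without restarting it.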
Priority Classes
When the cluster is resource-constrained and nodes are under pressure, which pods get evicted? Without priority classes, it’s arbitrary. With them, it’s intentional.
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-critical
value: 1000000
globalDefault: false
description: "Production workloads that must not be evicted"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low-priority
value: 100
globalDefault: false
description: "Batch jobs that can be interrupted"
```
Assign production services to production-critical and batch jobs to batch-low-priority. Under node pressure, the kubelet evicts low-priority pods first; when a high-priority pod cannot be scheduled, the scheduler can also preempt lower-priority pods to make room.
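In a pod spec this is a single field, shown here on a hypothetical Deployment's pod template:

```yaml
spec:
  template:
    spec:
      priorityClassName: production-critical
```

Pods that name no PriorityClass default to priority 0 (unless a class with globalDefault: true exists), so unlabeled workloads sit below even batch-low-priority here.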
Bottom Line
Pod Disruption Budgets, Topology Spread Constraints, VPA recommendations, Resource Quotas, and Priority Classes are not advanced Kubernetes topics - they are table stakes for running production workloads safely. The 2-4 hours required to implement all of these is paid back by the first incident they prevent. Most teams add them after an outage. Add them before.