Most Kubernetes tutorials cover the same ground: create a Deployment, expose it with a Service, maybe add an Ingress. This gets you running. It doesn’t get you to production without incidents.
The features below are the ones that prevent the incidents. They’re in the documentation but underrepresented in tutorials, and most teams discover them the hard way - after something breaks.
Pod Disruption Budgets
Without a Pod Disruption Budget (PDB), a node drain or rolling upgrade can take down all replicas of a service simultaneously. This is how maintenance windows become outages.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2   # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: api
```
This tells Kubernetes: when voluntarily disrupting pods (node drains, evictions from the cluster autoscaler or kubectl drain), keep at least 2 available at all times. If a drain would violate the budget, the eviction blocks until it can proceed safely. Note that PDBs govern evictions, not Deployment rolling updates - those are controlled by the update strategy's own maxUnavailable setting.
minAvailable and maxUnavailable accept both integers and percentages. For a 3-replica Deployment, minAvailable: 50% ensures at least 2 are always up. For a 10-replica Deployment, maxUnavailable: 20% means at most 2 can be down at once.
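The percentage form of the same PDB, reusing the selector from the example above, so the budget scales automatically if the replica count changes:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 50%   # with 3 replicas: ceil(1.5) = 2 must stay up
  selector:
    matchLabels:
      app: api
```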
Every production service with more than one replica should have a PDB. This takes 2 minutes to add.
Topology Spread Constraints
You have 3 replicas. All 3 landed on the same availability zone because of how the scheduler happened to place them. The AZ has a partial outage. Your service is down.
Topology Spread Constraints prevent this:
```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: api
```
maxSkew: 1 means the difference in matching-pod count between any two zones cannot exceed 1. With 3 replicas across 3 zones, you get 1-1-1. With 4 replicas, 2-1-1. whenUnsatisfiable: DoNotSchedule makes the constraint hard: a pod that cannot be placed within the skew limit stays Pending rather than landing unevenly (use ScheduleAnyway for a soft preference).
You can stack these - constrain by zone and by node simultaneously:
```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: api
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: api
```

Note that each constraint needs its own labelSelector - without one, no pods match and the skew calculation is meaningless.
This spreads across both zones and individual nodes, giving you true redundancy.
Vertical Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) is well-known: add more replicas when CPU is high. The Vertical Pod Autoscaler (VPA) is less used but handles a real problem: your resource requests are wrong.
Setting resource requests correctly requires knowing your application’s actual CPU and memory usage. Most teams guess conservatively (requesting too much, wasting money) or aggressively (requesting too little, getting OOMKilled).
VPA in recommendation mode analyzes actual usage and suggests right-sized requests:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"   # recommendation only, no automatic changes
```
After a few days, check the recommendations:
```shell
kubectl describe vpa api-vpa
```
You’ll see target CPU and memory values based on actual observed usage. Use these numbers to update your Deployment’s resource requests. This typically finds 30-50% overprovisioning.
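Applying a recommendation is just copying the target values into the Deployment's container spec. The numbers below are illustrative, not from a real VPA run:

```yaml
resources:
  requests:
    cpu: 150m      # VPA target, replacing a guessed 500m
    memory: 300Mi  # VPA target, replacing a guessed 1Gi
```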
Resource Quotas Per Namespace
Without resource quotas, a single namespace can consume all cluster resources, starving other services. This is how a runaway workload in staging affects production.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: development
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    count/pods: "20"
    count/services: "10"
```
This limits the development namespace to 4 CPU cores (requested) and 8 GB of memory. No amount of misconfigured workloads in development can exceed these bounds.
Pair this with LimitRange to set default resource requests/limits for pods that don’t specify them:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: development
spec:
  limits:
  - default:
      cpu: 200m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    type: Container
```
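With the LimitRange in place, a container that specifies no resources is admitted as if it had asked for the defaults. A hypothetical pod in the development namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-resources-set
  namespace: development
spec:
  containers:
  - name: app
    image: nginx   # no resources block; the LimitRange injects
                   # requests 100m/128Mi and limits 200m/256Mi
```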
Startup, Liveness, and Readiness Probes (Done Right)
Everyone knows these exist. Few use them correctly.
The common mistake: using a liveness probe that’s too aggressive (frequent checks, low failure threshold) on a slow-starting application. Result: Kubernetes kills the pod before it finishes starting, creating a restart loop.
The correct pattern:
```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # allow up to 5 minutes to start (30 * 10s)
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 30
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```
The startup probe runs first and suppresses the other two probes until it succeeds, protecting slow-starting apps from being killed mid-boot. Once it passes, the liveness probe takes over. The readiness probe is separate: it controls whether the pod receives traffic through Services and never restarts anything, which makes it the right hook for circuit breaker patterns or graceful degradation.
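The health-vs-ready split is straightforward to sketch at the application level. A minimal Python sketch, assuming the /health and /ready paths from the probe config above; the ready class flag is a stand-in for real dependency checks:

```python
from http.server import BaseHTTPRequestHandler


class ProbeHandler(BaseHTTPRequestHandler):
    """Serves /health (startup/liveness) and /ready (readiness) probes."""

    ready = False  # flip to True once initialization/dependency checks pass

    def do_GET(self):
        if self.path == "/health":
            # Liveness: the process is up and able to answer HTTP at all.
            self.send_response(200)
        elif self.path == "/ready":
            # Readiness: refuse traffic (503) until initialization completes.
            self.send_response(200 if ProbeHandler.ready else 503)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep probe traffic out of the logs
```

Flipping the ready flag back to False during graceful shutdown or a dependency outage removes the pod from Service endpoints without restarting it.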
Priority Classes
When the cluster is resource-constrained and nodes are under pressure, which pods get evicted? Without priority classes, it’s arbitrary. With them, it’s intentional.
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-critical
value: 1000000
globalDefault: false
description: "Production workloads that must not be evicted"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low-priority
value: 100
globalDefault: false
description: "Batch jobs that can be interrupted"
```
Assign production services to production-critical and batch jobs to batch-low-priority. Under node pressure, the kubelet evicts low-priority pods first; when a high-priority pod cannot be scheduled, the scheduler can also preempt lower-priority pods to make room.
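In a pod spec this is a single field, shown here on a hypothetical Deployment's pod template:

```yaml
spec:
  template:
    spec:
      priorityClassName: production-critical
```

Pods that name no PriorityClass default to priority 0 (unless a class with globalDefault: true exists), so unlabeled workloads sit below even batch-low-priority here.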
Bottom Line
Pod Disruption Budgets, Topology Spread Constraints, VPA recommendations, Resource Quotas, and Priority Classes are not advanced Kubernetes topics - they are table stakes for running production workloads safely. The 2-4 hours required to implement all of these is paid back by the first incident they prevent. Most teams add them after an outage. Add them before.