A load balancer with default settings works until it does not. Most teams set up a load balancer, see traffic distribute, and move on. The misconfigurations hide until you have a deployment, a spike, or a backend failure - and then they cause cascading problems that are hard to diagnose under pressure.

Here are the settings that cause the most production incidents.

Health Check Configuration

The default health check in most load balancers is too lenient. It checks a path (usually / or /health), waits too long for a response, and marks a backend unhealthy only after multiple consecutive failures.

The problem: a backend that is alive but processing requests slowly will pass a lenient health check while being unable to serve real traffic efficiently. Users see slow responses. The backend never gets removed.

What a good health check looks like:

# nginx upstream health check (the check directives below require a
# third-party module such as nginx_upstream_check_module; stock
# open-source nginx only does passive checks, and NGINX Plus uses its
# own health_check directive)
upstream backend {
    server backend1:8080;
    server backend2:8080;

    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "GET /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx;
}
Setting                   | Default (too lenient) | Better
Check interval            | 30 seconds            | 5-10 seconds
Timeout                   | 5 seconds             | 1-2 seconds
Failures before unhealthy | 5                     | 2-3
Successes before healthy  | 1                     | 2
Path                      | /                     | Dedicated /health endpoint

The health endpoint should do real work: check that the database connection is alive, that critical dependencies are reachable, that the service can actually serve requests. A /health endpoint that just returns 200 immediately is barely better than checking the TCP port.
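A minimal sketch of a health endpoint that exercises real dependencies. The check names and callables here are hypothetical stand-ins; any web framework can wrap this function behind /health:

```python
import json
import time

def run_health_checks(checks, time_budget=1.0):
    """Run each named check; return (http_status, body).

    `checks` maps a name to a zero-argument callable that raises on
    failure. Any exception marks the service unhealthy; a check that
    overruns its time budget is reported as slow (the check's own
    socket timeouts should bound how long it can actually block).
    """
    results = {}
    healthy = True
    for name, check in checks.items():
        start = time.monotonic()
        try:
            check()
            ok = (time.monotonic() - start) <= time_budget
            results[name] = "ok" if ok else "slow"
            healthy = healthy and ok
        except Exception as exc:
            results[name] = f"failed: {exc}"
            healthy = False
    return (200 if healthy else 503), json.dumps(results)

# Stand-in checks; a real service would ping its actual dependencies:
status, body = run_health_checks({
    "database": lambda: None,   # e.g. SELECT 1 against the pool
    "cache": lambda: None,      # e.g. Redis PING
})
```

Returning 503 on failure matters: the load balancer's `http_2xx` expectation treats anything else as unhealthy and pulls the backend from rotation.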

Connection Draining During Deployments

Without connection draining, deploying a new version of your backend drops in-flight requests. A user mid-checkout hits a connection reset. This is completely avoidable.

Connection draining works like this: when a backend is marked for removal (during a deployment), the load balancer stops sending new requests to it but allows existing connections to complete. After a timeout, the backend is removed even if connections are still open.

AWS ALB:

Deregistration delay: 30 seconds (default is 300 - too long for most apps)

nginx:

upstream backend {
    server backend1:8080;
    server backend2:8080 weight=1 slow_start=30s; # gradual traffic ramp-up on new backends (slow_start requires NGINX Plus)
}

The deployment sequence that works:

  1. Remove one backend from rotation (stop sending new requests)
  2. Wait for in-flight requests to complete (connection draining timeout)
  3. Deploy the new version to that backend
  4. Add it back to rotation
  5. Verify health checks pass
  6. Repeat for next backend

This is blue/green or rolling deployment at the load balancer level. Without connection draining configured, step 2 does not exist and requests are dropped.
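The sequence above can be sketched as orchestration code. The load balancer client interface here (remove/inflight/add/healthy) is hypothetical, since the real API depends on which LB you run:

```python
import time

def rolling_deploy(lb, backends, deploy, drain_timeout=30.0, poll=0.1):
    """Rolling deploy behind a load balancer, one backend at a time.

    `lb` is an assumed client exposing:
      remove(b)   - stop sending new requests to backend b
      inflight(b) - number of requests still open on b
      add(b)      - return b to rotation
      healthy(b)  - whether b currently passes health checks
    `deploy` installs the new version on a single backend.
    """
    for b in backends:
        lb.remove(b)                                  # 1. out of rotation
        deadline = time.monotonic() + drain_timeout
        while lb.inflight(b) and time.monotonic() < deadline:
            time.sleep(poll)                          # 2. drain in-flight requests
        deploy(b)                                     # 3. install the new version
        lb.add(b)                                     # 4. back into rotation
        deadline = time.monotonic() + drain_timeout
        while not lb.healthy(b):                      # 5. verify health checks pass
            if time.monotonic() > deadline:
                raise RuntimeError(f"{b} unhealthy after deploy, aborting")
            time.sleep(poll)
```

Aborting on a failed health check (rather than continuing to the next backend) is the point of step 5: a bad build should take down at most one backend.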

Upstream Timeout Configuration

Timeouts that are too long cause connection exhaustion under load. If your backend is slow, the load balancer queues requests waiting for responses. The queue grows. New requests wait. Users see increasing latency. Eventually the load balancer runs out of connections.

Timeouts that are too short cause false errors when legitimate slow requests fail.

The right timeout depends on your application's expected response time. If 99% of requests complete in under 200ms, a 5-second timeout is already generous. A 30-second timeout means a hung backend can hold each connection for 30 seconds before the load balancer gives up and frees it.
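To make that concrete, here is one way to derive a read timeout from observed latencies. A sketch only: the 3x headroom multiplier is an assumption, not a standard rule:

```python
def suggest_read_timeout(latencies_ms, percentile=0.99, headroom=3.0):
    """Suggest a proxy_read_timeout (in ms) from observed latencies.

    Takes the given percentile of the samples and multiplies by a
    headroom factor, so legitimately slow requests still complete
    while a hung backend cannot hold a connection for long.
    """
    ordered = sorted(latencies_ms)
    idx = int(percentile * (len(ordered) - 1))  # nearest-rank percentile
    return ordered[idx] * headroom

# e.g. p99 of 99ms -> ~300ms; round up to the nearest second
# before writing it into the nginx directive.
```

Recompute this when traffic patterns change; a timeout tuned against last year's latencies is another form of the set-and-forget problem.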

nginx timeout configuration:

proxy_connect_timeout 5s;      # time to establish connection to backend
proxy_send_timeout 10s;        # time for proxy to send request to backend
proxy_read_timeout 30s;        # time for backend to respond
proxy_next_upstream error timeout http_500 http_502 http_503;
proxy_next_upstream_tries 2;   # try next backend if first fails

proxy_next_upstream is crucial: if one backend fails or times out, nginx automatically retries the request on the next backend. Without it, a single failing backend returns errors for every request it receives instead of failing over. One caveat: by default nginx will not retry non-idempotent requests such as POST; adding the non_idempotent flag changes that, at the risk of executing a request's side effects twice.

SSL Termination Gotchas

TLS 1.0 and 1.1 should be disabled. Both are formally deprecated (RFC 8996) and carry known weaknesses, yet many load balancers still accept them by default.

nginx SSL configuration:

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off; # TLS 1.3 ignores this anyway
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:10m;
ssl_session_tickets off;
add_header Strict-Transport-Security "max-age=63072000" always;

HTTPS to backends. Terminating SSL at the load balancer and sending unencrypted traffic to backends is common and often acceptable in private VPCs. If your compliance requirements demand end-to-end encryption, configure TLS between the load balancer and backends - and make sure your health checks also use HTTPS.
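If you do need TLS to the backends, a sketch of the nginx side follows. The certificate path and internal hostname are placeholders for your own values:

```nginx
upstream backend {
    server backend1:8443;
    server backend2:8443;
}

server {
    location / {
        proxy_pass https://backend;          # TLS to the backend, not plain HTTP
        proxy_ssl_protocols TLSv1.2 TLSv1.3;
        proxy_ssl_verify on;                 # verify the backend certificate
        proxy_ssl_trusted_certificate /etc/nginx/internal-ca.pem; # placeholder path
        proxy_ssl_server_name on;            # send SNI to the backend
        proxy_ssl_name backend.internal;     # name to verify (placeholder)
    }
}
```

Without proxy_ssl_verify, the connection is encrypted but unauthenticated, which defeats much of the point of end-to-end TLS.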

Sticky Sessions: When They Help and When They Hurt

Sticky sessions route a user to the same backend for the duration of their session. Useful for applications that store session state locally (not in a shared cache). Dangerous for load distribution.

If backend1 has 100 sticky sessions and backend2 has 5, backend1 is doing 20x the work. Adding capacity does not help because new users go to the new backends but existing sessions stay on the old ones.

The correct fix is not sticky sessions. It is externalizing session state to a shared store (Redis) so that any backend can serve any user. Then sticky sessions are unnecessary and you get better load distribution. This is especially important if your backends use WebSocket connections, where sticky sessions create even more imbalanced load.
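A sketch of that externalized store, written against a Redis-style client interface (redis-py's setex/get/delete calls fit this shape; any client with the same methods works):

```python
import json
import uuid

class SessionStore:
    """Session state in a shared store so any backend can serve any user.

    `client` is anything with Redis-style get/setex/delete. The TTL
    doubles as the session timeout, refreshed on every write.
    """
    def __init__(self, client, ttl_seconds=1800):
        self.client = client
        self.ttl = ttl_seconds

    def create(self, data):
        sid = uuid.uuid4().hex
        self.client.setex(f"session:{sid}", self.ttl, json.dumps(data))
        return sid  # goes into the user's cookie

    def load(self, sid):
        raw = self.client.get(f"session:{sid}")
        return json.loads(raw) if raw else None

    def destroy(self, sid):
        self.client.delete(f"session:{sid}")
```

With this in place, the load balancer can use plain round-robin or least-connections and every backend is interchangeable.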

If you must use sticky sessions, monitor per-backend load and set a session timeout that rebalances over time.

Rate Limiting at the Load Balancer Level

Many teams implement rate limiting only in application code. That is better than nothing, but it misses a significant attack surface: requests consume connections and CPU before they ever reach the rate-limiting middleware.

Load balancer rate limiting operates at the connection level, before application code runs:

# Rate limit by IP - 10 requests/second, burst of 20
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=api burst=20 nodelay;
        limit_req_status 429;
        proxy_pass http://backend;
    }
}

This drops excess requests before they touch your backend. For protecting against simple denial-of-service or scraping attacks, load balancer rate limiting is more effective than application-level rate limiting.
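The rate/burst semantics can be modeled as a token bucket. This is a rough model for building intuition, not nginx's exact leaky-bucket implementation:

```python
import time

class TokenBucket:
    """Approximates limit_req with nodelay: `rate` tokens/second refill,
    and up to `burst` extra requests absorbed before rejecting."""
    def __init__(self, rate, burst, now=time.monotonic):
        self.rate = rate
        self.capacity = burst + 1  # steady-rate slot plus the burst
        self.tokens = self.capacity
        self.now = now             # injectable clock, handy for testing
        self.last = now()

    def allow(self):
        t = self.now()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # nginx would answer here with limit_req_status (429)
```

With rate=10 and burst=20, a cold bucket absorbs 21 requests instantly, then sustains 10 per second; everything beyond that is rejected before touching a backend.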

The Configuration Audit Checklist

Setting               | Check
Health check timeout  | Under 2 seconds
Health check interval | Under 10 seconds
Health check path     | Dedicated endpoint that does real checks
Connection draining   | Enabled with appropriate timeout
Proxy timeouts        | Set based on actual 99th percentile response times
next_upstream         | Configured to try next backend on failure
TLS version           | 1.2+ only
Rate limiting         | Configured at load balancer level
Sticky sessions       | Removed unless truly necessary

Bottom Line

Load balancer configurations that are set up once and never revisited cause production incidents that look like backend failures but are actually configuration failures. Health checks that are too lenient keep bad backends in rotation. Missing connection draining drops in-flight requests during deployments. Overly long timeouts cause connection exhaustion under load. If your backends run on Kubernetes, orphaned LoadBalancer services add cost on top of the misconfiguration risk. A one-hour audit against the checklist above will almost always find at least one setting that will cause an incident if left unaddressed.