The symptom was gradual: database CPU creeping from 40% to 60% to 80% over three weeks without a corresponding spike in traffic. We were not under unusual load. The query patterns were not obviously wrong. It took pg_stat_statements and 20 minutes of analysis to find a single query pattern that was running thousands of times per minute - and could be reduced to tens of times per minute.
The Situation
The application is a content platform. Users visit pages that display articles with author information, tag lists, view counts, and related article recommendations. Load had grown steadily and we were 30 days from needing to scale the database vertically, which meant a multi-hour maintenance window and a doubled database bill.
The symptom in numbers: RDS PostgreSQL instance at 80% average CPU, 4,000 queries per second, p99 query latency at 450ms.
Finding the Problem With pg_stat_statements
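pg_stat_statements ships with PostgreSQL as a contrib extension but is not active by default; it has to be preloaded and created once per database. A minimal setup sketch (on RDS, shared_preload_libraries is set through the parameter group rather than postgresql.conf, and changing it requires a restart):

```sql
-- postgresql.conf (requires a restart):
shared_preload_libraries = 'pg_stat_statements'

-- then, once per database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```

With the extension active, the view below can be sorted by total time to surface the most expensive statements.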
```sql
SELECT
    query,
    calls,
    total_exec_time / calls AS avg_ms,
    rows / calls AS avg_rows,
    total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```
The top entry by total_exec_time was not a complex query. It was something like:
```sql
SELECT * FROM users WHERE id = $1
```
It had been called 8.7 million times in the stats window. Average execution time was 0.3ms - which looked fine in isolation. But 8.7 million calls added up to roughly 45 minutes of database CPU time in a 6-hour window.
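The per-call average hides the cost; multiplying it out makes the problem visible. A quick check using the (rounded) figures from the stats view:

```python
calls = 8_700_000   # calls in the 6-hour stats window
avg_ms = 0.3        # average execution time per call, in milliseconds

total_minutes = calls * avg_ms / 1000 / 60
print(f"{total_minutes:.1f} minutes of database time")  # -> 43.5 minutes of database time
```

The rounded inputs give 43.5 minutes, consistent with the roughly 45 minutes the unrounded totals in the view showed.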
The Root Cause: N+1 in the ORM
The article list endpoint was loading 20 articles per page. For each article, the ORM was making a separate query to fetch the author. Classic N+1 problem.
```python
# What the code looked like (simplified)
articles = Article.objects.filter(published=True).order_by('-created_at')[:20]
for article in articles:
    author = article.author  # N separate queries, one per article
    print(f"{article.title} by {author.name}")
```
One request: 1 query for the article list + 20 queries for authors = 21 database queries.
At 200 requests per second to that endpoint, that is about 4,200 queries per second, 4,000 of them single-row user lookups. This accounted for the observed query rate almost exactly.
The Fix: select_related
```python
# After: one query with a JOIN
articles = (Article.objects
            .filter(published=True)
            .select_related('author', 'author__profile')
            .prefetch_related('tags')
            .order_by('-created_at')[:20])
```
select_related turns the N author lookups into a single JOIN. prefetch_related for tags adds one more round trip: a single query fetching all tags for those articles, joined back to them in Python. Instead of 21 queries per request: 2 queries (articles with authors and profiles, plus the tag batch).
The SQL generated:
```sql
-- Before (per article):
SELECT * FROM users WHERE id = 42;

-- After (one query for all articles):
SELECT articles.*, users.*, profiles.*
FROM articles
JOIN users ON articles.author_id = users.id
LEFT JOIN profiles ON profiles.user_id = users.id
WHERE articles.published = true
ORDER BY articles.created_at DESC
LIMIT 20;
```
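The prefetch_related half has no JOIN to show because its join happens in Python: fetch the parent rows, fetch all related rows with one IN query, then group them in memory. A pure-Python sketch of that batching idea, using invented sample rows in place of real tables:

```python
# Invented sample data standing in for database rows.
articles = [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]
tag_rows = [  # article_tags join rows: (article_id, tag)
    (1, "python"), (1, "orm"), (2, "sql"),
]

# Step 1: one query for the parent rows (the articles above).
article_ids = {a["id"] for a in articles}

# Step 2: one query for all related rows at once
# (the ORM issues: SELECT ... WHERE article_id IN (1, 2)).
related = [(aid, tag) for aid, tag in tag_rows if aid in article_ids]

# Step 3: join in Python, grouping children under their parent.
tags_by_article = {}
for aid, tag in related:
    tags_by_article.setdefault(aid, []).append(tag)

for a in articles:
    a["tags"] = tags_by_article.get(a["id"], [])

print(articles[0]["tags"])  # -> ['python', 'orm']
```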
The Results After Deployment
| Metric | Before | After |
|---|---|---|
| Queries per second | ~4,000 | ~600 |
| Database CPU | 80% | 32% |
| p99 latency | 450ms | 95ms |
| DB query latency (p99) | 45ms | 12ms |
The change took 10 lines of code. Deployment took 5 minutes. No infrastructure changes. No indexes needed. No schema migration. The database went from nearly saturated to running at one-third capacity.
Why This Happens
ORM abstractions make the N+1 problem easy to create:
- Accessing a related object through an ORM field triggers a lazy query
- In templates, loops over collections with related data multiply the queries
- The problem is invisible in development with small datasets - queries are fast
- In production with thousands of requests, the multiplication becomes catastrophic
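The multiplication is easy to see with a toy stand-in for an ORM session that counts the queries it issues (everything here is invented for illustration):

```python
class FakeORM:
    """Toy stand-in for an ORM session that counts issued queries."""
    def __init__(self):
        self.query_count = 0

    def fetch_articles(self, limit):
        self.query_count += 1              # one query for the list
        return [{"author_id": i} for i in range(limit)]

    def fetch_author(self, author_id):
        self.query_count += 1              # one lazy query per access
        return {"id": author_id}

db = FakeORM()
for article in db.fetch_articles(limit=20):
    db.fetch_author(article["author_id"])  # the N in N+1

print(db.query_count)  # -> 21
```

21 queries feels instant on a development laptop; at 200 requests per second it is thousands of queries per second.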
Most ORMs have solutions: Django has select_related/prefetch_related, SQLAlchemy has joinedload/selectinload, ActiveRecord has includes. The solution is the same across languages.
Finding Your N+1 Problems
Enable query logging in development:
```python
# Django settings for development. The 'console' handler must be
# defined for the logger to have somewhere to write.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        },
    },
}
```
Watch the console during a page load. If you see the same query shape repeated 20 times, you have N+1.
Use Django Debug Toolbar / Bullet gem (Rails) / similar:
These development tools count queries per request and highlight duplicates. They make N+1 obvious before it reaches production.
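The duplicate-shape counting these tools do can be approximated over any raw query log: strip the literals so queries that differ only in parameters collapse to one shape, then count. A rough sketch (the regexes are deliberately naive and would need hardening for real SQL):

```python
import re
from collections import Counter

def normalize(sql: str) -> str:
    """Collapse literals so queries differing only in parameters match."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip()

def find_n_plus_one(log: list[str], threshold: int = 10) -> list[tuple[str, int]]:
    """Return query shapes repeated at least `threshold` times."""
    counts = Counter(normalize(q) for q in log)
    return [(shape, n) for shape, n in counts.most_common() if n >= threshold]

log = ["SELECT * FROM users WHERE id = %d" % i for i in range(20)]
log.append("SELECT * FROM articles WHERE published = true LIMIT 20")
print(find_n_plus_one(log))
# -> [('SELECT * FROM users WHERE id = ?', 20)]
```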
In production: pg_stat_statements
Sort by total execution time, not average time. A fast query that runs a million times is more expensive than a slow query that runs once.
Beyond N+1: The Other Query Patterns to Fix
While we were in there, we also added an index:

```sql
-- Index on the filtered + sorted columns
CREATE INDEX articles_published_created_idx
    ON articles (published, created_at DESC)
    WHERE published = true;
```
Partial index on published articles only - smaller index, faster scans. The query planner immediately started using it.
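How we confirmed the planner was using it: EXPLAIN on the hot query (plan output varies with PostgreSQL version and table statistics):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM articles
WHERE published = true
ORDER BY created_at DESC
LIMIT 20;
-- Look for "Index Scan using articles_published_created_idx" in the plan.
```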
The combination of eliminating N+1 and the partial index dropped database CPU from 80% to 28%. Six months of headroom instead of 30 days.
Bottom Line
N+1 queries are one of the highest-leverage fixes in database performance work. They are invisible in development, show up as “fast queries that run many times” in pg_stat_statements, and are often fixed with a one-line ORM change. Before scaling your database, run pg_stat_statements sorted by total execution time and look for high-call-count queries. The fix is often simpler than you expect.