Most developers treat the PostgreSQL query planner as a black box. They write a query, it either runs fast or it does not, and if it does not they add an index and hope for the best.

Senior engineers treat the planner as a collaborator. They understand what information it uses to make decisions, how to give it better information, and how to override it when it gets things wrong.

How the Planner Actually Makes Decisions

The planner is a cost-based optimizer. For every query, it generates multiple possible execution plans - different join orders, different index strategies, different scan types - and estimates the cost of each one. It picks the lowest estimated cost.

The critical word is “estimated.” The planner does not know the actual cost until the query runs. It estimates based on table statistics, which is why keeping statistics accurate is the foundation of query performance.

The statistics live in the pg_statistic catalog and are surfaced in the human-readable pg_stats view, with per-table row counts kept in pg_class.reltuples. They include:

  • Approximate row counts per table
  • Null fractions per column
  • Most common values and their frequencies
  • Histogram buckets for value distribution

When statistics are stale or missing, the planner makes bad estimates. Bad estimates lead to bad plans. Bad plans lead to slow queries that can crush your database.
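You can inspect what the planner knows directly. A sketch, assuming an orders table with a status column:

-- Inspect the planner's statistics for one column (names are illustrative)
SELECT attname,
       null_frac,          -- fraction of NULLs
       n_distinct,         -- distinct-value estimate (negative means a fraction of rows)
       most_common_vals,   -- the MCV list
       most_common_freqs,  -- frequency of each MCV
       histogram_bounds    -- bucket boundaries for the remaining values
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';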

Reading EXPLAIN ANALYZE Like an Expert

EXPLAIN ANALYZE is the first tool. But reading it well requires knowing what to look for.

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';

The BUFFERS option adds I/O information. FORMAT TEXT is the default but FORMAT JSON is easier to parse programmatically.

What to look for immediately:

Rows estimate vs actual. If the planner estimates 10 rows and gets 10,000, it made a bad decision based on bad information. This is where to start.

Seq Scan vs Index Scan. A sequential scan on a large table is not always wrong - sometimes it is faster - but it should be deliberate.

Hash Join vs Nested Loop vs Merge Join. Nested loop is efficient when the outer relation is small and the inner side can use an index. Hash join is better for large, unsorted relations. Merge join wins when both inputs are already sorted. The planner should choose correctly, but bad row estimates can push it into the wrong strategy.

Actual time. The time shown is an average per loop iteration, not a total. If loops = 1000 and time = 0.050 ms, the node took 0.050 × 1000 = 50 ms in total.
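Putting the estimate check and the loop math together, a hypothetical plan fragment (numbers invented for illustration) might read:

Index Scan using idx_orders_customer on orders
  (cost=0.43..8.45 rows=10 width=64)
  (actual time=0.005..0.050 rows=9800 loops=1000)

The planner expected 10 rows but each of the 1,000 loops returned about 9,800, and the node's total time is 0.050 ms × 1000 ≈ 50 ms.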

The Statistics Problems and How to Fix Them

Problem 1: Outdated statistics.

ANALYZE runs automatically via autovacuum, but it can lag on tables with heavy write loads. Run it manually when you need fresh numbers:

ANALYZE orders;
ANALYZE VERBOSE orders;  -- prints progress and the number of rows sampled

Problem 2: Default statistics target is too low.

The parameter is default_statistics_target, and its default is 100, meaning up to 100 most-common values and 100 histogram buckets are collected per column. For columns with high cardinality or skewed distributions, this is often not enough for accurate estimates.

ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE orders;

Increasing this makes ANALYZE slower but dramatically improves estimates on skewed data. For important columns, a target of 300-500 is often justified.
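To check the current per-column target, you can query the catalog; a sketch, where -1 (or NULL on recent versions) means the default applies:

SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'orders'::regclass
  AND attname = 'customer_id';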

Problem 3: Multi-column correlation.

The planner assumes columns are statistically independent by default. If your query filters on status = 'active' AND region = 'us-east' and those columns are correlated (most active users are in us-east), the planner will underestimate the selectivity.

The fix is extended statistics:

CREATE STATISTICS orders_status_region ON status, region FROM orders;
ANALYZE orders;

This tells the planner about the correlation and significantly improves estimates for multi-column filters. With no kind specified, it collects all supported statistics kinds, including functional dependencies and, on PostgreSQL 12+, multi-column most-common values.
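After the next ANALYZE you can confirm what was collected. On PostgreSQL 12+ the pg_stats_ext view exposes it; a sketch:

SELECT statistics_name, kinds, dependencies
FROM pg_stats_ext
WHERE statistics_name = 'orders_status_region';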

Index Strategies Beyond the Basics

Partial indexes. If 90% of your queries filter on status = 'active' and active rows are 5% of the table, a partial index on active rows is much smaller and much faster:

CREATE INDEX idx_orders_active ON orders (created_at) WHERE status = 'active';

Index-only scans. If all the columns you need are in the index, PostgreSQL can return results without touching the heap (the main table). This also depends on the visibility map, so a heavily written table may still fall back to heap fetches until it is vacuumed. Design indexes to enable this:

-- Query: SELECT id, customer_id FROM orders WHERE created_at > '2025-01-01'
-- Index that enables an index-only scan:
CREATE INDEX idx_orders_covering ON orders (created_at, id, customer_id);
-- On PostgreSQL 11+, INCLUDE keeps non-search columns out of the B-tree keys
-- (idx_orders_covering2 is an illustrative name):
CREATE INDEX idx_orders_covering2 ON orders (created_at) INCLUDE (id, customer_id);

Functional indexes. When you filter on a transformed value, index the function:

CREATE INDEX idx_users_lower_email ON users (lower(email));
-- Now this uses the index:
SELECT * FROM users WHERE lower(email) = '[email protected]';

Forcing Plan Choices for Debugging

Sometimes you want to test whether a specific plan would be faster, or you know the planner is wrong and want to override it temporarily.

-- Disable sequential scans for this session
SET enable_seqscan = off;

-- Disable nested loops
SET enable_nestloop = off;

-- Disable hash joins
SET enable_hashjoin = off;

These are session-level settings. Use them to test alternative plans. If disabling seq scans makes a query 10x faster, an existing index the planner was skipping is actually cheaper, which usually points to bad statistics or miscalibrated cost settings.

Never leave these set in production. They are diagnostic tools.
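To restore the defaults without closing the session:

RESET enable_seqscan;
RESET enable_nestloop;
RESET enable_hashjoin;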

The pg_hint_plan Extension

When you need to permanently override the planner’s decision, pg_hint_plan lets you add hints to queries without changing configuration:

/*+ IndexScan(orders idx_orders_customer) */
SELECT * FROM orders WHERE customer_id = 42;

This is a last resort. Hints make queries fragile - if the index name changes, the query breaks. Prefer fixing the statistics problem that caused the bad plan in the first place.

Practical Tuning Workflow

  1. Find slow queries - pg_stat_statements
  2. Get the execution plan - EXPLAIN (ANALYZE, BUFFERS)
  3. Check row estimates vs actuals - read the plan output
  4. Update statistics if stale - ANALYZE
  5. Add extended statistics if columns are correlated - CREATE STATISTICS
  6. Add or modify indexes - CREATE INDEX
  7. Verify the improvement - re-run EXPLAIN ANALYZE
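Step 1, assuming the pg_stat_statements extension is installed and preloaded, might look like:

-- Top queries by total time (column names are from PostgreSQL 13+;
-- older versions use total_time / mean_time)
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;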

Bottom Line

The PostgreSQL query planner is sophisticated and correct most of the time - one of the many reasons Postgres is winning the database wars. When it is not, the problem is almost always stale or insufficient statistics rather than a fundamental planner limitation. Keep statistics fresh, increase statistics targets on important columns, use extended statistics for correlated columns, and design your indexes to match your actual query patterns. EXPLAIN ANALYZE with BUFFERS gives you everything you need to diagnose what is happening - and if the planner still cannot keep up, it might be time to think about whether you need to shard.