Claude Code bills by tokens. Every file read, every tool schema, every message in the conversation - it all goes into the context window, and the context window determines cost. Most developers treat context as unlimited and invisible. It is neither. Understanding how context accumulates and managing it deliberately can cut Claude Code costs by 70% or more without reducing output quality.

How Context Accumulates

A fresh Claude Code session starts with a base context: system prompt, CLAUDE.md content, MCP tool schemas, and the initial user message. A typical starting context is 3,000-8,000 tokens depending on configuration.

Every interaction adds to the context:

  • User messages: the prompt text
  • Tool calls: the request and response for every file read, bash command, search, and edit
  • Claude’s responses: the reasoning and output text
  • File contents: every file Claude reads goes into context in full

A 500-line file is roughly 2,000-3,000 tokens. Read five files and the context has grown by 10,000-15,000 tokens. Run a bash command that outputs 200 lines of test results - another 1,000+ tokens. Over a 30-minute session, context easily reaches 50,000-100,000 tokens.

The cost is not just the final context size. Every message in the conversation includes all previous context as input. Message 1 sends 5,000 tokens. Message 2 sends 12,000 tokens. Message 10 sends 80,000 tokens. The cumulative cost grows quadratically with conversation length.
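That growth is easy to model: because each request resends the whole history as input, total billed input tokens over N messages is the sum of an increasing series. A toy sketch (the base and per-turn sizes are illustrative, not measured):

```python
# Toy model of cumulative input cost: every message resends all prior context.
# base: starting context size; per_turn: tokens each exchange adds.

def cumulative_input_tokens(messages: int, base: int = 5000, per_turn: int = 3000) -> int:
    total = 0
    context = base
    for _ in range(messages):
        total += context        # the entire context is billed as input
        context += per_turn     # the exchange grows the context for next time
    return total

# Doubling the conversation length far more than doubles cumulative cost.
print(cumulative_input_tokens(10))   # 185000
print(cumulative_input_tokens(20))   # 670000
```

Going from 10 to 20 messages multiplies cumulative input cost by about 3.6x in this model, which is why splitting work into shorter sessions pays off.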

The Context Thresholds

Claude Code has internal thresholds that trigger behavior changes as the context window fills:

  Threshold     What Happens
  ~70% full     Claude Code warns about context size
  ~85% full     Auto-compaction may trigger, summarizing older messages
  ~90%+ full    Aggressive compaction, potential loss of earlier context details

Auto-compaction summarizes the conversation history to free space. This is lossy - details from earlier in the conversation get compressed into summaries. Instructions from CLAUDE.md that were loaded at session start can lose priority. Tasks that depend on precise details from 20 messages ago may produce incorrect results.

The goal is to never hit these thresholds. Working within 50-60% of the context window produces the best results at the lowest cost.

Strategy 1: Use @file Instead of Pasting

When providing context, the instinct is to paste file contents directly into the prompt. This works but is wasteful when Claude needs to reference the file multiple times.

Expensive approach:

Here is the auth module:
[paste 300 lines of code]

Fix the token refresh bug.

Better approach:

Look at src/lib/auth/token.ts and fix the token refresh bug.

When Claude reads a file via its built-in tools, it reads exactly what it needs. Pasting the entire file forces all 300 lines into context regardless of relevance. Claude’s file reading is also smarter than it looks - it can read specific line ranges when it knows where to look.

The @file syntax in Claude Code is even more efficient for frequently referenced files:

@src/lib/auth/token.ts Fix the token refresh bug.

Strategy 2: Use /compact Proactively

The /compact command triggers manual compaction - summarizing the conversation history to free context space. Use it proactively, not reactively.

When to compact:

  • After completing a distinct subtask (finished the auth fix, moving to the API endpoint)
  • After a long debugging session that produced lots of tool output
  • Before starting a task that will require reading many files
  • When the conversation has exceeded 20 back-and-forth messages

When NOT to compact:

  • In the middle of a multi-step task where earlier context is critical
  • Right after providing detailed instructions that Claude needs to follow precisely

A good rhythm is to compact after every completed task within a session. This keeps context lean for the next task.

Strategy 3: Use /clear for Unrelated Tasks

/clear wipes the entire conversation and starts fresh. This is more aggressive than /compact but is the right choice when switching to a completely unrelated task.

# Task 1: Fix the auth bug
[20 messages of debugging]

/clear

# Task 2: Add pagination to the users endpoint

Without /clear, Task 2 carries the full context of Task 1 - all the auth module file reads, test outputs, and debugging steps. None of that is relevant. Starting fresh means Task 2 runs at minimum context cost.

The rule: if the next task shares zero context with the current conversation, use /clear. If it shares some context (same area of the codebase, related feature), use /compact.

Strategy 4: Disable Unused MCP Servers

Every MCP server adds its tool schemas to the context. A server with 20 tools might add 500-800 tokens to every single message. If the project has four MCP servers configured but the current task only needs one, the other three are pure waste.

Project-level optimization: Configure MCP servers in .claude/settings.json per project instead of globally. A frontend project should not load the PostgreSQL MCP server.

Session-level optimization: If a specific session does not need an MCP server, disable it before starting work.

The token savings are meaningful. Removing three unused MCP servers might save 1,500-2,000 tokens per message. Over a 40-message session, that is 60,000-80,000 tokens saved - real money.
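The arithmetic is worth making explicit, because schemas are resent with every request, so the per-message overhead multiplies by conversation length. A quick check using the figures above:

```python
# Session-level waste from unused MCP tool schemas. Schemas are part of every
# request, so per-message overhead scales linearly with message count.

def schema_waste(per_message_low: int, per_message_high: int, messages: int) -> tuple[int, int]:
    """Range of tokens spent on unused schemas over a session."""
    return per_message_low * messages, per_message_high * messages

# Three unused servers at roughly 1,500-2,000 tokens per message, 40 messages:
low, high = schema_waste(1500, 2000, 40)
print(f"{low:,}-{high:,} tokens")  # 60,000-80,000 tokens
```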

Strategy 5: Sub-Agents for Large Tasks

Claude Code can spawn sub-agents for isolated tasks. A sub-agent gets its own context window, executes a specific task, and returns a summary to the parent conversation.

This is powerful for tasks that require reading many files but where the parent conversation only needs the conclusion:

Use a sub-agent to analyze all files in src/server/routes/ and list
every endpoint that does not have input validation.

The sub-agent reads 20 route files (potentially 40,000+ tokens of file content), analyzes them, and returns a concise summary (maybe 500 tokens). The parent conversation never sees the 40,000 tokens of raw file content.

Without sub-agents, the same task would dump all those file contents into the main context, inflating the cost of every subsequent message.
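The saving compounds, because without a sub-agent those file contents would be resent as input on every later message. A toy comparison using the numbers from the example above (the count of later messages is illustrative):

```python
# Context carried into each later message, with and without a sub-agent.

def carried_input(payload_tokens: int, later_messages: int) -> int:
    """Total input tokens the payload contributes across later messages."""
    return payload_tokens * later_messages

file_tokens = 40_000     # raw route-file content read during analysis
summary_tokens = 500     # concise summary a sub-agent returns instead
later_messages = 10

saved = carried_input(file_tokens, later_messages) - carried_input(summary_tokens, later_messages)
print(f"{saved:,} input tokens avoided")  # 395,000 input tokens avoided
```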

Good sub-agent tasks:

  • Codebase analysis and summarization
  • Finding all instances of a pattern across many files
  • Generating boilerplate from templates
  • Running and analyzing large test suites

Bad sub-agent tasks:

  • Tasks that require back-and-forth with the user
  • Tasks that need context from the current conversation
  • Quick operations that read one or two files

Strategy 6: Path-Specific Rules in .claude/rules/

This was covered in the CLAUDE.md guide, but the cost angle is worth emphasizing. Global CLAUDE.md instructions load on every interaction. Path-specific rules in .claude/rules/ only load when Claude works on matching files.

Moving 20 lines of test-specific instructions from CLAUDE.md to a rules file with globs: ["**/*.test.ts"] means those 20 lines (~100 tokens) do not load when Claude is writing production code. Over a session, that compounds.
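A sketch of such a rules file, say a hypothetical .claude/rules/testing.md, using the globs front-matter shown above (the file name, the instruction text, and the exact front-matter schema are illustrative and may vary between Claude Code versions):

```
---
globs: ["**/*.test.ts"]
---

- Use the project's existing test helpers rather than ad-hoc mocks.
- Keep one behavior per test case; name tests after the behavior.
- Never hit the network in unit tests.
```

Everything in this file stays out of context until Claude touches a matching `*.test.ts` file.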

Strategy 7: Write Focused Prompts

Vague prompts generate expensive conversations. “Make the app better” leads to Claude reading dozens of files trying to understand what “better” means. “Add retry logic to the API client in src/lib/api.ts with exponential backoff, max 3 retries” leads to Claude reading one file and making a targeted change.

Expensive prompt:

The users page is slow. Fix it.

Cheap prompt:

The users list at src/app/users/page.tsx re-renders on every keystroke
in the search input. Add debouncing with a 300ms delay using the
useDebounce hook from src/hooks/useDebounce.ts.

The second prompt costs less because Claude reads fewer files, makes fewer exploratory tool calls, and finishes in fewer messages.

Measuring Context Usage

Claude Code shows context usage in the interface. Watch it during sessions and develop intuition for what operations are expensive:

  Operation                           Typical Token Cost
  Read a 200-line file                800-1,200 tokens
  Read a 500-line file                2,000-3,000 tokens
  Bash command with moderate output   300-800 tokens
  Bash command with large output      2,000-5,000 tokens
  MCP tool schema (per server)        200-1,200 tokens
  CLAUDE.md (30 lines)                150-250 tokens
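These per-operation estimates can be combined into a rough budget before starting a task. A sketch using midpoints of the ranges above (all figures are the estimates from the table, not measured values):

```python
# Rough per-operation context costs: midpoints of the ranges in the table.
OP_COST = {
    "read_200_line_file": 1000,
    "read_500_line_file": 2500,
    "bash_moderate_output": 550,
    "bash_large_output": 3500,
    "mcp_schema_per_server": 700,
    "claude_md_30_lines": 200,
}

def session_context_estimate(plan: dict[str, int]) -> int:
    """Sum expected context growth for a planned set of operations."""
    return sum(OP_COST[name] * count for name, count in plan.items())

# A modest debugging session: four big file reads, ten test runs, two noisy ones.
plan = {"read_500_line_file": 4, "bash_moderate_output": 10, "bash_large_output": 2}
print(session_context_estimate(plan))  # 22500
```

If the estimate lands near half the context window, that is the signal to plan a /compact point or hand part of the work to a sub-agent.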

The 70% Reduction

Applying all these strategies together produces dramatic savings:

  • Disable 3 unused MCP servers: -1,500 tokens/message
  • Use /compact after each subtask: -30% cumulative tokens
  • Use /clear between unrelated tasks: -50% or more per task
  • Focused prompts instead of vague ones: -40% tool call tokens
  • Sub-agents for analysis tasks: -80% for those specific tasks
  • Path-specific rules instead of global: -5-10% per message

The 70% figure is conservative for developers who previously used Claude Code with default settings and no context management. The actual savings depend on usage patterns, but the directional improvement is consistent. Context management is not optional - it is the primary lever for controlling Claude Code costs.