Claude Code Cost Optimization: 7 Strategies That Cut Our Bill 73%

TL;DR: Heavy Claude Code API usage can run $300-500/month per developer. With model routing, prompt caching, context pruning, and sub-agent discipline, one team cut their bill from $480 to $128/month — a 73% reduction with zero quality loss. Here's the exact playbook.

The Cost of Not Thinking About Cost

Claude Code is expensive on the API. A heavy user running 40-60 sessions per week can easily hit $300-500/month. The cost isn't the problem — the waste is. Most of those tokens go to stale context, oversized CLAUDE.md files, and using Opus for tasks Haiku could handle.

Before you start optimizing, know the numbers. The current Claude model pricing per million tokens:

Model	Input (per M tokens)	Output (per M tokens)	Best For
Haiku 4.5	$1.00	$5.00	Classification, triage, simple edits
Sonnet 4.6	$3.00	$15.00	General coding, most production
Opus 4.7	$5.00	$25.00	Complex reasoning, agentic work

Output tokens cost 5x input. A long, verbose response from Opus can cost $0.50 in one turn. The strategies below target all three levers: input volume, model selection, and output verbosity.
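To make the per-turn arithmetic concrete, here is a minimal sketch of a cost estimator built from the table above. The `PRICES` dictionary and the `turn_cost` helper are illustrative names, not part of any Anthropic tooling, and the estimate ignores caching and batch discounts.

```python
# Prices in USD per million tokens, copied from the pricing table above.
PRICES = {
    "haiku": {"input": 1.00, "output": 5.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 5.00, "output": 25.00},
}


def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one turn, ignoring caching and batch discounts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# A 20,000-token Opus response, counting output tokens only.
print(f"${turn_cost('opus', 0, 20_000):.2f}")  # prints $0.50
```

At these prices, the $0.50 figure corresponds to roughly 20,000 output tokens from Opus in a single turn.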

Strategy 1: Model Routing (50-60% Savings)

This is the single biggest savings lever. Most developers use Sonnet or Opus for everything. A 70/20/10 split (Haiku for simple tasks, Sonnet for medium, Opus for hard) can cut costs by roughly half, with the exact figure depending on how much of your current usage runs on Opus.

Record model preferences by task type in CLAUDE.md, and switch the active model with /model when the task changes:

# Model routing rules
- For linting, formatting, and simple refactors: Haiku
- For feature implementation and debugging: Sonnet
- For architecture design and complex reasoning: Opus

The real trick: assign Haiku to sub-agents performing exploration and research. They return concise summaries to the main Sonnet or Opus agent, keeping expensive reasoning tokens where they matter most.
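The effect of the 70/20/10 split can be checked with a short blended-price calculation. This is a rough sketch: it assumes the split applies to token volume (not task count) and that each tier sees the same input/output mix. The names `SPLIT`, `blended_price`, and `savings_vs` are made up for illustration.

```python
# Prices in USD per million tokens, copied from the pricing table.
PRICES = {
    "haiku": {"input": 1.00, "output": 5.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 5.00, "output": 25.00},
}

# Assumption: the 70/20/10 split is a share of tokens, not of tasks.
SPLIT = {"haiku": 0.70, "sonnet": 0.20, "opus": 0.10}


def blended_price(kind: str) -> float:
    """Weighted-average price per million tokens for 'input' or 'output'."""
    return sum(share * PRICES[model][kind] for model, share in SPLIT.items())


def savings_vs(baseline: str, kind: str) -> float:
    """Fractional saving compared with sending every token to one model."""
    return 1 - blended_price(kind) / PRICES[baseline][kind]


for baseline in ("sonnet", "opus"):
    print(f"vs all-{baseline}: {savings_vs(baseline, 'input'):.0%} saved on input")
```

Under these assumptions the blended input price is $1.80 per million tokens, which works out to about 40% cheaper than an all-Sonnet baseline and about 64% cheaper than an all-Opus one, so the size of the saving depends on how Opus-heavy the baseline is.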

Strategy 2: Prompt Caching (Automatic 90% Discount on Cache Hits)

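Claude Code applies prompt caching for you; if you call the Anthropic API directly, you mark cacheable blocks with cache_control. Stable prefixes (system prompts, CLAUDE.md, tool definitions) are cached for 5 minutes by default, with the timer refreshed on every hit, and an optional 1-hour lifetime is available at a higher write price. A cache read costs 10% of the standard input price. A 5-minute cache write costs 1.25x the standard input price but pays for itself after one read.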

To maximize cache hits:

Keep CLAUDE.md stable during a work session. Don't add and remove large blocks.
Don't reorder tool definitions mid-session. The cache key includes the exact prefix.
Warm the cache early. The first request in a session writes the cache; follow-up requests made before it expires read from it at the discounted rate.
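The break-even claim can be verified with a few lines of arithmetic. The sketch below uses the 1.25x write and 0.10x read multipliers quoted above and assumes every request lands while the cache entry is still alive; `prefix_cost` is a hypothetical helper, not an API call.

```python
CACHE_WRITE_MULT = 1.25  # first request writes the prefix to the cache
CACHE_READ_MULT = 0.10   # later requests read it at 10% of the input price


def prefix_cost(requests: int, prefix_tokens: int,
                input_price_per_m: float, cached: bool) -> float:
    """USD spent on a stable prefix over several requests.

    Assumes all requests arrive before the cache entry expires.
    """
    base = prefix_tokens * input_price_per_m / 1_000_000
    if not cached or requests == 0:
        return requests * base
    return base * (CACHE_WRITE_MULT + (requests - 1) * CACHE_READ_MULT)


# Hypothetical 10,000-token prefix on Sonnet ($3.00 per million input tokens).
for n in (1, 2, 40):
    plain = prefix_cost(n, 10_000, 3.00, cached=False)
    cached = prefix_cost(n, 10_000, 3.00, cached=True)
    print(f"{n:>2} requests: uncached ${plain:.4f}, cached ${cached:.4f}")
```

With a single request the cached path is slightly more expensive (1.25x versus 1x); from the second request onward it is cheaper (1.35x versus 2x), which is why a stable prefix matters more than any other caching detail.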

Strategy 3: Context Pruning (30-40% Session Savings)

Every message in Claude Code re-sends the entire conversation history as input tokens. A 3-hour session with 40 turns accumulates a massive context, and most of it is stale.

Three commands are your context management toolkit:

# Check current context usage
/context

# Manually compact when at 40-50% (don't wait for auto)
/compact

# Start fresh for a new topic
/clear

The /compact command summarizes the conversation and replaces the accumulated history with that summary, preserving essential state. Run it proactively at 40-50% usage rather than waiting for auto-compaction to trigger near the limit; the manual version tends to produce better summaries.
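Because the whole history is re-sent each turn, cumulative input grows roughly quadratically with the number of turns. The following sketch models that growth and the effect of periodic compaction. All numbers are illustrative, `cumulative_input_tokens` is a made-up helper, and the model ignores caching and the tokens spent producing the summary itself.

```python
from typing import Optional


def cumulative_input_tokens(turns: int, tokens_per_turn: int,
                            compact_every: Optional[int] = None,
                            summary_tokens: int = 2_000) -> int:
    """Total input tokens billed over a session.

    Assumption: each turn adds `tokens_per_turn` to the history, and a
    compaction replaces the history with a `summary_tokens`-sized summary.
    """
    total = 0
    context = 0
    for turn in range(1, turns + 1):
        context += tokens_per_turn  # new messages join the history
        total += context            # the whole history is re-sent as input
        if compact_every and turn % compact_every == 0:
            context = min(context, summary_tokens)  # history -> summary
    return total


no_compaction = cumulative_input_tokens(40, 3_000)
with_compaction = cumulative_input_tokens(40, 3_000, compact_every=10)
print(no_compaction, with_compaction)
```

The absolute figures depend entirely on the assumed turn size and summary size; the point of the sketch is the shape of the curve, not a prediction for any particular session.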

Strategy 4: Streamline Your CLAUDE.md (Ongoing Savings)

Your CLAUDE.md loads every session and every turn. A bloated 800-line file adds $0.02-0.05 per turn in baseline costs. Over 500 turns per month, that's $10-25 of pure waste.
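The per-turn figure follows from file size and input price. As a rough check, the sketch below assumes an 800-line file is about 10,000 tokens (an assumption; real files vary widely) and prices it at Sonnet's uncached input rate. `claude_md_overhead` is a hypothetical helper name.

```python
def claude_md_overhead(file_tokens: int, input_price_per_m: float,
                       turns_per_month: int) -> tuple[float, float]:
    """Return (USD per turn, USD per month) for re-sending a file each turn.

    Uses the uncached input price, so it is an upper bound when prompt
    caching is active.
    """
    per_turn = file_tokens * input_price_per_m / 1_000_000
    return per_turn, per_turn * turns_per_month


# Assumption: 800 lines is roughly 10,000 tokens; Sonnet input at $3.00/M.
per_turn, per_month = claude_md_overhead(10_000, 3.00, 500)
print(f"${per_turn:.2f} per turn, ${per_month:.2f} per month")
```

Under these assumptions the file costs about $0.03 per turn and $15 per month, inside the ranges quoted above.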

Keep your global CLAUDE.md under 400 lines. Move project-specific instructions to path-scoped rules:

# Create a rule that only loads when editing Python files
.claude/rules/python.md
# Create a rule for frontend work
.claude/rules/frontend.md

Path-scoped rules cost zero tokens outside their scope. Your global prompt stays lean, and specialized knowledge loads on demand.

Strategy 5: Sub-Agent Discipline (Indirect Savings)

Sub-agents don't just speed up work — they reduce cost by keeping your main context lean. Instead of exploring a codebase yourself (burning expensive Sonnet/Opus tokens), dispatch a cheap Haiku sub-agent to explore and return a summary:

# Use Task tool to dispatch a sub-agent for research
# It returns a summary to the main agent context
# The main agent only pays for the summary, not the exploration

This pattern reduces the main session's token count by 40-60% for exploratory tasks while maintaining quality.
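A simple model shows where the saving comes from: exploration that happens in the main session stays in its context and is re-sent on later turns, while a delegated exploration leaves only the summary behind. The sketch below is a rough approximation with hypothetical token counts; `exploration_cost` is an illustrative helper, and it ignores output tokens and caching.

```python
from typing import Optional


def exploration_cost(explore_tokens: int, summary_tokens: int,
                     later_turns: int, main_price_per_m: float,
                     sub_price_per_m: Optional[float] = None) -> float:
    """USD input cost of one exploration plus carrying its result forward.

    If `sub_price_per_m` is None, the main agent explores on its own and the
    raw exploration stays in its context for `later_turns` more turns.
    Otherwise a sub-agent explores and only the summary is carried forward.
    """
    if sub_price_per_m is None:
        first = explore_tokens * main_price_per_m
        carried = explore_tokens
    else:
        first = (explore_tokens * sub_price_per_m
                 + summary_tokens * main_price_per_m)
        carried = summary_tokens
    return (first + carried * later_turns * main_price_per_m) / 1_000_000


# Hypothetical: 50,000 tokens explored, 1,500-token summary, 10 later turns,
# Opus main agent ($5.00/M input), Haiku sub-agent ($1.00/M input).
direct = exploration_cost(50_000, 1_500, 10, 5.00)
delegated = exploration_cost(50_000, 1_500, 10, 5.00, sub_price_per_m=1.00)
print(f"direct ${direct:.4f}, delegated ${delegated:.4f}")
```

Most of the difference in this model comes from the carried context rather than the cheaper sub-agent price, which is why the pattern still helps when the sub-agent runs on the same model.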

Strategy 6: Batch Processing (50% Flat Discount)

Anthropic's Batch API offers a flat 50% discount on both input and output tokens for asynchronous workloads. If you have repetitive tasks — linting multiple files, generating documentation, running tests — batch them:

Queue non-urgent work overnight with batch pricing
Use Haiku for batch tasks (cheapest model + batch discount)
Collect results in the morning at 50% of the real-time cost for the same model
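The batch discount stacks with model choice, which a short calculation makes visible. This sketch applies the flat 50% discount to the table prices; the job size is hypothetical and `job_cost` is an illustrative helper, not a Batch API call.

```python
# Prices in USD per million tokens, copied from the pricing table.
PRICES = {
    "haiku": {"input": 1.00, "output": 5.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 5.00, "output": 25.00},
}
BATCH_DISCOUNT = 0.50  # flat discount on both input and output tokens


def job_cost(model: str, input_tokens: int, output_tokens: int,
             batch: bool = False) -> float:
    """Estimated USD cost of a job, optionally at batch pricing."""
    price = PRICES[model]
    cost = (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


# Hypothetical overnight job: 1M input tokens, 200k output tokens.
for model, batch in (("sonnet", False), ("sonnet", True), ("haiku", True)):
    label = "batch" if batch else "real-time"
    print(f"{model:>6} {label:>9}: ${job_cost(model, 1_000_000, 200_000, batch):.2f}")
```

For this job size, batching halves the Sonnet bill, and moving the same work to Haiku at batch pricing brings it to about a sixth of the Sonnet real-time cost.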

Strategy 7: Cap Tool and Terminal Output

Verbose tool output is a hidden cost driver. A grep -r that returns 5,000 lines of log data costs real money to process as input tokens. Set explicit caps:

# Illustrative limits only; setting names vary, so check your Claude Code settings documentation
tool_output_limit: 8000
terminal_output_limit: 20000

Filter logs before sending them to Claude. grep -i error or tail -50 are your friends — Claude doesn't need the full 100MB log file to tell you what's wrong.
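The same filtering can be scripted when logs come from a tool rather than the shell. The sketch below is a minimal version: `trim_log` is a hypothetical helper that keeps the last matching lines and enforces a character cap, with default values chosen to mirror the examples above.

```python
import re


def trim_log(text: str, pattern: str = r"error",
             max_lines: int = 50, max_chars: int = 8_000) -> str:
    """Keep the last `max_lines` lines matching `pattern`, capped in length.

    Matching is case-insensitive, like `grep -i`. If the result is still
    longer than `max_chars`, only the most recent characters are kept.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matches = [line for line in text.splitlines() if regex.search(line)]
    trimmed = "\n".join(matches[-max_lines:])
    return trimmed[-max_chars:]


# Synthetic 5,000-line log with an error on every 100th line.
log = "\n".join(
    f"line {i} ERROR disk full" if i % 100 == 0 else f"line {i} ok"
    for i in range(5_000)
)
print(trim_log(log, max_lines=3))
```

Keeping the tail rather than the head is a deliberate choice here, on the assumption that the most recent errors are the most relevant to the failure being debugged.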

The Combined Effect

These seven strategies compound. One real-world team saw their bill drop from $480/month to $128/month — a 73% reduction — by applying them systematically. The key insight: optimization isn't about using Claude less. It's about using the right Claude for the right job, with the right context, at the right price.

For more on Claude Code workflows, check our coding tool comparison and AI code review pipeline build log.

Frequently Asked Questions

Does model routing affect output quality?

Not if you route correctly. Haiku handles classification, linting, and simple edits without quality loss. Sonnet handles 80% of what developers throw at it. Reserve Opus for the 20% where Sonnet demonstrably fails — architecture, complex debugging, and high-autonomy agentic tasks.

How do I track my Claude Code costs?

Use the /cost command in a session to see token count and estimated spend. For cross-session tracking, check the Anthropic Console Usage tab. Set budget alerts to avoid surprise bills.

Is prompt caching automatic?

Yes, inside Claude Code: stable prefixes are cached without any configuration. If you call the Anthropic API directly, you mark cacheable blocks with cache_control. Either way, the key is keeping your system prompt and CLAUDE.md stable within a session. If you edit them mid-session, the cache invalidates.

How much can I expect to save?

Teams report 50-73% reduction applying these strategies in combination. The biggest savers are model routing (50-60%) and context management (30-40%). Prompt caching is automatic but amplifies every other strategy.

Do sub-agents cost extra?

Sub-agents consume their own tokens, but they replace tokens you would have spent in the main session on the same exploration. In practice, sub-agents reduce total cost because they use cheaper models and keep the main context leaner.
