Claude MCP Code Execution: Cut Agent Token Usage by 98% (Anthropic Pattern)
Anthropic's code execution with MCP pattern slashes Claude agent token usage from 150K to 2K tokens. Learn how to implement it step-by-step in 2026.
If you're building Claude agents with MCP integrations, you might be burning through tokens at an alarming rate — and not even know why. A workflow that should cost pennies ends up consuming 150,000 tokens per run. Multiply that by hundreds of daily requests and your API bill grows fast.
Anthropic's engineering team published a pattern that fixes this: code execution with MCP. The result? That same 150,000-token workflow runs in roughly 2,000 tokens — a 98.7% reduction. This guide breaks down exactly how it works and how you can implement it today.
Why Standard MCP Agents Are Token-Inefficient
When most developers build Claude agents with MCP servers, they follow the default pattern: load all tool definitions into the context upfront, then let the model decide which tools to call.
Here's the problem. A moderately complex MCP setup with 20–30 tools can easily consume 10,000–40,000 tokens just in tool definitions before your agent processes a single piece of user input. Then consider what happens during execution:
- Every intermediate result from a tool call passes through the model context
- Multi-step workflows chain these results, stacking token consumption
- A 2-hour meeting transcript processed by a transcription + summarization + CRM-write workflow could push an additional 50,000 tokens just from intermediate data passing through the model
This is what Anthropic calls the direct tool-calling syntax problem: the model sees everything, which is safe but expensive.
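To make the stacking effect concrete, here is a rough sketch of how input tokens accumulate when every tool result is appended to the conversation and the whole history is re-sent on each turn. The function name and the simplified accounting (tool definitions plus all prior tool results, ignoring user messages and model output) are assumptions for illustration, and the figures are this article's estimates rather than measurements.

```typescript
// Rough model of direct tool calling: each model turn re-reads the tool
// definitions plus every tool result produced so far.
// Assumption: user messages and model output tokens are ignored for simplicity.
function cumulativeInputTokens(toolDefTokens: number, resultTokens: number[]): number[] {
  const perTurn: number[] = [];
  let context = toolDefTokens;
  perTurn.push(context); // turn 1: the model picks the first tool
  for (const tokens of resultTokens) {
    context += tokens; // the tool result is appended to the conversation
    perTurn.push(context); // the next turn re-reads everything so far
  }
  return perTurn;
}

// Illustrative figures: 15K of tool definitions, then a 50K transcript,
// a 2K summary, and a 500-token CRM confirmation.
const perTurn = cumulativeInputTokens(15000, [50000, 2000, 500]);
const totalInput = perTurn.reduce((sum, t) => sum + t, 0);
```

Under these assumptions the context grows to 67,500 tokens by the last turn, and the large transcript is re-read on every turn after it arrives.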
Here's a concrete before-and-after:
Standard MCP agent (direct tool calling):
- Tool definitions loaded: ~15,000 tokens
- 3 tool calls with intermediate results: ~40,000 tokens
- Final synthesis: ~10,000 tokens
Total: ~65,000–150,000 tokens per run
Code execution with MCP:
- Tool discovery via code (lazy loading): ~200 tokens
- Filtered intermediate results: ~800 tokens
- Final synthesis: ~1,000 tokens
Total: ~2,000 tokens per run
How Code Execution with MCP Works
Instead of exposing all MCP tools as first-class context entries, the code execution pattern treats your MCP servers as code APIs. Claude writes and executes code (in a sandboxed TypeScript or Python runtime) that interacts with MCP servers programmatically.
The three core ideas:
1. Lazy Tool Loading — Load Only What You Need
Instead of injecting all tool definitions into the system prompt, you give Claude access to a code execution environment and a filesystem-based tool catalog. Claude explores the catalog in code:
```typescript
// Claude writes this code to discover relevant tools
import { listMcpServers } from './mcp-registry';

const servers = await listMcpServers();
// Returns: ['slack', 'notion', 'github', 'linear']

// Claude only imports the tool definitions it actually needs
const { searchIssues, createPR } = await import('./tools/github');
const { sendMessage } = await import('./tools/slack');
```
Instead of 30 tool definitions loaded upfront (~30,000 tokens), Claude loads 3 targeted definitions (~300 tokens). The catalog stays on the filesystem, not in the model's context.
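A minimal sketch of the lazy-loading idea, using an in-memory catalog in place of real files. The `ToolCatalog` shape, `listServers`, and `loadDefinitions` are hypothetical names for illustration, not part of any MCP SDK; the point is that only the requested definitions are ever handed back to the caller.

```typescript
// Hypothetical catalog shape: server name -> tool name -> full definition text.
type ToolCatalog = Record<string, Record<string, string>>;

const catalog: ToolCatalog = {
  github: {
    searchIssues: "searchIssues(query: { label: string; state: string }): Issue[]",
    createPR: "createPR(input: { title: string; branch: string }): PullRequest",
  },
  slack: {
    sendMessage: "sendMessage(channel: string, text: string): void",
  },
  notion: {
    updatePage: "updatePage(pageId: string, fields: object): void",
  },
};

// Cheap first step: server names only, no schemas.
function listServers(c: ToolCatalog): string[] {
  return Object.keys(c).sort();
}

// Second step: fetch full definitions only for the tools that were asked for.
function loadDefinitions(c: ToolCatalog, wanted: Array<[string, string]>): string[] {
  const defs: string[] = [];
  for (const [server, tool] of wanted) {
    const def = c[server]?.[tool];
    if (def === undefined) {
      throw new Error(`Unknown tool: ${server}/${tool}`);
    }
    defs.push(def);
  }
  return defs;
}

const serverNames = listServers(catalog);
const loadedDefs = loadDefinitions(catalog, [["github", "searchIssues"], ["slack", "sendMessage"]]);
```

Here `loadedDefs` holds two definitions while the other entries in the catalog are never read.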
2. Intermediate Results Stay in the Execution Environment
In standard MCP, every tool output flows back through the model. With code execution, intermediate results live in memory or the filesystem inside the sandbox — Claude only reads what it explicitly returns or logs.
```typescript
// Standard approach: ALL of this flows through the model
const transcript = await transcribe(meetingRecording); // 50K tokens
const summary = await summarize(transcript);           // 50K + 2K tokens
await writeToCRM(summary);                             // 52K + 500 tokens

// Code execution approach: only the final output surfaces
async function processMeeting(recordingPath: string) {
  const transcript = await transcribe(recordingPath); // stays in sandbox
  const summary = await summarize(transcript);        // stays in sandbox
  await writeToCRM(summary);
  // The model only sees this ~10-token return value
  return `CRM updated with meeting summary (${summary.wordCount} words)`;
}
```
The 50,000-token transcript never enters the model context. Claude gets the result it needs, a confirmation string, without processing every word of the source data.
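The same idea applies to any large intermediate payload. The sketch below assumes a hypothetical `Row` type and a `summarizeForModel` helper, neither of which comes from a real library: the full dataset stays in a local variable, and only a short string is surfaced.

```typescript
// Hypothetical record type for rows fetched inside the sandbox.
interface Row {
  id: number;
  orderStatus: "pending" | "shipped";
  customer: string;
}

// Build a compact summary so the model never sees the full dataset.
function summarizeForModel(rows: Row[], previewLimit: number): string {
  const pending = rows.filter((row) => row.orderStatus === "pending");
  const shown = Math.min(previewLimit, pending.length);
  const preview = pending
    .slice(0, shown)
    .map((row) => `#${row.id} ${row.customer}`)
    .join(", ");
  return `${pending.length} of ${rows.length} orders pending. First ${shown}: ${preview}`;
}

// Stand-in for a large tool result (10,000 rows) that stays in sandbox memory.
const allRows: Row[] = Array.from({ length: 10000 }, (_, i): Row => ({
  id: i + 1,
  orderStatus: i % 4 === 0 ? "pending" : "shipped",
  customer: `customer-${i + 1}`,
}));

// Only this short string would be returned to the model.
const surfaced = summarizeForModel(allRows, 3);
```

The 10,000 rows are filtered and counted in code, and the returned string stays under a hundred characters regardless of how large the input grows.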
3. Complex Logic in a Single Agent Step
With standard MCP, multi-condition branching requires multiple model turns (check condition → model decides → call tool → model decides → call next tool). With code execution, that branching lives in the code:
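```typescript
// Multi-step conditional logic in one agent execution.
// Import paths follow the './tools/<server>' pattern used earlier.
import * as github from './tools/github';
import * as slack from './tools/slack';
import * as linear from './tools/linear';
import * as notion from './tools/notion';

async function triageUrgentIssues(): Promise<string> {
  const issues = await github.searchIssues({ label: 'urgent', state: 'open' });
  if (issues.length > 10) {
    await slack.sendMessage('#oncall', `🚨 ${issues.length} urgent issues need triage`);
    await linear.createTask({ title: 'Urgent issue triage', priority: 'high' });
  } else {
    await notion.updateStatus('issue-tracker', { urgentCount: issues.length });
  }
  // Only this short string is returned to the model
  return `Processed ${issues.length} urgent issues`;
}
```
This replaces 4–5 model turns with a single code execution step. Each saved turn is thousands of tokens.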
Step-by-Step: Implementing the Pattern
Here's how to restructure an existing Claude + MCP agent to use code execution.
Step 1: Set Up a Sandboxed Execution Environment
You need a runtime where Claude can execute code safely. Popular options in 2026:
- Cloudflare Workers — serverless, fast cold starts, good for simple tasks
- Modal — powerful for compute-heavy workflows, great Python support
- Vercel Edge Functions — ideal if you're already on the Vercel stack
- Daytona — Anthropic-recommended for self-hosted sandboxes
If you're using Claude Managed Agents on the Claude Platform, self-hosted sandboxes are now in public beta — you can point the agent's execution environment at your own infrastructure.
Step 2: Create a Filesystem Tool Registry
Instead of loading tool schemas into the system prompt, create a structured directory:
```
/mcp-tools/
  registry.json        # index of all servers + tool names
  github/
    searchIssues.ts    # individual tool with schema + implementation
    createPR.ts
  slack/
    sendMessage.ts
  notion/
    updatePage.ts
```
Each tool file exports its own schema and callable function. The agent loads the registry index (~500 tokens), then imports specific tool files only when needed.
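One possible shape for the registry index and a lookup helper is sketched below. The `Registry` type and `resolveToolPath` function are assumptions for illustration; only the directory layout above comes from the article.

```typescript
// Assumed shape of registry.json: server name -> list of tool names.
type Registry = Record<string, string[]>;

const registry: Registry = {
  github: ["searchIssues", "createPR"],
  slack: ["sendMessage"],
  notion: ["updatePage"],
};

// Resolve a tool to its file path, failing loudly on names missing from the index.
function resolveToolPath(reg: Registry, server: string, tool: string): string {
  const tools = reg[server];
  if (tools === undefined || tools.indexOf(tool) === -1) {
    throw new Error(`Tool not found in registry: ${server}/${tool}`);
  }
  return `/mcp-tools/${server}/${tool}.ts`;
}

const toolPath = resolveToolPath(registry, "github", "createPR");
```

Keeping the index to bare names is a deliberate choice in this sketch: the schema text lives only in the individual tool files, so reading the index stays cheap.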
Step 3: Update Your System Prompt
Replace the long tool-injection block with a concise code-execution instruction:
```text
You have access to a code execution environment and an MCP tool registry at /mcp-tools/registry.json.

To use a tool:
1. Read registry.json to find the right server and tool name
2. Import the specific tool from /mcp-tools/{server}/{tool}.ts
3. Execute it in code and return only the relevant output

Keep intermediate data in the execution environment. Only surface final results in your response.
```
Your system prompt goes from 15,000 tokens to ~200 tokens.
Step 4: Pass the Code Execution Tool to Claude
Using the Claude API, add the code execution capability as a tool:
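```python
import anthropic

# Placeholders for this example: use the concise prompt from Step 3 and your real user request.
SYSTEM_PROMPT = "You have access to a code execution environment and an MCP tool registry at /mcp-tools/registry.json."
user_input = "Check for open urgent GitHub issues and notify the on-call channel."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    system=SYSTEM_PROMPT,
    tools=[
        {
            # Custom client-side tool: your application runs the submitted code
            # in the sandbox from Step 1 and returns the output as a tool_result.
            "name": "code_execution",
            "description": "Execute TypeScript/Python code in a sandboxed environment with access to the MCP tool registry",
            "input_schema": {
                "type": "object",
                "properties": {"code": {"type": "string", "description": "Source code to run"}},
                "required": ["code"],
            },
        }
    ],
    messages=[{"role": "user", "content": user_input}],
)
```
With Claude Opus 4.8, dynamic workflows are on by default for Max and Team plan users, which means the model already knows how to sequence code execution steps without explicit chaining logic on your end.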
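When Claude calls a custom tool, your application has to run the code and send the output back. The sketch below shows that hand-off in TypeScript with plain objects. The `tool_use` and `tool_result` block shapes follow the Messages API; `runInSandbox` is a hypothetical stand-in for whatever runtime you chose in Step 1, and the `code` input field is an assumption about how the custom tool's input schema is defined.

```typescript
// Minimal block shapes, mirroring the Messages API content blocks used here.
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: { code?: string };
}
interface TextBlock {
  type: "text";
  text: string;
}
type ContentBlock = ToolUseBlock | TextBlock;

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Hypothetical sandbox runner: takes source code, returns what the code surfaced.
type SandboxRunner = (code: string) => string;

// Turn every code_execution tool call into a tool_result for the next user message.
function buildToolResults(content: ContentBlock[], runInSandbox: SandboxRunner): ToolResultBlock[] {
  const results: ToolResultBlock[] = [];
  for (const block of content) {
    if (block.type !== "tool_use" || block.name !== "code_execution") continue;
    try {
      const output = runInSandbox(block.input.code ?? "");
      results.push({ type: "tool_result", tool_use_id: block.id, content: output });
    } catch (err) {
      // Report sandbox failures to the model instead of crashing the agent loop.
      const reason = err instanceof Error ? err.message : String(err);
      results.push({ type: "tool_result", tool_use_id: block.id, content: reason, is_error: true });
    }
  }
  return results;
}

// Example with a fake runner that just reports the size of the submitted code.
const fakeRunner: SandboxRunner = (code) => `ran ${code.length} characters`;
const toolResults = buildToolResults(
  [
    { type: "text", text: "Running the workflow in code." },
    { type: "tool_use", id: "toolu_example", name: "code_execution", input: { code: "return 1;" } },
  ],
  fakeRunner,
);
```

The returned blocks go into the `content` of the next `user` message, so the model sees only what the sandbox chose to surface.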
Real-World Token Savings by Use Case
Based on Anthropic's benchmarks and community reports, here's what teams are seeing:
| Workflow | Before (tokens) | After (tokens) | Reduction |
|---|---|---|---|
| Meeting transcript → CRM | ~150,000 | ~2,000 | 98.7% |
| GitHub PR review + Slack notify | ~45,000 | ~1,200 | 97.3% |
| Multi-source research aggregation | ~80,000 | ~3,500 | 95.6% |
| Daily standup data collection | ~30,000 | ~900 | 97.0% |
At Claude Opus 4.8 pricing ($15/M input tokens), a workflow at 150K tokens costs ~$2.25 per run. At 2K tokens, that's ~$0.03. For a team running 500 workflows per day, that's $1,125/day vs $15/day — roughly $405,000 saved per year at scale.
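The arithmetic behind those figures can be reproduced in a few lines. This sketch counts input tokens only, as the estimate above does; the function name is an assumption for illustration, and the price is the article's $15 per million input tokens.

```typescript
// Daily input-token cost in USD for a workload, given a per-million-token price.
function dailyCostUsd(tokensPerRun: number, runsPerDay: number, usdPerMillionTokens: number): number {
  return (tokensPerRun * runsPerDay * usdPerMillionTokens) / 1000000;
}

const costBefore = dailyCostUsd(150000, 500, 15); // 1125 USD per day
const costAfter = dailyCostUsd(2000, 500, 15);    // 15 USD per day
const annualSavings = (costBefore - costAfter) * 365; // 405150 USD per year
```

Output tokens and sandbox hosting costs are left out, so real savings will differ from this estimate.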
When NOT to Use Code Execution with MCP
This pattern is powerful, but not universal. Stick with direct tool calling when:
- You need the model to reason about intermediate results — if Claude needs to evaluate complex output (like dense medical data) before deciding the next step, forcing it into code execution may reduce quality
- Low-tool-count scenarios — if your agent uses only 2–3 tools, the overhead of setting up code execution probably isn't worth it
- Simple request-response flows — a Q&A bot with one knowledge-base tool doesn't need this optimization
- Debugging and observability are critical — with intermediate results hidden in the sandbox, debugging agent failures requires more instrumentation
For production use, a good rule of thumb: if your agent regularly makes 5+ tool calls per run or processes large documents/data as intermediate steps, code execution will pay off significantly.
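The rule of thumb and the exceptions above can be written down as a small decision helper. This is only a sketch of the article's heuristic; the `AgentProfile` fields and the `chooseStrategy` name are assumptions for illustration, and the threshold of 5 tool calls is the one stated above.

```typescript
// Hypothetical description of an agent workload.
interface AgentProfile {
  toolCallsPerRun: number;
  handlesLargeIntermediateData: boolean;
  modelMustReasonOverIntermediates: boolean;
}

type Strategy = "direct-tool-calling" | "code-execution";

function chooseStrategy(profile: AgentProfile): Strategy {
  // If the model has to inspect intermediate output itself, keep it in context.
  if (profile.modelMustReasonOverIntermediates) return "direct-tool-calling";
  // Rule of thumb: 5+ tool calls per run, or large intermediate data.
  if (profile.toolCallsPerRun >= 5 || profile.handlesLargeIntermediateData) return "code-execution";
  return "direct-tool-calling";
}

const meetingPipeline = chooseStrategy({
  toolCallsPerRun: 3,
  handlesLargeIntermediateData: true,
  modelMustReasonOverIntermediates: false,
});
const faqBot = chooseStrategy({
  toolCallsPerRun: 1,
  handlesLargeIntermediateData: false,
  modelMustReasonOverIntermediates: false,
});
```

With these inputs the meeting pipeline maps to code execution and the single-tool FAQ bot stays on direct tool calling.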
Getting Started with CCA Exam Preparation
The code execution with MCP pattern is core knowledge for the Claude Certified Architect (CCA-F) exam. Anthropic's certification tests your understanding of:
- Token efficiency patterns in agentic systems
- MCP server architecture and integration strategies
- Production deployment considerations for Claude agents
- Tool use vs. code execution trade-offs
If you're preparing for the CCA exam, our practice test bank includes 50+ questions on agentic architecture patterns including this exact MCP optimization topic — the kind of nuanced decision-making the exam rewards.
Key Takeaways
- Standard MCP agents load all tool definitions into context, which can cost 10,000–40,000 tokens before any work starts
- Code execution with MCP shifts tool loading and intermediate data processing into a sandboxed runtime — the model only sees final results
- Anthropic's own benchmark: a 150,000-token workflow reduced to 2,000 tokens (98.7% cut)
- Three techniques: lazy tool loading via a filesystem registry, keeping intermediates in the sandbox, and collapsing multi-turn branching into single code steps
- Best suited for agents that make 5+ tool calls per run or handle large intermediate data; not necessary for simple flows
- This pattern aligns with Claude Opus 4.8's dynamic workflows feature, which orchestrates code execution steps automatically
Next Steps
The MCP ecosystem is moving fast. Teams that learn to build token-efficient agents now will have a significant cost and performance advantage as agentic workloads scale.