Claude MCP Code Execution: Cut Agent Token Usage by 98%

If you're building Claude agents with MCP integrations, you might be burning through tokens at an alarming rate — and not even know why. A workflow that should cost pennies ends up consuming 150,000 tokens per run. Multiply that by hundreds of daily requests and your API bill grows fast.

Anthropic's engineering team published a pattern that fixes this: code execution with MCP. The result? That same 150,000-token workflow runs in roughly 2,000 tokens — a 98.7% reduction. This guide breaks down exactly how it works and how you can implement it today.

Why Standard MCP Agents Are Token-Inefficient

When most developers build Claude agents with MCP servers, they follow the default pattern: load all tool definitions into the context upfront, then let the model decide which tools to call.

Here's the problem. A moderately complex MCP setup with 20–30 tools can easily consume 10,000–40,000 tokens just in tool definitions before your agent processes a single piece of user input. Then consider what happens during execution:

Every intermediate result from a tool call passes through the model context
Multi-step workflows chain these results, stacking token consumption
A 2-hour meeting transcript processed by a transcription + summarization + CRM-write workflow could push an additional 50,000 tokens just from intermediate data passing through the model

This is the trade-off of direct tool calling: the model sees every definition and every intermediate result, which is safe but expensive.
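To get a feel for how much context tool definitions alone can take up, here is a minimal sketch that estimates their size. It assumes a rough heuristic of about 4 characters per token (real counts depend on the tokenizer), and the 30 tool definitions are hypothetical placeholders, not real MCP tools.

```python
import json

# Assumption: roughly 4 characters per token. Real tokenizers will differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(obj) -> int:
    """Estimate the token count of a JSON-serializable object."""
    return len(json.dumps(obj)) // CHARS_PER_TOKEN


# Hypothetical tool definitions in the name/description/input_schema shape.
tool_definitions = [
    {
        "name": f"tool_{i}",
        "description": "Search issues in a repository by label and state. " * 10,
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
    for i in range(30)
]

per_tool = [estimate_tokens(t) for t in tool_definitions]
total = sum(per_tool)
print(f"{len(per_tool)} tools, ~{total} estimated tokens before any user input")
```

Running the same estimate over your own tool list gives a quick first read on whether definitions are a meaningful share of each request.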

Here's a concrete before-and-after:

Standard MCP agent (direct tool calling):
- Tool definitions loaded: ~15,000 tokens
- 3 tool calls with intermediate results: ~40,000 tokens
- Final synthesis: ~10,000 tokens
Total: ~65,000–150,000 tokens per run

Code execution with MCP:
- Tool discovery via code (lazy loading): ~200 tokens
- Filtered intermediate results: ~800 tokens
- Final synthesis: ~1,000 tokens
Total: ~2,000 tokens per run
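The arithmetic behind the two budgets can be checked with a short sketch. The numbers are the ones listed above; the dictionary keys and the helper name are just illustrative.

```python
# Token budgets copied from the before-and-after comparison above.
direct_calling = {
    "tool_definitions": 15_000,
    "tool_calls_with_intermediate_results": 40_000,
    "final_synthesis": 10_000,
}
code_execution = {
    "tool_discovery": 200,
    "filtered_intermediate_results": 800,
    "final_synthesis": 1_000,
}


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction from before to after, rounded to one decimal."""
    return round((before - after) / before * 100, 1)


direct_total = sum(direct_calling.values())        # low end of the direct-calling range
code_exec_total = sum(code_execution.values())

print(direct_total, code_exec_total)
print(reduction_pct(direct_total, code_exec_total))  # against the 65,000-token low end
print(reduction_pct(150_000, code_exec_total))       # against the 150,000-token high end
```

The itemized direct-calling lines add up to the 65,000-token low end of the range; the 98.7% headline figure corresponds to the 150,000-token high end.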

How Code Execution with MCP Works

Instead of exposing all MCP tools as first-class context entries, the code execution pattern treats your MCP servers as code APIs. Claude writes and executes code (in a sandboxed TypeScript or Python runtime) that interacts with MCP servers programmatically.

The three core ideas:

1. Lazy Tool Loading — Load Only What You Need

Instead of injecting all tool definitions into the system prompt, you give Claude access to a code execution environment and a filesystem-based tool catalog. Claude explores the catalog in code:

// Claude writes this code to discover relevant tools
import { listMcpServers } from './mcp-registry';

const servers = await listMcpServers();
// Returns: ['slack', 'notion', 'github', 'linear']

// Claude only imports the tool definitions it actually needs
const { searchIssues, createPR } = await import('./tools/github');
const { sendMessage } = await import('./tools/slack');

Instead of 30 tool definitions loaded upfront (~30,000 tokens, or about 1,000 tokens each), Claude loads only the 3 it needs (~3,000 tokens at the same average size). The rest of the catalog stays on the filesystem, not in the model's context.
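The same lazy-loading idea can be sketched in Python. This is an illustration under assumptions, not the pattern's reference implementation: the catalog layout (one JSON definition file per tool) and the helper names `write_demo_catalog`, `list_servers`, and `load_tool` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_demo_catalog(root: Path) -> None:
    """Create a tiny hypothetical catalog: one JSON definition file per tool."""
    catalog = {
        "github": ["searchIssues", "createPR"],
        "slack": ["sendMessage"],
        "notion": ["updatePage"],
    }
    for server, tools in catalog.items():
        (root / server).mkdir(parents=True)
        for tool in tools:
            definition = {"name": tool, "description": f"{server}.{tool} (placeholder)"}
            (root / server / f"{tool}.json").write_text(json.dumps(definition))


def list_servers(root: Path) -> list[str]:
    """Cheap discovery step: directory names only, no definitions read."""
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def load_tool(root: Path, server: str, tool: str) -> dict:
    """Read a single tool definition on demand."""
    return json.loads((root / server / f"{tool}.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_demo_catalog(root)
    servers = list_servers(root)
    # Only the definitions needed for this task are ever read.
    needed = [("github", "searchIssues"), ("slack", "sendMessage")]
    loaded = [load_tool(root, server, tool) for server, tool in needed]

print(servers)
print([d["name"] for d in loaded])
```

Here four definitions exist on disk but only two are read, which is the property that keeps unused definitions out of the context window.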

2. Intermediate Results Stay in the Execution Environment

In standard MCP, every tool output flows back through the model. With code execution, intermediate results live in memory or the filesystem inside the sandbox — Claude only reads what it explicitly returns or logs.

// Standard approach — ALL of this flows through the model:
const transcript = await transcribe(meetingRecording); // 50K tokens
const summary = await summarize(transcript);            // 50K + 2K tokens
await writeToCRM(summary);                              // 52K + 500 tokens

// Code execution approach — only final output surfaces:
async function processMeeting(recordingPath: string) {
  const transcript = await transcribe(recordingPath);   // stays in sandbox
  const summary = await summarize(transcript);           // stays in sandbox
  await writeToCRM(summary);
  return `CRM updated with meeting summary (${summary.wordCount} words)`;
  // Model only sees this 10-token return value
}

The 50,000-token transcript never enters the model context. Claude gets the result it needs — a confirmation string — without processing every word of the source data.
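A runnable Python sketch of the same idea, using stub functions in place of real transcription, summarization, and CRM services. All three stubs and the `crm_records` list are hypothetical stand-ins; only the shape of the data flow matters.

```python
def transcribe(recording_path: str) -> str:
    """Stub: stands in for a transcription call that returns a large transcript."""
    return "word " * 40_000


def summarize(transcript: str) -> dict:
    """Stub: keeps the first 50 words as a fake summary."""
    text = " ".join(transcript.split()[:50])
    return {"text": text, "word_count": len(text.split())}


crm_records: list[dict] = []


def write_to_crm(summary: dict) -> None:
    """Stub: stands in for a CRM write."""
    crm_records.append(summary)


def process_meeting(recording_path: str) -> str:
    transcript = transcribe(recording_path)  # stays inside the execution environment
    summary = summarize(transcript)          # stays inside the execution environment
    write_to_crm(summary)
    # Only this short string would be surfaced to the model.
    return f"CRM updated with meeting summary ({summary['word_count']} words)"


result = process_meeting("meeting.wav")
print(result)
print(f"transcript size: {len(transcribe('meeting.wav'))} chars, surfaced: {len(result)} chars")
```

The large transcript is created, consumed, and discarded inside the function; the caller only ever holds the short confirmation string.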

3. Complex Logic in a Single Agent Step

With standard MCP, multi-condition branching requires multiple model turns (check condition → model decides → call tool → model decides → call next tool). With code execution, that branching lives in the code:

// Multi-step conditional logic in one agent execution.
// github, slack, linear, and notion are assumed to be tool modules
// already imported from the registry.
async function triageUrgentIssues(): Promise<string> {
  const issues = await github.searchIssues({ label: 'urgent', state: 'open' });

  if (issues.length > 10) {
    await slack.sendMessage('#oncall', `🚨 ${issues.length} urgent issues need triage`);
    await linear.createTask({ title: 'Urgent issue triage', priority: 'high' });
  } else {
    await notion.updateStatus('issue-tracker', { urgentCount: issues.length });
  }

  // Only this short summary is returned to the model
  return `Processed ${issues.length} urgent issues`;
}

This replaces 4–5 model turns with a single code execution step. Because every model turn re-reads the accumulated conversation, each saved turn can avoid thousands of tokens.

Step-by-Step: Implementing the Pattern

Here's how to restructure an existing Claude + MCP agent to use code execution.

Step 1: Set Up a Sandboxed Execution Environment

You need a runtime where Claude can execute code safely. Popular options in 2026:

Cloudflare Workers — serverless, fast cold starts, good for simple tasks
Modal — powerful for compute-heavy workflows, great Python support
Vercel Edge Functions — ideal if you're already on the Vercel stack
Daytona — Anthropic-recommended for self-hosted sandboxes

If you're using Claude Managed Agents on the Claude Platform, self-hosted sandboxes are now in public beta — you can point the agent's execution environment at your own infrastructure.

Step 2: Create a Filesystem Tool Registry

Instead of loading tool schemas into the system prompt, create a structured directory:

/mcp-tools/
  registry.json          # index of all servers + tool names
  github/
    searchIssues.ts      # individual tool with schema + implementation
    createPR.ts
  slack/
    sendMessage.ts
  notion/
    updatePage.ts

Each tool file exports its own schema and callable function. The agent loads the registry index (~500 tokens), then imports specific tool files only when needed.
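One way to keep the registry index in sync with the directory is to generate it. The sketch below assumes the layout shown above and a simple index format (server name mapped to a list of tool names); the actual format of registry.json is not specified here, so treat `build_registry` and its output shape as hypothetical.

```python
import json
import tempfile
from pathlib import Path


def build_registry(root: Path) -> dict:
    """Map each server directory to the sorted tool names found in it."""
    registry = {}
    for server_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        registry[server_dir.name] = sorted(f.stem for f in server_dir.glob("*.ts"))
    return registry


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "mcp-tools"
    # Recreate the example layout with placeholder tool modules.
    layout = {
        "github": ["searchIssues", "createPR"],
        "slack": ["sendMessage"],
        "notion": ["updatePage"],
    }
    for server, tools in layout.items():
        (root / server).mkdir(parents=True)
        for tool in tools:
            (root / server / f"{tool}.ts").write_text("// placeholder tool module\n")

    registry = build_registry(root)
    (root / "registry.json").write_text(json.dumps(registry, indent=2))
    index_text = (root / "registry.json").read_text()

print(index_text)
```

The index carries names only, so it stays small even as the number of tool files grows; the schemas remain in the individual files until a tool is imported.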

Step 3: Update Your System Prompt

Replace the long tool-injection block with a concise code-execution instruction:

You have access to a code execution environment and an MCP tool registry at /mcp-tools/registry.json.

To use a tool:
1. Read registry.json to find the right server and tool name
2. Import the specific tool from /mcp-tools/{server}/{tool}.ts
3. Execute it in code and return only the relevant output

Keep intermediate data in the execution environment. Only surface final results in your response.

Your system prompt goes from 15,000 tokens to ~200 tokens.

Step 4: Pass the Code Execution Tool to Claude

Using the Claude API, add the code execution capability as a tool:

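import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Abbreviated here; use the full concise prompt from Step 3
SYSTEM_PROMPT = "You have access to a code execution environment and an MCP tool registry at /mcp-tools/registry.json."
user_input = "Summarize this week's urgent GitHub issues"  # example request

response = client.messages.create(
    model="claude-opus-4-8",  # use a model ID available to your account
    max_tokens=4096,
    system=SYSTEM_PROMPT,
    tools=[
        {
            # Custom tool: your application runs the submitted code in your sandbox.
            # (The "computer_20241022" type is the computer-use tool, not code execution.)
            "name": "code_execution",
            "description": "Execute TypeScript/Python code in a sandboxed environment with access to the MCP tool registry",
            "input_schema": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Source code to run"}
                },
                "required": ["code"],
            },
        }
    ],
    messages=[{"role": "user", "content": user_input}],
)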

With Claude Opus 4.8, dynamic workflows are on by default for Max and Team plan users, which means the model already knows how to sequence code execution steps without explicit chaining logic on your end.
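When the model decides to run code, your application has to execute it and send the output back as a tool result. Here is a minimal sketch of that step, assuming the tool's input carries a `code` string field and showing the content block as a plain dict. The `run_code` helper uses a child Python process purely as a stand-in: it is NOT a real sandbox and handles only Python, so swap in the isolated runtime from Step 1.

```python
import subprocess
import sys


def run_code(code: str, timeout_s: float = 10.0) -> str:
    """Run Python code in a child process and return stdout or an error string.

    Stand-in only: a child process is not an isolation boundary.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "error: execution timed out"
    if proc.returncode != 0:
        return f"error: {proc.stderr.strip()}"
    return proc.stdout.strip()


def handle_tool_use(block: dict) -> dict:
    """Turn a tool_use content block into the tool_result block to send back."""
    output = run_code(block["input"]["code"])
    return {"type": "tool_result", "tool_use_id": block["id"], "content": output}


# Hypothetical tool_use block, shaped like the ones the Messages API returns.
example_block = {
    "type": "tool_use",
    "id": "toolu_example",
    "name": "code_execution",
    "input": {"code": "print(sum(range(5)))"},
}

tool_result = handle_tool_use(example_block)
print(tool_result)
```

The returned block goes into the next user message alongside the original conversation, and only what the code printed reaches the model.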

Real-World Token Savings by Use Case

Based on Anthropic's benchmarks and community reports, here's what teams are seeing:

Workflow	Before (tokens)	After (tokens)	Reduction
Meeting transcript → CRM	~150,000	~2,000	98.7%
GitHub PR review + Slack notify	~45,000	~1,200	97.3%
Multi-source research aggregation	~80,000	~3,500	95.6%
Daily standup data collection	~30,000	~900	97.0%

At Claude Opus 4.8 pricing ($15/M input tokens), a workflow at 150K tokens costs ~$2.25 per run. At 2K tokens, that's ~$0.03. For a team running 500 workflows per day, that's $1,125/day vs $15/day — roughly $405,000 saved per year at scale.
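The cost figures follow directly from the quoted price and run volume. A short sketch of the calculation, counting input tokens only and using the numbers from the paragraph above:

```python
PRICE_PER_MILLION_INPUT_TOKENS = 15.00  # USD, the figure quoted in the text
RUNS_PER_DAY = 500
DAYS_PER_YEAR = 365


def cost_per_run(tokens: int) -> float:
    """Input-token cost in USD for a single run."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS


before = cost_per_run(150_000)
after = cost_per_run(2_000)
daily_savings = (before - after) * RUNS_PER_DAY
annual_savings = daily_savings * DAYS_PER_YEAR

print(f"per run: ${before:.2f} vs ${after:.2f}")
print(f"per day: ${before * RUNS_PER_DAY:,.2f} vs ${after * RUNS_PER_DAY:,.2f}")
print(f"annual savings: ${annual_savings:,.0f}")
```

Output tokens are left out, so actual bills will differ, but the ratio between the two approaches is what drives the savings.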

When NOT to Use Code Execution with MCP

This pattern is powerful, but not universal. Stick with direct tool calling when:

You need the model to reason about intermediate results — if Claude needs to evaluate complex output (like dense medical data) before deciding the next step, forcing it into code execution may reduce quality
Low-tool-count scenarios — if your agent uses only 2–3 tools, the overhead of setting up code execution probably isn't worth it
Simple request-response flows — a Q&A bot with one knowledge-base tool doesn't need this optimization
Debugging and observability are critical — with intermediate results hidden in the sandbox, debugging agent failures requires more instrumentation

For production use, a good rule of thumb: if your agent regularly makes 5+ tool calls per run or processes large documents/data as intermediate steps, code execution will pay off significantly.
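That rule of thumb can be written down as a small check. The 5-call threshold comes from the text above; the 10,000-token cutoff for a "large" intermediate result and the function name are assumptions chosen for illustration.

```python
def should_use_code_execution(
    tool_calls_per_run: int,
    largest_intermediate_tokens: int,
    model_must_read_intermediates: bool = False,
    large_intermediate_threshold: int = 10_000,  # assumption, tune for your workload
) -> bool:
    """Rough heuristic for when code execution is likely to pay off."""
    if model_must_read_intermediates:
        # The model needs to reason over raw intermediate output, so keep it in context.
        return False
    return (
        tool_calls_per_run >= 5
        or largest_intermediate_tokens >= large_intermediate_threshold
    )


print(should_use_code_execution(tool_calls_per_run=6, largest_intermediate_tokens=0))
print(should_use_code_execution(tool_calls_per_run=2, largest_intermediate_tokens=500))
print(should_use_code_execution(tool_calls_per_run=2, largest_intermediate_tokens=50_000))
```

Treat the result as a prompt to measure, not a verdict: the real decision depends on token counts from your own runs.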

Getting Started with CCA Exam Preparation

The code execution with MCP pattern is core knowledge for the Claude Certified Architect (CCA-F) exam. Anthropic's certification tests your understanding of:

Token efficiency patterns in agentic systems
MCP server architecture and integration strategies
Production deployment considerations for Claude agents
Tool use vs. code execution trade-offs

If you're preparing for the CCA exam, our practice test bank includes 50+ questions on agentic architecture patterns including this exact MCP optimization topic — the kind of nuanced decision-making the exam rewards.

Key Takeaways

Standard MCP agents load all tool definitions into context, which can cost 10,000–40,000 tokens before any work starts
Code execution with MCP shifts tool loading and intermediate data processing into a sandboxed runtime — the model only sees final results
Anthropic's own benchmark: a 150,000-token workflow reduced to 2,000 tokens (98.7% cut)
Three techniques: lazy tool loading via a filesystem registry, keeping intermediates in the sandbox, and collapsing multi-turn branching into single code steps
Best suited for agents that make 5+ tool calls per run or handle large intermediate data; not necessary for simple flows
This pattern aligns with Claude Opus 4.8's dynamic workflows feature, which orchestrates code execution steps automatically

Next Steps

Read Anthropic's original engineering post on code execution with MCP for the full technical spec

Audit your current agent — check how many tokens your tool definitions consume and whether intermediate results are flowing through the model

Start small — pick one high-token workflow and rewrite it with a filesystem tool registry and sandboxed execution

Prepare for CCA — if you're building Claude agents professionally, the CCA-F certification validates these architecture skills. Try our free CCA practice questions to see where you stand

The MCP ecosystem is moving fast. Teams that learn to build token-efficient agents now will have a significant cost and performance advantage as agentic workloads scale.
