How to Build a Claude AI Agent from Scratch (2026 Tutorial)
Step-by-step guide to building Claude AI agents with tool use, memory, and multi-step reasoning. Includes working code examples and best practices.
You've heard the hype around AI agents, but most tutorials skip the fundamentals: what an agent actually is, why tool use is the unlock, and how to move from a simple API call to a system that can reason across multiple steps, use external tools, and complete real-world tasks without handholding.
This guide is the practical tutorial you wish you had when you started. By the end, you'll have built a working Claude agent from scratch — one that can search the web, read files, and chain actions together to complete goals.
What Is a Claude AI Agent (and How Is It Different from a Chatbot)?
A chatbot generates a response. An agent executes a plan.
The distinction comes down to the action loop:
| Feature | Chatbot | Agent |
|---|---|---|
| Generates text responses | ✅ | ✅ |
| Uses external tools | ❌ | ✅ |
| Plans across multiple steps | ❌ | ✅ |
| Observes results and adapts | ❌ | ✅ |
| Manages memory/state | ❌ | ✅ (usually) |
A Claude agent follows the ReAct pattern (Reason + Act): it reasons about what to do, takes an action (a tool call), observes the result, and loops until the task is complete. This loop is what separates agents from single-turn AI.
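The loop itself fits in a few lines of TypeScript. This is an illustrative sketch, not the Anthropic SDK: `fakeModel` is a scripted stand-in for a real Claude call so the loop runs offline, and `toolImpls` is a hypothetical tool table.

```typescript
// Minimal ReAct loop: reason → act → observe → repeat.
type Step =
  | { kind: "tool_call"; tool: string; input: string }
  | { kind: "final"; answer: string };

// Stand-in for the LLM: given the transcript so far, decide the next step.
function fakeModel(transcript: string[]): Step {
  const hasObservation = transcript.some((line) => line.startsWith("observation:"));
  return hasObservation
    ? { kind: "final", answer: "It is 62°F in San Francisco." }
    : { kind: "tool_call", tool: "get_weather", input: "San Francisco" };
}

// Hypothetical tool implementations keyed by name.
const toolImpls: Record<string, (input: string) => string> = {
  get_weather: (city) => `${city}: 62°F, partly cloudy`,
};

function reactLoop(goal: string, maxSteps = 5): string {
  const transcript = [`goal: ${goal}`];
  for (let i = 0; i < maxSteps; i++) {
    const step = fakeModel(transcript);            // reason
    if (step.kind === "final") return step.answer;
    const result = toolImpls[step.tool](step.input); // act
    transcript.push(`observation: ${result}`);     // observe, then loop
  }
  return "Gave up after max steps.";
}

console.log(reactLoop("What's the weather in San Francisco?"));
```

The rest of this tutorial replaces `fakeModel` with real Claude API calls, but the control flow stays exactly this shape.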
Claude is exceptionally well-suited for agents because it:
- Has a large context window (up to 1M tokens on some models) for long reasoning chains
- Natively supports tool use (structured function calling)
- Excels at following complex multi-step instructions reliably
- Reliably produces well-formed tool arguments with few hallucinated parameters
Prerequisites
You'll need:
- A free Anthropic Console account and API key
- Node.js 18+ (or Python 3.10+)
- Basic familiarity with API calls
Install the Anthropic SDK:
```bash
npm install @anthropic-ai/sdk
# or
pip install anthropic
```

Set your API key:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```

Step 1: The Simplest Agent — One Tool, One Loop
Every agent starts with the same core: a system prompt, a set of tools, and a message loop.
Here's the minimal TypeScript agent that can look up current weather:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Define the tool Claude can use
const tools: Anthropic.Tool[] = [
  {
    name: "get_weather",
    description:
      "Get the current weather for a city. Returns temperature and conditions.",
    input_schema: {
      type: "object" as const,
      properties: {
        city: {
          type: "string",
          description: "The city name, e.g. 'San Francisco'",
        },
      },
      required: ["city"],
    },
  },
];

// Mock tool implementation (replace with a real API call)
function getWeather(city: string) {
  const mockData: Record<string, string> = {
    "San Francisco": "62°F, partly cloudy",
    "New York": "71°F, sunny",
    London: "55°F, overcast",
  };
  return mockData[city] ?? "Weather data unavailable for this city.";
}

// The agent loop
async function runAgent(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];
  console.log(`User: ${userMessage}\n`);

  while (true) {
    const response = await client.messages.create({
      model: "claude-opus-4-6",
      max_tokens: 1024,
      tools,
      messages,
    });

    // If Claude decides to use a tool
    if (response.stop_reason === "tool_use") {
      const toolUseBlock = response.content.find(
        (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
      );
      if (!toolUseBlock) break;

      console.log(`Claude is calling: ${toolUseBlock.name}`);
      console.log(`With input: ${JSON.stringify(toolUseBlock.input)}\n`);

      // Execute the tool
      const toolResult =
        toolUseBlock.name === "get_weather"
          ? getWeather((toolUseBlock.input as { city: string }).city)
          : "Tool not found";
      console.log(`Tool result: ${toolResult}\n`);

      // Add the assistant's tool call + result to the conversation
      messages.push({ role: "assistant", content: response.content });
      messages.push({
        role: "user",
        content: [
          {
            type: "tool_result",
            tool_use_id: toolUseBlock.id,
            content: toolResult,
          },
        ],
      });
      // Loop again — Claude will now process the tool result
    } else {
      // Claude is done — extract and print the final answer
      const textBlock = response.content.find(
        (block): block is Anthropic.TextBlock => block.type === "text"
      );
      console.log(`Claude: ${textBlock?.text}`);
      break;
    }
  }
}

// Run it
runAgent("What's the weather like in San Francisco and London today?");
```

Run this and you'll see Claude call `get_weather` for San Francisco, then `get_weather` for London, before composing its final answer from both results.

Step 2: Multiple Tools and Real-World Tool Chaining
Real agents need multiple tools. Here's a research agent that can search the web, read URLs, and summarize findings:
```typescript
const researchTools: Anthropic.Tool[] = [
  {
    name: "web_search",
    description:
      "Search the web for information on a topic. Returns top results.",
    input_schema: {
      type: "object" as const,
      properties: {
        query: { type: "string", description: "The search query" },
        num_results: {
          type: "number",
          description: "Number of results to return (default 5)",
        },
      },
      required: ["query"],
    },
  },
  {
    name: "read_url",
    description: "Fetch and read the content of a URL.",
    input_schema: {
      type: "object" as const,
      properties: {
        url: { type: "string", description: "The URL to fetch" },
      },
      required: ["url"],
    },
  },
  {
    name: "write_file",
    description: "Write content to a local file.",
    input_schema: {
      type: "object" as const,
      properties: {
        filename: { type: "string", description: "The file path to write to" },
        content: { type: "string", description: "The content to write" },
      },
      required: ["filename", "content"],
    },
  },
];
```

The tool executor function should be a clean dispatch map:
```typescript
async function executeTool(
  name: string,
  input: Record<string, unknown>
): Promise<string> {
  switch (name) {
    case "web_search":
      return await searchWeb(
        input.query as string,
        (input.num_results as number) ?? 5
      );
    case "read_url":
      return await fetchUrl(input.url as string);
    case "write_file":
      return await writeFile(input.filename as string, input.content as string);
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
```

Now give the agent a goal like: "Research the top 3 open-source LLM frameworks in 2026, compare them, and save a summary to research.md" — and watch it search, read multiple pages, synthesize, and save the file, all autonomously.
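Before dispatching, it's worth validating the model's arguments against each tool's `input_schema`. Here's a hand-rolled sketch covering required fields and basic types; a production agent would more likely use a full JSON Schema validator, and `webSearchSchema` below is just a trimmed copy of the `web_search` schema for illustration:

```typescript
// Minimal check of a tool input against its schema before dispatch.
interface ToolSchema {
  required?: string[];
  properties: Record<string, { type: string }>;
}

// Returns an error message, or null if the input is valid.
function validateToolInput(
  schema: ToolSchema,
  input: Record<string, unknown>
): string | null {
  for (const field of schema.required ?? []) {
    if (!(field in input)) return `Missing required field: ${field}`;
  }
  for (const [key, value] of Object.entries(input)) {
    const expected = schema.properties[key]?.type;
    if (expected && typeof value !== expected) {
      return `Field ${key} should be ${expected}, got ${typeof value}`;
    }
  }
  return null;
}

// Trimmed copy of the web_search schema above.
const webSearchSchema: ToolSchema = {
  required: ["query"],
  properties: {
    query: { type: "string" },
    num_results: { type: "number" },
  },
};

console.log(validateToolInput(webSearchSchema, { query: "LLM frameworks" })); // null
console.log(validateToolInput(webSearchSchema, { num_results: 5 }));
```

When validation fails, return the error message as a `tool_result` so the model can correct its arguments and retry.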
Step 3: Adding Memory to Your Agent
By default, Claude agents are stateless — every call starts fresh. For agents that need to remember user preferences, past actions, or accumulated knowledge, you need to implement memory explicitly.
There are three patterns:
Pattern 1: In-Context Memory (Simplest)
Just inject a summary of past interactions into the system prompt:
```typescript
const systemPrompt = `You are a helpful research assistant.

MEMORY (from previous sessions):
${userMemory.join("\n")}

When you learn something important about the user or task,
say "REMEMBER: [fact]" and it will be saved for next time.`;
```

Parse the `REMEMBER:` tags from responses and persist them to a database or file.
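The parsing step is a one-liner per line of output. A sketch, assuming the `REMEMBER:` convention from the prompt above:

```typescript
// Extract "REMEMBER: ..." lines from a model response so they can be
// persisted for the next session.
function extractMemories(responseText: string): string[] {
  const memories: string[] = [];
  for (const line of responseText.split("\n")) {
    const match = line.match(/^REMEMBER:\s*(.+)$/);
    if (match) memories.push(match[1].trim());
  }
  return memories;
}

const reply = [
  "The user prefers TypeScript examples.",
  "REMEMBER: user prefers TypeScript",
  "REMEMBER: project deadline is Friday",
].join("\n");

console.log(extractMemories(reply));
// ["user prefers TypeScript", "project deadline is Friday"]
```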
Pattern 2: Semantic Memory with Embeddings
For larger memory stores, embed facts and retrieve the top-k most relevant ones at query time:
```typescript
// On each user message:
const relevantMemories = await vectorDB.search(userMessage, { topK: 5 });
const memoryContext = relevantMemories.map((m) => m.text).join("\n");
// Inject only the relevant memories into the system prompt
```

This scales to thousands of memories without bloating the context window.
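The retrieval step reduces to cosine similarity plus a sort. A toy version, where the three-dimensional embeddings are fabricated for illustration (a real system would call an embedding API and a vector database):

```typescript
// Toy top-k retrieval over pre-computed embeddings.
interface Memory {
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], memories: Memory[], k: number): Memory[] {
  return [...memories]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}

const store: Memory[] = [
  { text: "user likes concise answers", embedding: [1, 0, 0] },
  { text: "project uses PostgreSQL", embedding: [0, 1, 0] },
  { text: "user is in the PST timezone", embedding: [0.9, 0.1, 0] },
];

// Query vector close to the "user preference" direction:
console.log(topK([1, 0, 0], store, 2).map((m) => m.text));
// ["user likes concise answers", "user is in the PST timezone"]
```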
Pattern 3: Tool-Based Memory
Give Claude tools like save_memory(key, value) and recall_memory(key) — let it decide what to remember and when. This is the most agent-native approach and often produces the best results.
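A sketch of what those memory tools might look like, in the same shape as the weather tool from Step 1. The `Map` is a stand-in for a persistent store (database, file, etc.):

```typescript
// Tool-based memory: the model decides what to save and when to recall it.
const memoryStore = new Map<string, string>();

function saveMemory(key: string, value: string): string {
  memoryStore.set(key, value);
  return `Saved "${key}".`;
}

function recallMemory(key: string): string {
  return memoryStore.get(key) ?? `No memory found for "${key}".`;
}

// Definitions the agent loop would expose via the `tools` parameter.
const memoryTools = [
  {
    name: "save_memory",
    description: "Persist a fact under a key for future sessions.",
    input_schema: {
      type: "object" as const,
      properties: {
        key: { type: "string", description: "Short identifier for the fact" },
        value: { type: "string", description: "The fact to remember" },
      },
      required: ["key", "value"],
    },
  },
  {
    name: "recall_memory",
    description: "Retrieve a previously saved fact by key.",
    input_schema: {
      type: "object" as const,
      properties: {
        key: { type: "string", description: "The key to look up" },
      },
      required: ["key"],
    },
  },
];

console.log(saveMemory("favorite_language", "TypeScript"));
console.log(recallMemory("favorite_language")); // "TypeScript"
```

Note that a miss returns a readable message rather than throwing, so the model can handle "nothing saved yet" gracefully.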
Step 4: System Prompt Engineering for Agents
The system prompt is your agent's operating manual. For production agents, it should spell out capabilities, behavior rules, task format, and limits. A template:
```
You are [agent name], a [role] for [company/purpose].

CAPABILITIES:
- You have access to these tools: [list with one-line descriptions]

BEHAVIOR RULES:
1. Always think step-by-step before acting
2. Use tools only when necessary — don't call a tool if you already know the answer
3. If a tool fails, try an alternative approach before giving up
4. Never make up data — if you don't have it, say so

TASK FORMAT:
When given a complex task:
1. Restate the goal in your own words
2. List the steps you plan to take
3. Execute, checking results at each step
4. Summarize what you did and what you found

LIMITS:
- Do not write to any path outside of /output/
- Do not make more than 10 tool calls per task
- Stop and ask for clarification if the task is ambiguous
```

Well-defined behavior rules dramatically reduce hallucinations and unexpected behavior in production agents.
Step 5: Handling Errors and Edge Cases
Agents fail in predictable ways. Build defensive handling from the start:
```typescript
async function robustAgentLoop(
  userMessage: string,
  maxIterations = 15
): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];
  let iterations = 0;

  while (iterations < maxIterations) {
    iterations++;
    try {
      const response = await client.messages.create({
        model: "claude-opus-4-6",
        max_tokens: 4096,
        tools,
        messages,
      });

      if (response.stop_reason === "max_tokens") {
        // Claude ran out of tokens mid-response — increase max_tokens or summarize context
        console.warn("Hit token limit — consider summarizing the conversation");
        break;
      }

      if (response.stop_reason === "tool_use") {
        // Process all tool calls in this response (Claude may call multiple tools at once)
        const toolUseBlocks = response.content.filter(
          (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
        );

        const toolResults = await Promise.all(
          toolUseBlocks.map(async (block) => {
            try {
              const result = await executeTool(
                block.name,
                block.input as Record<string, unknown>
              );
              return {
                type: "tool_result" as const,
                tool_use_id: block.id,
                content: result,
              };
            } catch (err) {
              // Return the error as a tool result — let Claude decide what to do
              return {
                type: "tool_result" as const,
                tool_use_id: block.id,
                content: `Error: ${err instanceof Error ? err.message : "Unknown error"}`,
                is_error: true,
              };
            }
          })
        );

        messages.push({ role: "assistant", content: response.content });
        messages.push({ role: "user", content: toolResults });
      } else {
        // end_turn — extract the final response
        const textBlock = response.content.find(
          (block): block is Anthropic.TextBlock => block.type === "text"
        );
        return textBlock?.text ?? "No response generated.";
      }
    } catch (err) {
      // API error — could be a rate limit, network issue, etc.
      if ((err as { status?: number }).status === 429) {
        console.log("Rate limited — waiting 30s...");
        await new Promise((resolve) => setTimeout(resolve, 30000));
      } else {
        throw err;
      }
    }
  }

  return "Agent reached maximum iterations without completing the task.";
}
```

Key defensive practices:
- Max iteration guard — prevents infinite loops if a tool is broken
- Tool errors returned as results — Claude can recover and try alternatives
- Rate limit retry — essential for production agents
- `Promise.all` for parallel tool calls — Claude often calls multiple tools in one turn; execute them in parallel
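The fixed 30-second wait above works, but exponential backoff recovers faster from short rate-limit windows. A generic sketch; the `flaky` function is a stub standing in for an API call:

```typescript
// Retry with exponential backoff for transient failures like 429s.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 10
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait 1x, 2x, 4x, ... the base delay between attempts.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Demo stub: fails twice, then succeeds on the third attempt.
let calls = 0;
async function flaky(): Promise<string> {
  calls++;
  if (calls < 3) throw new Error("rate limited");
  return "ok";
}

const result = await withRetry(flaky);
console.log(result, "after", calls, "attempts"); // "ok after 3 attempts"
```

In production you'd use a much larger `baseDelayMs`, add jitter, and only retry on retryable status codes rather than every error.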
Common Agent Patterns and When to Use Each
| Pattern | Use Case | Complexity |
|---|---|---|
| Single-agent + tools | Most tasks: research, coding, data analysis | Low |
| Manager + subagents | Tasks that need parallel execution or specialization | Medium |
| Pipeline agents | Sequential processing: scrape → clean → analyze → report | Medium |
| Self-reflection loops | High-stakes output that needs quality checking | Medium |
| Human-in-the-loop | Sensitive actions (payments, emails, deletions) | High |
For most use cases, start with a single agent with 3-5 well-designed tools. Add complexity only when you hit a clear limitation.
Best Practices for Production Claude Agents
After building and deploying agents for real workloads, these are the rules that matter most:
- Log every tool call — record `(tool_name, input, output, latency, error)` to a database for every production run
- Return structured results — give the agent a `return_result` tool with a typed schema instead of parsing free text
- Cap spend — set a `max_tokens_per_run` budget; runaway agents can be expensive, and a typical research task should use under 50K tokens
- Split models by role — `claude-haiku-4-5` for tool selection, `claude-opus-4-6` for synthesis; route lighter reasoning steps to faster/cheaper models and reserve the flagship model for complex synthesis

What's Next: Claude Certified Architect
If you're building production agents with Claude — or want to validate your expertise formally — the Claude Certified Architect (CCA) certification is the industry credential that proves you can design, build, and deploy Claude-based systems correctly.
The exam covers:
- Tool use and agent architecture (exactly what you learned here)
- Prompt engineering and context management
- Safety, trust boundaries, and responsible deployment
- API design, rate limits, and cost optimization
Key Takeaways
- A Claude agent = the ReAct loop: reason → call tool → observe result → repeat
- Tool definitions are the most important part of your agent — write them precisely
- Start with a single agent and 3-5 tools; add complexity only when needed
- Implement max iteration guards, error handling, and cost caps from day one
- Memory is not built-in — implement it explicitly (in-context, vector, or tool-based)
- System prompt engineering for agents requires explicit behavior rules and limits
The gap between a chatbot demo and a production agent is mostly engineering discipline: good tool design, defensive error handling, and clear behavioral guardrails. Claude gives you the intelligence — your job is to wire it up correctly.
Ready to test your Claude knowledge? Try our free CCA practice quiz — no account required.