
How to Build a Claude AI Agent from Scratch (2026 Tutorial)

Step-by-step guide to building Claude AI agents with tool use, memory, and multi-step reasoning. Includes working code examples and best practices.


You've heard the hype around AI agents, but most tutorials skip the fundamentals: what an agent actually is, why tool use is the unlock, and how to move from a simple API call to a system that can reason across multiple steps, use external tools, and complete real-world tasks without handholding.

This guide is the practical tutorial you wish you had when you started. By the end, you'll have built a working Claude agent from scratch — one that can search the web, read files, and chain actions together to complete goals.


What Is a Claude AI Agent (and How Is It Different from a Chatbot)?

A chatbot generates a response. An agent executes a plan.

The distinction comes down to the action loop:

| Feature | Chatbot | Agent |
|---|---|---|
| Generates text responses | ✅ | ✅ |
| Uses external tools | ❌ | ✅ |
| Plans across multiple steps | ❌ | ✅ |
| Observes results and adapts | ❌ | ✅ |
| Manages memory/state | ❌ | ✅ (usually) |

A Claude agent follows the ReAct pattern (Reason + Act): it reasons about what to do, takes an action (a tool call), observes the result, and loops until the task is complete. This loop is what separates agents from single-turn AI.

Claude is exceptionally well-suited for agents because it:

  • Has a 1M-token context window for long reasoning chains
  • Natively supports tool use (structured function calling)
  • Excels at following complex multi-step instructions reliably
  • Is less prone to hallucinating tool arguments than competing models


Prerequisites

You'll need:

  • A free Anthropic Console account and API key
  • Node.js 18+ (or Python 3.10+)
  • Basic familiarity with API calls

Install the Anthropic SDK:

```bash
npm install @anthropic-ai/sdk
# or
pip install anthropic
```

Set your API key:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```


Step 1: The Simplest Agent — One Tool, One Loop

Every agent starts with the same core: a system prompt, a set of tools, and a message loop.

Here's the minimal TypeScript agent that can look up current weather:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Define the tool Claude can use
const tools: Anthropic.Tool[] = [
  {
    name: "get_weather",
    description:
      "Get the current weather for a city. Returns temperature and conditions.",
    input_schema: {
      type: "object" as const,
      properties: {
        city: {
          type: "string",
          description: "The city name, e.g. 'San Francisco'",
        },
      },
      required: ["city"],
    },
  },
];

// Mock tool implementation (replace with real API call)
function getWeather(city: string) {
  const mockData: Record<string, string> = {
    "San Francisco": "62°F, partly cloudy",
    "New York": "71°F, sunny",
    London: "55°F, overcast",
  };
  return mockData[city] ?? "Weather data unavailable for this city.";
}

// The agent loop
async function runAgent(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];

  console.log(`User: ${userMessage}\n`);

  while (true) {
    const response = await client.messages.create({
      model: "claude-opus-4-6",
      max_tokens: 1024,
      tools,
      messages,
    });

    // If Claude decides to use a tool
    if (response.stop_reason === "tool_use") {
      // Note: this minimal loop handles one tool call per turn;
      // Step 5 shows how to process parallel tool calls.
      const toolUseBlock = response.content.find(
        (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
      );

      if (!toolUseBlock) break;

      console.log(`Claude is calling: ${toolUseBlock.name}`);
      console.log(`With input: ${JSON.stringify(toolUseBlock.input)}\n`);

      // Execute the tool
      const toolResult =
        toolUseBlock.name === "get_weather"
          ? getWeather((toolUseBlock.input as { city: string }).city)
          : "Tool not found";

      console.log(`Tool result: ${toolResult}\n`);

      // Add the assistant's tool call + result to the conversation
      messages.push({ role: "assistant", content: response.content });
      messages.push({
        role: "user",
        content: [
          {
            type: "tool_result",
            tool_use_id: toolUseBlock.id,
            content: toolResult,
          },
        ],
      });

      // Loop again — Claude will now process the tool result
    } else {
      // Claude is done — extract and print the final answer
      const textBlock = response.content.find(
        (block): block is Anthropic.TextBlock => block.type === "text"
      );
      console.log(`Claude: ${textBlock?.text}`);
      break;
    }
  }
}

// Run it
runAgent("What's the weather like in San Francisco and London today?");
```

Run this and you'll see Claude:

  • Recognize it needs weather data
  • Call get_weather for San Francisco
  • Call get_weather for London
  • Synthesize both results into a natural response
This loop (call tools until done, then respond) is the foundation of every agent you'll ever build.

Step 2: Multiple Tools and Real-World Tool Chaining

Real agents need multiple tools. Here's a research agent that can search the web, read URLs, and summarize findings:

```typescript
const researchTools: Anthropic.Tool[] = [
  {
    name: "web_search",
    description: "Search the web for information on a topic. Returns top results.",
    input_schema: {
      type: "object" as const,
      properties: {
        query: { type: "string", description: "The search query" },
        num_results: {
          type: "number",
          description: "Number of results to return (default 5)",
        },
      },
      required: ["query"],
    },
  },
  {
    name: "read_url",
    description: "Fetch and read the content of a URL.",
    input_schema: {
      type: "object" as const,
      properties: {
        url: { type: "string", description: "The URL to fetch" },
      },
      required: ["url"],
    },
  },
  {
    name: "write_file",
    description: "Write content to a local file.",
    input_schema: {
      type: "object" as const,
      properties: {
        filename: { type: "string", description: "The file path to write to" },
        content: { type: "string", description: "The content to write" },
      },
      required: ["filename", "content"],
    },
  },
];
```

The tool executor function should be a clean dispatch map:

```typescript
// Assumes searchWeb, fetchUrl, and writeFile are implemented elsewhere
async function executeTool(
  name: string,
  input: Record<string, unknown>
): Promise<string> {
  switch (name) {
    case "web_search":
      return await searchWeb(input.query as string, (input.num_results as number) ?? 5);
    case "read_url":
      return await fetchUrl(input.url as string);
    case "write_file":
      return await writeFile(input.filename as string, input.content as string);
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
```

Now give the agent a goal like "Research the top 3 open-source LLM frameworks in 2026, compare them, and save a summary to research.md" and watch it search, read multiple pages, synthesize, and save the file, all autonomously.


Step 3: Adding Memory to Your Agent

By default, Claude agents are stateless — every call starts fresh. For agents that need to remember user preferences, past actions, or accumulated knowledge, you need to implement memory explicitly.

There are three patterns:

Pattern 1: In-Context Memory (Simplest)

Just inject a summary of past interactions into the system prompt:

```typescript
const systemPrompt = `You are a helpful research assistant.

MEMORY (from previous sessions):
${userMemory.join("\n")}

When you learn something important about the user or task,
say "REMEMBER: [fact]" and it will be saved for next time.`;
```

Parse the REMEMBER: tags from responses and persist them to a database or file.
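The parsing step is a few lines. Here's an illustrative helper (the function name is ours, and the tag format follows the prompt above); persistence is left to whatever store you already use:

```typescript
// Pull "REMEMBER: ..." lines out of the model's reply.
// Persist the returned facts however you like (JSON file, SQLite, etc.).
function extractMemories(responseText: string): string[] {
  return responseText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("REMEMBER:"))
    .map((line) => line.slice("REMEMBER:".length).trim());
}
```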

Pattern 2: Semantic Memory with Embeddings

For larger memory stores, embed facts and retrieve the top-k most relevant ones at query time:

```typescript
// On each user message:
const relevantMemories = await vectorDB.search(userMessage, { topK: 5 });
const memoryContext = relevantMemories.map((m) => m.text).join("\n");

// Inject only the relevant memories into the system prompt
```

This scales to thousands of memories without bloating the context window.
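If you want to see the retrieval step without a vector DB, here's a minimal in-memory sketch: plain cosine similarity over embeddings you already have as number arrays. The `Memory` type and function names are illustrative, not from any library:

```typescript
// A memory is some text plus its embedding vector
type Memory = { text: string; embedding: number[] };

// Standard cosine similarity: dot product over the product of magnitudes
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k memories most similar to the query embedding
function topKMemories(queryEmbedding: number[], memories: Memory[], k: number): Memory[] {
  return [...memories]
    .sort(
      (m1, m2) =>
        cosineSimilarity(queryEmbedding, m2.embedding) -
        cosineSimilarity(queryEmbedding, m1.embedding)
    )
    .slice(0, k);
}
```

A real deployment would compute embeddings with an embeddings API and delegate the search to a vector store, but the ranking logic is exactly this.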

Pattern 3: Tool-Based Memory

Give Claude tools like save_memory(key, value) and recall_memory(key) — let it decide what to remember and when. This is the most agent-native approach and often produces the best results.
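A minimal sketch of that pattern, assuming an in-memory Map as the store (swap in real persistence for production). The tool definitions follow the same shape as the ones defined earlier:

```typescript
// Trivial key-value store; replace with a file or database in production
const memoryStore = new Map<string, string>();

const memoryTools = [
  {
    name: "save_memory",
    description: "Save a fact under a short key so it can be recalled in later turns.",
    input_schema: {
      type: "object" as const,
      properties: {
        key: { type: "string", description: "Short identifier, e.g. 'user_timezone'" },
        value: { type: "string", description: "The fact to remember" },
      },
      required: ["key", "value"],
    },
  },
  {
    name: "recall_memory",
    description: "Recall a previously saved fact by key. Returns 'not found' if missing.",
    input_schema: {
      type: "object" as const,
      properties: {
        key: { type: "string", description: "The key to look up" },
      },
      required: ["key"],
    },
  },
];

// Dispatch for the two memory tools
function executeMemoryTool(name: string, input: { key: string; value?: string }): string {
  if (name === "save_memory") {
    memoryStore.set(input.key, input.value ?? "");
    return `Saved '${input.key}'.`;
  }
  return memoryStore.get(input.key) ?? "not found";
}
```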


Step 4: System Prompt Engineering for Agents

The system prompt is your agent's operating manual. For production agents, it needs to be explicit about capabilities, behavior rules, task format, and limits:

```
You are [agent name], a [role] for [company/purpose].

CAPABILITIES:
- You have access to these tools: [list with one-line descriptions]

BEHAVIOR RULES:
1. Always think step-by-step before acting
2. Use tools only when necessary — don't call a tool if you already know the answer
3. If a tool fails, try an alternative approach before giving up
4. Never make up data — if you don't have it, say so

TASK FORMAT:
When given a complex task:
1. Restate the goal in your own words
2. List the steps you plan to take
3. Execute, checking results at each step
4. Summarize what you did and what you found

LIMITS:
- Do not write to any path outside of /output/
- Do not make more than 10 tool calls per task
- Stop and ask for clarification if the task is ambiguous
```

Well-defined behavior rules dramatically reduce hallucinations and unexpected behavior in production agents.
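One wiring detail worth making explicit: the Messages API takes the system prompt as a top-level `system` field, separate from the `messages` history. A small helper (the function name is ours) that assembles the request body shows where everything goes; pass the result to client.messages.create():

```typescript
// Assemble the request body for the Messages API. The system prompt is a
// top-level field, not a message in the conversation history.
function buildAgentRequest(
  systemPrompt: string,
  tools: unknown[],
  messages: unknown[]
) {
  return {
    model: "claude-opus-4-6", // the model name used throughout this guide
    max_tokens: 4096,
    system: systemPrompt,
    tools,
    messages,
  };
}
```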


Step 5: Handling Errors and Edge Cases

Agents fail in predictable ways. Build defensive handling from the start:

```typescript
async function robustAgentLoop(
  userMessage: string,
  maxIterations = 15
): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];
  let iterations = 0;

  while (iterations < maxIterations) {
    iterations++;

    try {
      const response = await client.messages.create({
        model: "claude-opus-4-6",
        max_tokens: 4096,
        tools,
        messages,
      });

      if (response.stop_reason === "max_tokens") {
        // Claude ran out of tokens mid-response — increase max_tokens or summarize context
        console.warn("Hit token limit — consider summarizing the conversation");
        break;
      }

      if (response.stop_reason === "tool_use") {
        // Process all tool calls in this response (Claude may call multiple tools at once)
        const toolUseBlocks = response.content.filter(
          (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
        );

        const toolResults = await Promise.all(
          toolUseBlocks.map(async (block) => {
            try {
              const result = await executeTool(
                block.name,
                block.input as Record<string, unknown>
              );
              return {
                type: "tool_result" as const,
                tool_use_id: block.id,
                content: result,
              };
            } catch (err) {
              // Return error as tool result — let Claude decide what to do
              return {
                type: "tool_result" as const,
                tool_use_id: block.id,
                content: `Error: ${err instanceof Error ? err.message : "Unknown error"}`,
                is_error: true,
              };
            }
          })
        );

        messages.push({ role: "assistant", content: response.content });
        messages.push({ role: "user", content: toolResults });
      } else {
        // end_turn — extract final response
        const textBlock = response.content.find(
          (block): block is Anthropic.TextBlock => block.type === "text"
        );
        return textBlock?.text ?? "No response generated.";
      }
    } catch (err) {
      // API error — could be rate limit, network issue, etc.
      if ((err as { status?: number }).status === 429) {
        console.log("Rate limited — waiting 30s...");
        await new Promise((resolve) => setTimeout(resolve, 30000));
      } else {
        throw err;
      }
    }
  }

  return "Agent reached maximum iterations without completing the task.";
}
```

Key defensive practices:

  • Max iteration guard — prevents infinite loops if a tool is broken
  • Tool errors returned as results — Claude can recover and try alternatives
  • Rate limit retry — essential for production agents
  • Promise.all for parallel tool calls — Claude often calls multiple tools in one turn; execute them in parallel


Common Agent Patterns and When to Use Each

| Pattern | Use Case | Complexity |
|---|---|---|
| Single-agent + tools | Most tasks: research, coding, data analysis | Low |
| Manager + subagents | Tasks that need parallel execution or specialization | Medium |
| Pipeline agents | Sequential processing: scrape → clean → analyze → report | Medium |
| Self-reflection loops | High-stakes output that needs quality checking | Medium |
| Human-in-the-loop | Sensitive actions (payments, emails, deletions) | High |

For most use cases, start with a single agent with 3-5 well-designed tools. Add complexity only when you hit a clear limitation.
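For the human-in-the-loop row, a minimal sketch of what the gate looks like: wrap the tool executor so sensitive tools require an explicit confirmation callback before running. The sensitive tool names and the confirm mechanism here are illustrative, not part of any SDK:

```typescript
// Tools that should never run without explicit approval (illustrative names)
const SENSITIVE_TOOLS = new Set(["write_file", "send_email", "delete_record"]);

// Wrap any executor with a confirmation gate for sensitive tools
async function gatedExecuteTool(
  name: string,
  input: Record<string, unknown>,
  execute: (name: string, input: Record<string, unknown>) => Promise<string>,
  confirm: (prompt: string) => Promise<boolean>
): Promise<string> {
  if (SENSITIVE_TOOLS.has(name)) {
    const approved = await confirm(`Allow ${name} with ${JSON.stringify(input)}?`);
    if (!approved) {
      // Returned as a tool result so the agent can adapt instead of crashing
      return "Action denied by user.";
    }
  }
  return execute(name, input);
}
```

In a CLI agent, `confirm` might prompt on stdin; in a web app, it might surface an approval dialog.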


Best Practices for Production Claude Agents

After building and deploying agents for real workloads, these are the rules that matter most:

  • Log every tool call — You need an audit trail. Log (tool_name, input, output, latency, error) to a database for every production run.
  • Use structured output for critical data — If your agent extracts data you'll process programmatically, use a return_result tool with a typed schema instead of parsing free text.
  • Set tight tool descriptions — Ambiguous tool descriptions lead to wrong tool calls. Each description should explain both what the tool does AND when NOT to use it.
  • Test with adversarial inputs — Ask the agent to do things it shouldn't. Make sure it refuses appropriately. Prompt injection is real.
  • Cap costs per run — Implement a max_tokens_per_run budget. Runaway agents can be expensive. A typical research task should use under 50K tokens.
  • Use claude-haiku-4-5 for tool selection, claude-opus-4-6 for synthesis — Route lighter reasoning steps to faster/cheaper models and reserve the flagship model for complex synthesis.

What's Next: Claude Certified Architect

If you're building production agents with Claude — or want to validate your expertise formally — the Claude Certified Architect (CCA) certification is the industry credential that proves you can design, build, and deploy Claude-based systems correctly.

The exam covers:

  • Tool use and agent architecture (exactly what you learned here)
  • Prompt engineering and context management
  • Safety, trust boundaries, and responsible deployment
  • API design, rate limits, and cost optimization

Practice tests that mirror the real exam format are available at AI for Anything — 200+ questions with detailed explanations written by Claude practitioners.

Key Takeaways

  • A Claude agent = the ReAct loop: reason → call tool → observe result → repeat
  • Tool definitions are the most important part of your agent — write them precisely
  • Start with a single agent and 3-5 tools; add complexity only when needed
  • Implement max iteration guards, error handling, and cost caps from day one
  • Memory is not built-in — implement it explicitly (in-context, vector, or tool-based)
  • System prompt engineering for agents requires explicit behavior rules and limits

The gap between a chatbot demo and a production agent is mostly engineering discipline: good tool design, defensive error handling, and clear behavioral guardrails. Claude gives you the intelligence — your job is to wire it up correctly.


Ready to test your Claude knowledge? Try our free CCA practice quiz — no account required.
