# Building an AI Code Review Bot with the Claude API
After months of manually reviewing PRs across multiple repositories, I built an AI-powered code review bot that catches bugs, suggests improvements, and saves our team hours every week. Here’s how.
## Why Automate Code Reviews?
Code review is one of the highest-leverage activities in software development — but it’s also one of the biggest bottlenecks. Developers context-switch from deep work to review someone else’s code, often taking hours to get back into flow. An AI reviewer doesn’t replace human judgment, but it handles the mechanical parts: catching common bugs, flagging style inconsistencies, and surfacing potential issues before a human ever looks at the PR.
## The Architecture
The bot runs as a GitHub Action triggered on `pull_request` events:

- **Fetch the diff** — Get the changed files from the GitHub API (a sketch of this step follows the list)
- **Build context** — For each changed file, include the surrounding code, not just the changed hunks
- **Send to Claude** — Use structured outputs to get a consistent review format
- **Post comments** — Add inline review comments on the PR
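For the first step, here is a minimal sketch of pulling the changed files with Octokit. The `@octokit/rest` client, the token handling, and the `getChangedFiles` helper are assumptions; the post doesn't show the bot's GitHub plumbing.

```typescript
import { Octokit } from "@octokit/rest";

// Hypothetical setup: the bot's actual GitHub client isn't shown in the post.
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function getChangedFiles(owner: string, repo: string, pullNumber: number) {
  // List the files touched by the PR; each entry carries a unified diff in `patch`.
  const { data: files } = await octokit.rest.pulls.listFiles({
    owner,
    repo,
    pull_number: pullNumber,
    per_page: 100,
  });

  // Keep only files that actually have a patch (binary files have none).
  return files
    .filter((f) => f.patch !== undefined)
    .map((f) => ({ path: f.filename, diff: f.patch as string }));
}
```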
## Setting Up the Claude API
First, install the Anthropic SDK:
```bash
npm install @anthropic-ai/sdk
```
The core review function sends each file’s diff along with its full content for context:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function reviewFile(diff: string, fileContent: string, filePath: string) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 2048,
    messages: [
      {
        role: "user",
        content: `Review this code change in ${filePath}.

Full file content:
\`\`\`
${fileContent}
\`\`\`

Changes (diff):
\`\`\`diff
${diff}
\`\`\`

Identify: bugs, security issues, performance problems, and readability improvements.
For each issue, specify the line number and severity (critical/warning/suggestion).`,
      },
    ],
  });

  return response;
}
```
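Without structured outputs, the review comes back as free-form text that you have to dig out of the response yourself. A minimal sketch of that, with an illustrative file path:

```typescript
// Illustrative usage: concatenate the text blocks from the response.
const review = await reviewFile(diff, fileContent, "src/server.ts");
const reviewText = review.content
  .flatMap((block) => (block.type === "text" ? [block.text] : []))
  .join("\n");
console.log(reviewText);
```

The next section replaces this parsing step entirely.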
## Structured Outputs for Consistent Reviews
The key to making this production-ready is structured outputs. Instead of parsing free-form text, we tell Claude exactly what format to return:
```typescript
const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 2048,
  messages: [{ role: "user", content: reviewPrompt }],
  tools: [
    {
      name: "submit_review",
      description: "Submit structured code review comments",
      input_schema: {
        type: "object",
        properties: {
          comments: {
            type: "array",
            items: {
              type: "object",
              properties: {
                line: { type: "number" },
                severity: { type: "string", enum: ["critical", "warning", "suggestion"] },
                message: { type: "string" },
              },
              required: ["line", "severity", "message"],
            },
          },
          summary: { type: "string" },
        },
        required: ["comments", "summary"],
      },
    },
  ],
  // Force Claude to respond with the tool call rather than free-form text.
  tool_choice: { type: "tool", name: "submit_review" },
});
```
This guarantees we get a parseable array of comments with line numbers, which we can map directly to GitHub’s review comment API.
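Closing the loop, the tool's `input` can be read off the response and posted back as a single PR review. The parsing below follows the SDK's tool-use response shape; the Octokit call, the `ReviewComment` interface, and the `owner`/`repo`/`pullNumber`/`filePath` variables are assumptions carried over from the earlier sketches.

```typescript
// The structured review arrives as a tool_use block whose `input` matches the schema above.
const toolUse = response.content.find((block) => block.type === "tool_use");
if (!toolUse || toolUse.type !== "tool_use") {
  throw new Error("Claude did not return a structured review");
}

// Hypothetical type mirroring the input_schema.
interface ReviewComment {
  line: number;
  severity: "critical" | "warning" | "suggestion";
  message: string;
}
const { comments, summary } = toolUse.input as {
  comments: ReviewComment[];
  summary: string;
};

// Post one review with inline comments (Octokit client assumed, as in the earlier sketch).
await octokit.rest.pulls.createReview({
  owner,
  repo,
  pull_number: pullNumber,
  event: "COMMENT",
  body: summary,
  comments: comments.map((c) => ({
    path: filePath,
    line: c.line,
    body: `**${c.severity}**: ${c.message}`,
  })),
});
```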
## Cost Optimization with Prompt Caching
The system prompt and review guidelines stay the same across every review. Using prompt caching (a sketch follows the cost numbers below), we avoid re-processing these tokens on every API call:
- Without caching: ~$0.15 per PR review
- With caching: ~$0.06 per PR review (60% reduction)
For a team doing 20 PRs/day, that’s the difference between $90/month and $36/month.
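Here is a minimal sketch of what the cached call might look like. The guideline text is illustrative; the mechanism is a `system` block marked with `cache_control`:

```typescript
// Shared review guidelines go in a system block marked for caching, so repeat
// reviews reuse the cached prefix instead of paying for those tokens again.
// (Illustrative text; real guidelines need to be long enough to clear the
// cache's minimum prefix size.)
const REVIEW_GUIDELINES = `You are a code review assistant. Flag bugs, security
issues, performance problems, and readability improvements. Only report issues
you are confident about.`;

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 2048,
  system: [
    {
      type: "text",
      text: REVIEW_GUIDELINES,
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: reviewPrompt }],
});
```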
## Results After 3 Months
- Catches ~30% of issues before human review
- Review turnaround for initial feedback dropped from an average of 4 hours to under 10 minutes
- False positive rate is around 15% — low enough that developers trust the bot’s suggestions
- Human reviewers can focus on architecture and design decisions instead of nitpicking style
## What I Learned
- **Context is everything** — Sending just the diff produces mediocre reviews. Including the full file (or even related files) dramatically improves quality.
- **Sonnet is the sweet spot** — Opus produces marginally better reviews but at 5x the cost. For code review volume, Sonnet is the right tradeoff.
- **Structured outputs are non-negotiable** — Free-form responses are inconsistent and hard to parse into GitHub comments.
- **Don’t try to catch everything** — Configure the bot for high-confidence issues only (a small filter sketch follows the list). A noisy bot gets ignored.
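In practice that meant filtering what gets posted. A minimal sketch, assuming the comment shape from the schema above and a hypothetical per-repo severity threshold:

```typescript
// Hypothetical severity ranking; anything below the configured threshold is
// dropped before it ever reaches GitHub.
const SEVERITY_RANK = { suggestion: 0, warning: 1, critical: 2 } as const;
type Severity = keyof typeof SEVERITY_RANK;

function filterComments(
  comments: { line: number; severity: Severity; message: string }[],
  threshold: Severity = "warning",
) {
  return comments.filter(
    (c) => SEVERITY_RANK[c.severity] >= SEVERITY_RANK[threshold],
  );
}
```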