# Building an AI Code Review Bot with the Claude API
After months of manually reviewing PRs across multiple repositories, I built an AI-powered code review bot that catches bugs, suggests improvements, and saves our team hours every week. Here’s how.
## Why Automate Code Reviews?
Code review is one of the highest-leverage activities in software development — but it’s also one of the biggest bottlenecks. Developers context-switch from deep work to review someone else’s code, often taking hours to get back into flow. An AI reviewer doesn’t replace human judgment, but it handles the mechanical parts: catching common bugs, flagging style inconsistencies, and surfacing potential issues before a human ever looks at the PR.
## The Architecture
The bot runs as a GitHub Action triggered on `pull_request` events:

- **Fetch the diff** — Get the changed files from the GitHub API (a sketch of this step follows the list)
- **Build context** — For each changed file, include the surrounding code, not just the changed hunks
- **Send to Claude** — Use structured outputs to get a consistent review format
- **Post comments** — Add inline review comments on the PR
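For the first step, here is a minimal sketch of pulling the changed files with Octokit. The `@octokit/rest` client, the token handling, and the `getChangedFiles` helper are assumptions; the post doesn't show the bot's GitHub plumbing.

```typescript
import { Octokit } from "@octokit/rest";

// Hypothetical setup: the bot's actual GitHub client isn't shown in the post.
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function getChangedFiles(owner: string, repo: string, pullNumber: number) {
  // List the files touched by the PR; each entry carries a unified diff in `patch`.
  const { data: files } = await octokit.rest.pulls.listFiles({
    owner,
    repo,
    pull_number: pullNumber,
    per_page: 100,
  });

  // Keep only files that actually have a patch (binary files have none).
  return files
    .filter((f) => f.patch !== undefined)
    .map((f) => ({ path: f.filename, diff: f.patch as string }));
}
```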
## Setting Up the Claude API
First, install the Anthropic SDK:
```bash
npm install @anthropic-ai/sdk
```
The core review function sends each file’s diff along with its full content for context:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function reviewFile(diff: string, fileContent: string, filePath: string) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 2048,
    messages: [
      {
        role: "user",
        content: `Review this code change in ${filePath}.

Full file content:
\`\`\`
${fileContent}
\`\`\`

Changes (diff):
\`\`\`diff
${diff}
\`\`\`

Identify: bugs, security issues, performance problems, and readability improvements.
For each issue, specify the line number and severity (critical/warning/suggestion).`,
      },
    ],
  });

  return response;
}
```
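Without structured outputs, the review comes back as free-form text that you have to dig out of the response yourself. A minimal sketch of that, with an illustrative file path:

```typescript
// Illustrative usage: concatenate the text blocks from the response.
const review = await reviewFile(diff, fileContent, "src/server.ts");
const reviewText = review.content
  .flatMap((block) => (block.type === "text" ? [block.text] : []))
  .join("\n");
console.log(reviewText);
```

The next section replaces this parsing step entirely.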
## Structured Outputs for Consistent Reviews
The key to making this production-ready is structured outputs. Instead of parsing free-form text, we tell Claude exactly what format to return:
```typescript
const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 2048,
  messages: [{ role: "user", content: reviewPrompt }],
  tools: [
    {
      name: "submit_review",
      description: "Submit structured code review comments",
      input_schema: {
        type: "object",
        properties: {
          comments: {
            type: "array",
            items: {
              type: "object",
              properties: {
                line: { type: "number" },
                severity: { type: "string", enum: ["critical", "warning", "suggestion"] },
                message: { type: "string" },
              },
              required: ["line", "severity", "message"],
            },
          },
          summary: { type: "string" },
        },
        required: ["comments", "summary"],
      },
    },
  ],
  // Force Claude to respond with the tool call rather than free-form text.
  tool_choice: { type: "tool", name: "submit_review" },
});
```
This guarantees we get a parseable array of comments with line numbers, which we can map directly to GitHub’s review comment API.
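Closing the loop, the tool's `input` can be read off the response and posted back as a single PR review. The parsing below follows the SDK's tool-use response shape; the Octokit call, the `ReviewComment` interface, and the `owner`/`repo`/`pullNumber`/`filePath` variables are assumptions carried over from the earlier sketches.

```typescript
// The structured review arrives as a tool_use block whose `input` matches the schema above.
const toolUse = response.content.find((block) => block.type === "tool_use");
if (!toolUse || toolUse.type !== "tool_use") {
  throw new Error("Claude did not return a structured review");
}

// Hypothetical type mirroring the input_schema.
interface ReviewComment {
  line: number;
  severity: "critical" | "warning" | "suggestion";
  message: string;
}
const { comments, summary } = toolUse.input as {
  comments: ReviewComment[];
  summary: string;
};

// Post one review with inline comments (Octokit client assumed, as in the earlier sketch).
await octokit.rest.pulls.createReview({
  owner,
  repo,
  pull_number: pullNumber,
  event: "COMMENT",
  body: summary,
  comments: comments.map((c) => ({
    path: filePath,
    line: c.line,
    body: `**${c.severity}**: ${c.message}`,
  })),
});
```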
## Cost Optimization with Prompt Caching
The system prompt and review guidelines stay the same across every review. Using prompt caching (a sketch follows the cost numbers below), we avoid re-processing these tokens on every API call:
- Without caching: ~$0.15 per PR review
- With caching: ~$0.06 per PR review (60% reduction)
For a team doing 20 PRs/day, that’s the difference between $90/month and $36/month.
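Here is a minimal sketch of what the cached call might look like. The guideline text is illustrative; the mechanism is a `system` block marked with `cache_control`:

```typescript
// Shared review guidelines go in a system block marked for caching, so repeat
// reviews reuse the cached prefix instead of paying for those tokens again.
// (Illustrative text; real guidelines need to be long enough to clear the
// cache's minimum prefix size.)
const REVIEW_GUIDELINES = `You are a code review assistant. Flag bugs, security
issues, performance problems, and readability improvements. Only report issues
you are confident about.`;

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 2048,
  system: [
    {
      type: "text",
      text: REVIEW_GUIDELINES,
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: reviewPrompt }],
});
```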
## Results After 3 Months
- Catches ~30% of issues before human review
- Review turnaround for initial feedback dropped from an average of 4 hours to under 10 minutes
- False positive rate is around 15% — low enough that developers trust the bot’s suggestions
- Human reviewers can focus on architecture and design decisions instead of nitpicking style
## What I Learned
- **Context is everything** — Sending just the diff produces mediocre reviews. Including the full file (or even related files) dramatically improves quality.
- **Sonnet is the sweet spot** — Opus produces marginally better reviews but at 5x the cost. For code review volume, Sonnet is the right tradeoff.
- **Structured outputs are non-negotiable** — Free-form responses are inconsistent and hard to parse into GitHub comments.
- **Don’t try to catch everything** — Configure the bot for high-confidence issues only (a small filter sketch follows the list). A noisy bot gets ignored.
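In practice that meant filtering what gets posted. A minimal sketch, assuming the comment shape from the schema above and a hypothetical per-repo severity threshold:

```typescript
// Hypothetical severity ranking; anything below the configured threshold is
// dropped before it ever reaches GitHub.
const SEVERITY_RANK = { suggestion: 0, warning: 1, critical: 2 } as const;
type Severity = keyof typeof SEVERITY_RANK;

function filterComments(
  comments: { line: number; severity: Severity; message: string }[],
  threshold: Severity = "warning",
) {
  return comments.filter(
    (c) => SEVERITY_RANK[c.severity] >= SEVERITY_RANK[threshold],
  );
}
```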