TL;DR

On April 16, 2026, Anthropic shipped /ultrareview in Claude Code v2.1.111 alongside Opus 4.7. Type it and a fleet of 5 sub-agents (up to 20) runs in a remote sandbox across 4 stages — Setup, Find, Verify, Dedup — and returns only verified bugs. Anthropic reports fewer than 1% of findings are marked incorrect by engineers. Pro and Max users get 3 one-time free runs; after that it's ~$5–$25 per review. It's not magic, and it's not for every commit — but for 1,000+ line PRs touching auth, payments, or concurrency, it's the first AI review that's actually worth waiting 15 minutes for.

What's new

/ultrareview is a new slash command in Claude Code. Run it with no argument to review your current branch, or /ultrareview 1234 to pull a specific GitHub PR. Unlike the existing /review, which does one local pass in your session, /ultrareview ships your diff to Anthropic's cloud infrastructure and spawns an adversarial multi-agent fleet that runs in the background while you keep coding.
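
For illustration, the two invocation forms as typed at the Claude Code prompt (PR 1234 is just the placeholder number from above):

```
/ultrareview        → reviews the current branch
/ultrareview 1234   → pulls and reviews GitHub PR #1234
```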

The feature was first discovered on March 31, 2026, when a 59.8MB source map accidentally shipped inside the npm package @anthropic-ai/claude-code v2.1.88, leaking 1,884 TypeScript source files. Anthropic made it official two weeks later.

How it works: the 4 stages

  1. Setup — Provisions the cloud sandbox and spins up the sub-agent fleet. Defaults to 5 agents; scales up to 20 for larger or riskier changes. ~90 seconds on an 11,000-line PR.
  2. Find — Each agent gets its own context window and explores different execution paths. Opus-class agents hunt logic and security bugs; Sonnet-class agents handle style and CLAUDE.md violations. Traversal order matters — agents that start from the auth layer surface race conditions that agents starting from the UI never see.
  3. Verify — This is the unlock. A separate verification agent tries to reproduce each candidate bug. Only confirmed findings survive. The Find agents are tuned for recall; the Verify agents are tuned for precision. Adversarial by design.
  4. Dedup — Merges the same bug reported from different angles into one enriched finding. (A minimal sketch of the whole pipeline follows this list.)
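
To make the architecture concrete, here is a minimal TypeScript sketch of the find, verify, and dedup stages. It is a hypothetical illustration of the pattern described above, not Anthropic's implementation: the Finding shape, runFindAgent, and tryReproduce are all made up, and the real dedup stage merges duplicates into one enriched finding rather than just keeping one.

```typescript
// Hypothetical find → verify → dedup pipeline. All names and shapes here
// are illustrative assumptions, not Anthropic's actual implementation.

interface Finding {
  file: string;
  line: number;
  description: string;
}

// Stub for a find agent: in the real system this is a model call that
// explores the diff from a distinct starting point (auth, UI, ...).
async function runFindAgent(diff: string, agentIndex: number): Promise<Finding[]> {
  return []; // placeholder
}

// Stub for the verifier: in the real system this attempts an actual repro.
async function tryReproduce(candidate: Finding): Promise<boolean> {
  return false; // placeholder
}

// Find: fan out across the fleet; tuned for recall, so duplicates and
// false positives are acceptable at this stage.
async function findStage(diff: string, fleetSize: number): Promise<Finding[]> {
  const reports = await Promise.all(
    Array.from({ length: fleetSize }, (_, i) => runFindAgent(diff, i)),
  );
  return reports.flat();
}

// Verify: tuned for precision; any candidate that cannot be reproduced
// is dropped rather than surfaced to the user.
async function verifyStage(candidates: Finding[]): Promise<Finding[]> {
  const results = await Promise.all(candidates.map(tryReproduce));
  return candidates.filter((_, i) => results[i]);
}

// Dedup: collapse the same bug reported from different angles. Keyed
// naively by location here; the real stage merges into an enriched finding.
function dedupStage(verified: Finding[]): Finding[] {
  const byLocation = new Map<string, Finding>();
  for (const f of verified) byLocation.set(`${f.file}:${f.line}`, f);
  return [...byLocation.values()];
}

async function ultraReview(diff: string, fleetSize = 5): Promise<Finding[]> {
  return dedupStage(await verifyStage(await findStage(diff, fleetSize)));
}
```

Note the deliberate asymmetry: findStage is allowed to over-report because verifyStage discards anything it cannot reproduce; that is what "adversarial by design" refers to.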

Technical facts

| Property | Value |
| --- | --- |
| Default fleet size | 5 sub-agents |
| Max fleet size | 20 sub-agents |
| Stages | Setup → Find → Verify → Dedup |
| Typical runtime | 5–10 min (up to ~20 min on large PRs) |
| Observed 11k-line PR | 17 minutes total, 64 candidates filtered down |
| False positive rate | <1% (Anthropic internal) |
| Cost per run | ~$5–$25 (≈3M tokens) |
| Min version | Claude Code v2.1.111 |
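
A back-of-envelope check on that cost row: assuming per-token rates in the neighborhood of recent Opus pricing (the rates below are assumptions, not published Opus 4.7 pricing), a run consuming ~3 million mostly-input tokens lands inside the quoted band.

```typescript
// Rough cost model for one run. Both rates are assumptions for
// illustration, not Anthropic's published Opus 4.7 pricing.
const INPUT_RATE = 5 / 1_000_000;   // assumed $/input token
const OUTPUT_RATE = 25 / 1_000_000; // assumed $/output token

function runCost(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;
}

// ≈3M total tokens, mostly input: 2.8M in, 0.2M out
console.log(runCost(2_800_000, 200_000).toFixed(2)); // "19.00", within $5–$25
```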

Anthropic has used the system internally for months. The share of PRs receiving substantive review comments rose from 16% to 54% after adoption. On PRs with more than 1,000 lines changed, 84% generated findings, averaging 7.5 issues per PR. For PRs under 50 lines, only 31% generated findings (avg 0.5 issues) — confirming the tool pays off on big, risky diffs, not one-line tweaks.

/ultrareview vs /review

| Axis | /review | /ultrareview |
| --- | --- | --- |
| Runtime environment | Local session | Cloud sandbox |
| Agents | 1 pass, one context | 5–20 parallel, separate contexts |
| Verification | None — you triage | Adversarial verify stage |
| Runtime | 3–4 min | 5–20 min |
| Blocks you? | Yes | No — runs in background |
| Cost | Normal session tokens | ~$5–$25 after 3 free runs |
| Best for | Fast feedback, small diffs | Large refactors, auth, payments, concurrency |

Use cases: when to actually fire it

  • Large refactors — 1,000+ line diffs where a human reviewer would zone out after screen 3.
  • Security-sensitive code — auth flows, encryption, token handling, permission checks.
  • Concurrency and async logic — race conditions that only appear when you read the code in a specific order.
  • Database migrations — irreversible changes where a missed null check turns into a 3 AM page.
  • Payment and billing refactors — the kind of bug that costs real money to discover in production.
  • Long-lived feature branches — where drift has accumulated and standard review misses cross-file inconsistencies.

Don't waste a run on a typo fix or a README tweak. The math doesn't work below ~50 lines of diff.

Limitations & pricing

  • Free quota: Pro and Max plans get 3 one-time free runs — they do not reset. Team and Enterprise get none free.
  • Paid: ~$5–$25 per run billed as extra usage after the free quota.
  • Plan gating: the full Code Review research preview is available on Team, Enterprise, and Max 20x ($200/mo). Pro ($20/mo) and Max 5x ($100/mo) can invoke /ultrareview, but the broader auto-on-PR Code Review is still limited.
  • Not supported on Amazon Bedrock, Google Cloud Vertex AI, or Microsoft Foundry, nor for organizations with Zero Data Retention enabled.
  • Server-side flags gate rollout — even if you meet every requirement, it may not be on for you yet.
  • Doesn't replace architectural judgment. It finds bugs, not bad product decisions. Humans still own "should we use WebRTC or WebSockets" calls.

What's next

On April 23, 2026, Anthropic flips Enterprise and pay-as-you-go API defaults from Opus 4.6 to 4.7, which will likely expand /ultrareview's reach as the underlying model rolls out more widely. Enterprise teams are expected to get priority access to the 20-agent fleet first, for maximum-coverage runs on infra and security rewrites.

The deeper prediction worth tracking: the find-verify-dedup pipeline — specifically the verification stage — is the architectural pattern that makes multi-agent review trustworthy. Once developers feel the difference between "here are possible bugs" and "here are confirmed bugs with repro", expect every competing review tool (Copilot, CodeRabbit, Graphite) to ship the same pattern within 12 months.

Sources: InfoQ, Claude Code changelog, Engr Mejba Ahmed, Tech2Geek, Techsy.