TL;DR

On April 16, 2026, Anthropic shipped /ultrareview in Claude Code v2.1.111 alongside Opus 4.7. Type it and a fleet of 5 sub-agents (up to 20) runs in a remote sandbox across 4 stages — Setup, Find, Verify, Dedup — and returns only verified bugs. Anthropic reports fewer than 1% of findings are marked incorrect by engineers. Pro and Max users get 3 one-time free runs; after that it's ~$5–$25 per review. It's not magic, and it's not for every commit — but for 1,000+ line PRs touching auth, payments, or concurrency, it's the first AI review that's actually worth waiting 15 minutes for.

What's new

/ultrareview is a new slash command in Claude Code. Run it with no argument to review your current branch, or /ultrareview 1234 to pull a specific GitHub PR. Unlike the existing /review, which does one local pass in your session, /ultrareview ships your diff to Anthropic's cloud infrastructure and spawns an adversarial multi-agent fleet that runs in the background while you keep coding.
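
For illustration, the two invocation forms as typed at the Claude Code prompt (PR 1234 is just the placeholder number from above):

```
/ultrareview        → reviews the current branch
/ultrareview 1234   → pulls and reviews GitHub PR #1234
```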

The feature was first discovered on March 31, 2026, when a 59.8MB source map accidentally shipped inside the npm package @anthropic-ai/claude-code v2.1.88, leaking 1,884 TypeScript source files. Anthropic made it official two weeks later.

How it works: the 4 stages

  1. Setup — Provisions the cloud sandbox and spins up the sub-agent fleet. Defaults to 5 agents; scales up to 20 for larger or riskier changes. ~90 seconds on an 11,000-line PR.
  2. Find — Each agent gets its own context window and explores different execution paths. Opus-class agents hunt logic and security bugs; Sonnet-class agents handle style and CLAUDE.md violations. Traversal order matters — agents that start from the auth layer surface race conditions that agents starting from the UI never see.
  3. Verify — This is the unlock. A separate verification agent tries to reproduce each candidate bug. Only confirmed findings survive. The Find agents are tuned for recall; the Verify agents are tuned for precision. Adversarial by design.
  4. Dedup — Merges the same bug reported from different angles into one enriched finding. (A minimal sketch of the whole pipeline follows this list.)
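
To make the architecture concrete, here is a minimal TypeScript sketch of the find, verify, and dedup stages. It is a hypothetical illustration of the pattern described above, not Anthropic's implementation: the Finding shape, runFindAgent, and tryReproduce are all made up, and the real dedup stage merges duplicates into one enriched finding rather than just keeping one.

```typescript
// Hypothetical find → verify → dedup pipeline. All names and shapes here
// are illustrative assumptions, not Anthropic's actual implementation.

interface Finding {
  file: string;
  line: number;
  description: string;
}

// Stub for a find agent: in the real system this is a model call that
// explores the diff from a distinct starting point (auth, UI, ...).
async function runFindAgent(diff: string, agentIndex: number): Promise<Finding[]> {
  return []; // placeholder
}

// Stub for the verifier: in the real system this attempts an actual repro.
async function tryReproduce(candidate: Finding): Promise<boolean> {
  return false; // placeholder
}

// Find: fan out across the fleet; tuned for recall, so duplicates and
// false positives are acceptable at this stage.
async function findStage(diff: string, fleetSize: number): Promise<Finding[]> {
  const reports = await Promise.all(
    Array.from({ length: fleetSize }, (_, i) => runFindAgent(diff, i)),
  );
  return reports.flat();
}

// Verify: tuned for precision; any candidate that cannot be reproduced
// is dropped rather than surfaced to the user.
async function verifyStage(candidates: Finding[]): Promise<Finding[]> {
  const results = await Promise.all(candidates.map(tryReproduce));
  return candidates.filter((_, i) => results[i]);
}

// Dedup: collapse the same bug reported from different angles. Keyed
// naively by location here; the real stage merges into an enriched finding.
function dedupStage(verified: Finding[]): Finding[] {
  const byLocation = new Map<string, Finding>();
  for (const f of verified) byLocation.set(`${f.file}:${f.line}`, f);
  return [...byLocation.values()];
}

async function ultraReview(diff: string, fleetSize = 5): Promise<Finding[]> {
  return dedupStage(await verifyStage(await findStage(diff, fleetSize)));
}
```

Note the deliberate asymmetry: findStage is allowed to over-report because verifyStage discards anything it cannot reproduce; that is what "adversarial by design" refers to.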

Technical facts

| Property | Value |
| --- | --- |
| Default fleet size | 5 sub-agents |
| Max fleet size | 20 sub-agents |
| Stages | Setup → Find → Verify → Dedup |
| Typical runtime | 5–10 min (up to ~20 min on large PRs) |
| Observed 11k-line PR | 17 minutes total, 64 candidates filtered down |
| False positive rate | <1% (Anthropic internal) |
| Cost per run | ~$5–$25 (≈3M tokens) |
| Min version | Claude Code v2.1.111 |
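
A back-of-envelope check on that cost row: assuming per-token rates in the neighborhood of recent Opus pricing (the rates below are assumptions, not published Opus 4.7 pricing), a run consuming ~3 million mostly-input tokens lands inside the quoted band.

```typescript
// Rough cost model for one run. Both rates are assumptions for
// illustration, not Anthropic's published Opus 4.7 pricing.
const INPUT_RATE = 5 / 1_000_000;   // assumed $/input token
const OUTPUT_RATE = 25 / 1_000_000; // assumed $/output token

function runCost(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;
}

// ≈3M total tokens, mostly input: 2.8M in, 0.2M out
console.log(runCost(2_800_000, 200_000).toFixed(2)); // "19.00", within $5–$25
```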

Anthropic has used the system internally for months. The share of PRs receiving substantive review comments rose from 16% to 54% after adoption. On PRs with more than 1,000 lines changed, 84% generated findings, averaging 7.5 issues per PR. For PRs under 50 lines, only 31% generated findings (avg 0.5 issues) — confirming the tool pays off on big, risky diffs, not one-line tweaks.

/ultrareview vs /review

| Axis | /review | /ultrareview |
| --- | --- | --- |
| Runtime environment | Local session | Cloud sandbox |
| Agents | 1 pass, one context | 5–20 parallel, separate contexts |
| Verification | None — you triage | Adversarial verify stage |
| Runtime | 3–4 min | 5–20 min |
| Blocks you? | Yes | No — runs in background |
| Cost | Normal session tokens | ~$5–$25 after 3 free runs |
| Best for | Fast feedback, small diffs | Large refactors, auth, payments, concurrency |

Use cases: when to actually fire it

  • Large refactors — 1,000+ line diffs where a human reviewer would zone out after screen 3.
  • Security-sensitive code — auth flows, encryption, token handling, permission checks.
  • Concurrency and async logic — race conditions that only appear when you read the code in a specific order.
  • Database migrations — irreversible changes where a missed null check turns into a 3 AM page.
  • Payment and billing refactors — the kind of bug that costs real money to discover in production.
  • Long-lived feature branches — where drift has accumulated and standard review misses cross-file inconsistencies.

Don't waste a run on a typo fix or a README tweak. The math doesn't work below ~50 lines of diff.

Limitations & pricing

  • Free quota: Pro and Max plans get 3 one-time free runs — they do not reset. Team and Enterprise get none free.
  • Paid: ~$5–$25 per run billed as extra usage after the free quota.
  • Plan gating: the full Code Review research preview is available on Team, Enterprise, and Max 20x ($200/mo). Pro ($20/mo) and Max 5x ($100/mo) can invoke /ultrareview, but the broader auto-on-PR Code Review is still limited.
  • Not supported on Amazon Bedrock, Google Cloud Vertex AI, or Microsoft Foundry, nor for organizations with Zero Data Retention enabled.
  • Server-side flags gate rollout — even if you meet every requirement, it may not be on for you yet.
  • Doesn't replace architectural judgment. It finds bugs, not bad product decisions. Humans still own "should we use WebRTC or WebSockets" calls.

What's next

On April 23, 2026, Anthropic flips Enterprise and pay-as-you-go API defaults from Opus 4.6 to 4.7, which will likely expand /ultrareview's reach as the underlying model rolls out more widely. Enterprise teams are expected to get priority access to the 20-agent fleet first, for maximum-coverage runs on infra and security rewrites.

The deeper prediction worth tracking: the find-verify-dedup pipeline — specifically the verification stage — is the architectural pattern that makes multi-agent review trustworthy. Once developers feel the difference between "here are possible bugs" and "here are confirmed bugs with repro", expect every competing review tool (Copilot, CodeRabbit, Graphite) to ship the same pattern within 12 months.

Sources: InfoQ, Claude Code changelog, Engr Mejba Ahmed, Tech2Geek, Techsy.