- After weeks of user reports about quality slippage — and a 6,852-session audit from an AMD engineer — Anthropic has shipped Claude Code v2.1.116+ fixing three bugs, reset usage limits for every subscriber, and published a post-mortem.
TL;DR
Anthropic just admitted that Claude Code's quality slipped over the past month and published a post-mortem on three distinct issues. All are fixed in v2.1.116+, and usage limits have been reset for every subscriber as compensation. The confession arrives two weeks after AMD AI Director Stella Laurenzo dropped a 6,852-session dataset on GitHub showing Claude Code's "reads-per-edit" had collapsed 70% — the kind of evidence you cannot wave away.
What's new
The Claude Code team's post-mortem identifies three root causes that compounded into what users experienced as "Claude getting dumber":
- Prompt cache bugs — two separate defects quietly inflating token costs 10–20x.
- Zero-thinking-tokens bug — Extended Thinking set to high could return zero reasoning tokens while still billing the input context.
- Effort-default and peak-hour throttling — the default effort level was silently lowered and weekday morning throttling was introduced without notice.
Claude Code v2.1.116+ ships fixes for all three. Anthropic is also resetting usage limits across Free, Pro, Max, Team, and Enterprise — a goodwill gesture aimed at customers whose sessions burned through quota unfairly during the regression window.
Why it matters
This is the second time in eight months Anthropic has had to explain a quality regression publicly. The September 2025 postmortem covered inference-layer bugs (routing, TPU compiler, approximate top-k). This one is different — it sits in the Claude Code client and in product defaults that Anthropic chose, not in the serving stack. That distinction matters: the first was an accident, this one is a mix of bugs and intentional product decisions that users experienced as degradation.
It also sets a precedent: a single, rigorous user-built measurement suite — Laurenzo's — was enough to force a vendor with billions in funding to issue a post-mortem and write checks in the form of reset limits. Expect more of these audits in 2026.
Technical facts
The four ingredients of the $42,121 bill (up from $345/month) that became the viral poster child for this incident:
| Factor | Cost impact | Status in v2.1.116+ |
|---|---|---|
| Session-resume cache miss (`--resume`/`--continue` re-ingested full context) | 10–20x | Fixed |
| Bun fork string-replace bug in `npx @anthropic-ai/claude-code` | 10–20x | Fixed |
| Zero-thinking-tokens on Extended Thinking high | Paid for depth, got shallow | Fixed |
| Default effort dropped to medium (effort=85) + peak-hour throttling (5–11am PT weekdays) | Session quota burn + shallower reasoning | Default restored to high; peak-hour throttling adjusted |
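For a rough sense of how these factors compound, here is an illustrative back-of-envelope calculation. The multipliers below are assumptions chosen for illustration; only the $345 baseline and the list of contributing factors come from the report.

```python
# Illustrative back-of-envelope: how a $345/month bill can land in the
# $42k range when several multipliers stack. All multipliers are assumed
# for illustration, not figures from the post-mortem.
baseline = 345.0          # $/month before the regression window

agent_scaling = 3.75      # assumed: 1-3 -> 5-10 concurrent agents (~3.75x)
cache_miss = 15.0         # assumed midpoint of the reported 10-20x cache bug
retries = 2.0             # assumed: failed/repeated runs roughly double spend

projected = baseline * agent_scaling * cache_miss * retries
print(f"${projected:,.0f}")  # lands in the same order of magnitude as $42,121
```

The point is not the exact product but that three independent multipliers in the 2–15x range turn a modest bill into a five-figure one.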
Some sampling-level fixes from the September 2025 incident — exact top-k over approximate, more operations standardized on fp32 — are already live and stay that way. Quality, as Anthropic put it then, "is non-negotiable."
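To make the top-k distinction concrete: a minimal sketch of what "exact top-k" means at the sampling layer. This is an illustration in plain Python, not Anthropic's kernel.

```python
import heapq

def exact_top_k(logits, k):
    """Return the indices of the k largest logits, computed exactly.

    Approximate top-k kernels trade a small chance of missing a true
    top-k token for speed; an exact selection like this never does.
    """
    return [i for _, i in heapq.nlargest(k, ((v, i) for i, v in enumerate(logits)))]

logits = [0.1, 2.3, -1.0, 2.2, 0.9]
print(exact_top_k(logits, 2))  # [1, 3]
```

On accelerators the exact version costs more than an approximate kernel, which is why the trade-off existed in the first place.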
The AMD report that forced the issue
Stella Laurenzo, Senior Director at AMD's AI Group, instrumented her team's IREE compiler workflows and compared two identical 14-day windows:
- Reads per edit: 6.6x (Jan 30 – Feb 12) → 2.0x (Mar 8 – Mar 23). A 70% drop. Claude stopped reading related files before editing.
- Blind edits (edits to files Claude never read): 6.2% → 33.7%. One in three edits.
- Ownership-dodging stop hook: 0 fires in the good period, 173 fires in 17 days of the degraded period.
- API cost: $345/month → $42,121 (compound of intentional agent scaling from 1–3 to 5–10 concurrent, cache bugs, and retries).
The strongest signal wasn't vibes — it was behavioral. Tool-call traces showed Claude shifted from research-first to edit-first starting February 9, the day adaptive thinking was deployed as default on Opus 4.6.
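Metrics like reads-per-edit and blind-edit rate are cheap to derive once you log tool calls. A sketch, using a hypothetical trace format; real instrumentation would parse Claude Code's actual logs:

```python
# Sketch: deriving Laurenzo-style metrics from a tool-call trace.
# Each event is (tool, file); the format is hypothetical.
def session_metrics(events):
    reads = [f for tool, f in events if tool == "read"]
    edits = [f for tool, f in events if tool == "edit"]
    if not edits:
        return 0.0, 0.0
    seen = set()
    blind = 0
    for tool, f in events:
        if tool == "read":
            seen.add(f)
        elif tool == "edit" and f not in seen:
            blind += 1  # edit to a file the model never read first
    return len(reads) / len(edits), blind / len(edits)

trace = [("read", "a.py"), ("read", "b.py"), ("edit", "a.py"), ("edit", "c.py")]
print(session_metrics(trace))  # (1.0, 0.5): 2 reads per 2 edits, 1 of 2 blind
```

Tracking these two numbers per session is enough to catch a research-first → edit-first shift the moment it happens.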
Comparison with the September 2025 postmortem
| Dimension | Sept 2025 | April 2026 (v2.1.116+) |
|---|---|---|
| Layer | Inference infra | Claude Code client + product defaults |
| Peak impact | 16% of Sonnet 4 traffic at worst hour | ~7% of users during peak throttling + all `npx` users on cache bug |
| Trigger | Routing + TPU + XLA compiler | Cache + thinking-token + effort defaults |
| Detection | Anthropic's internal investigation | External data audit (AMD, 6,852 sessions) |
| Compensation | None | Usage limits reset for all subscribers |
Who benefits most
If you fit any of these, upgrade to v2.1.116+ immediately:
- Heavy Claude Code users on Pro who felt session quotas evaporate on weekday mornings.
- Long agentic coding sessions — cache fixes alone should cut token spend 10–20x for session-resume workflows.
- Extended Thinking power users — the zero-token bug was silently charging for depth you never got.
- Anyone running via `npx` — the Bun string-replace bug is finally dead. Standalone `claude` binary recommended regardless.
Limitations & what's still open
Laurenzo's team reports the behavioral degradation — reads-per-edit, blind edits — persisted even after all published workarounds were applied. That suggests v2.1.116+ addresses the bill-shock and throttling issues cleanly, but the deeper "adaptive thinking" behavioral change (Feb 9) remains a product design question, not a bug. Boris Cherny, Claude Code's creator, confirmed on GitHub that adaptive thinking is intentional. If you run complex engineering work, set effort explicitly rather than trusting the default.
Workarounds still worth pinning to your shell config:
```shell
CLAUDE_CODE_EFFORT_LEVEL=max
CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
```

Also: avoid `--resume` and `--continue` for unrelated tasks, use `/clear` between contexts, and run heavy work off-peak.
What's next
Anthropic's commitments from the September 2025 post-mortem still apply, and the company says it is doubling down on them:
- Continuous quality evals on true production traffic, not just synthetic benchmarks.
- Evaluations sensitive enough to separate "working" from "broken" — not just move an average.
- Privacy-aware debugging tooling so engineers can triage community reports without reading raw user data.
- Tighter communication loop with the Claude Code GitHub community.
The deeper lesson is architectural. Laurenzo's team had an entire compiler workflow running through a single model with no fallback. When that model's behavior shifted — bugs or product decisions, doesn't matter — their system broke. Build your own 50-case eval suite. Abstract over providers. Measure before you complain, and you will get heard.
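A minimal version of that advice — a small eval suite abstracted over providers — can be sketched in a few lines. The provider interface and test cases here are hypothetical stand-ins; in practice each provider would wrap a real API client.

```python
from typing import Callable

# A provider is just a callable prompt -> completion; each eval case is
# (prompt, checker). Swapping providers means swapping one callable.
Provider = Callable[[str], str]

def run_suite(provider: Provider, cases) -> float:
    """Return the fraction of cases the provider passes."""
    passed = sum(1 for prompt, check in cases if check(provider(prompt)))
    return passed / len(cases)

# Stand-in provider for illustration only.
def provider_a(prompt: str) -> str:
    return "4" if "2+2" in prompt else "unknown"

cases = [
    ("What is 2+2?", lambda out: "4" in out),
    ("Capital of France?", lambda out: "Paris" in out),
]

print(f"pass rate: {run_suite(provider_a, cases):.0%}")  # 50% for this stand-in
```

Run the same suite nightly against every provider you depend on, and a regression like February's shows up as a dropped pass rate instead of a five-figure surprise.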
Sources: @ClaudeDevs announcement, Anthropic engineering post-mortem, AMD report analysis, The Register.

