- After weeks of user reports about quality slippage — and a 6,852-session audit from an AMD engineer — Anthropic has shipped Claude Code v2.1.116+ fixing three bugs, reset usage limits for every subscriber, and published a post-mortem.
TL;DR
Anthropic just admitted that Claude Code's quality slipped over the past month and published a post-mortem on three distinct issues. All are fixed in v2.1.116+, and usage limits have been reset for every subscriber as compensation. The confession arrives two weeks after AMD AI Director Stella Laurenzo dropped a 6,852-session dataset on GitHub showing Claude Code's "reads-per-edit" had collapsed 70% — the kind of evidence you cannot wave away.
What's new
The Claude Code team's post-mortem identifies three root causes that compounded into what users experienced as "Claude getting dumber":
- Prompt cache bugs — two separate defects quietly inflating token costs 10–20x.
- Zero-thinking-tokens bug — Extended Thinking set to high could return zero reasoning tokens while still billing the input context.
- Effort-default and peak-hour throttling — the default effort level was silently lowered and weekday morning throttling was introduced without notice.
Claude Code v2.1.116+ ships fixes for all three. Anthropic is also resetting usage limits across Free, Pro, Max, Team, and Enterprise — a goodwill gesture aimed at customers whose sessions burned through quota unfairly during the regression window.
Why it matters
This is the second time in eight months Anthropic has had to explain a quality regression publicly. The September 2025 postmortem covered inference-layer bugs (routing, TPU compiler, approximate top-k). This one is different — it sits in the Claude Code client and in product defaults that Anthropic chose, not in the serving stack. That distinction matters: the first was an accident, this one is a mix of bugs and intentional product decisions that users experienced as degradation.
It also sets a precedent: a single, rigorous user-built measurement suite — Laurenzo's — was enough to force a vendor with billions in funding to issue a post-mortem and write checks in the form of reset limits. Expect more of these audits in 2026.
Technical facts
The four ingredients of the $42,121 bill (up from $345/month) that became the viral poster child for this incident:
| Factor | Cost impact | Status in v2.1.116+ |
|---|---|---|
| Session-resume cache miss (`--resume`/`--continue` re-ingested full context) | 10–20x | Fixed |
| Bun fork string-replace bug in `npx @anthropic-ai/claude-code` | 10–20x | Fixed |
| Zero-thinking-tokens on Extended Thinking high | Paid for depth, got shallow | Fixed |
| Default effort dropped to medium (effort=85) + peak-hour throttling (5–11am PT weekdays) | Session quota burn + shallower reasoning | Default restored to high; peak-hour throttling adjusted |
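For a rough sense of how these factors compound, here is an illustrative back-of-envelope calculation. The multipliers below are assumptions chosen for illustration; only the $345 baseline and the list of contributing factors come from the report.

```python
# Illustrative back-of-envelope: how a $345/month bill can land in the
# $42k range when several multipliers stack. All multipliers are assumed
# for illustration, not figures from the post-mortem.
baseline = 345.0          # $/month before the regression window

agent_scaling = 3.75      # assumed: 1-3 -> 5-10 concurrent agents (~3.75x)
cache_miss = 15.0         # assumed midpoint of the reported 10-20x cache bug
retries = 2.0             # assumed: failed/repeated runs roughly double spend

projected = baseline * agent_scaling * cache_miss * retries
print(f"${projected:,.0f}")  # lands in the same order of magnitude as $42,121
```

The point is not the exact product but that three independent multipliers in the 2–15x range turn a modest bill into a five-figure one.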
Some sampling-level fixes from the September 2025 incident — exact top-k over approximate, more operations standardized on fp32 — are already live and stay that way. Quality, as Anthropic put it then, "is non-negotiable."
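To make the top-k distinction concrete: a minimal sketch of what "exact top-k" means at the sampling layer. This is an illustration in plain Python, not Anthropic's kernel.

```python
import heapq

def exact_top_k(logits, k):
    """Return the indices of the k largest logits, computed exactly.

    Approximate top-k kernels trade a small chance of missing a true
    top-k token for speed; an exact selection like this never does.
    """
    return [i for _, i in heapq.nlargest(k, ((v, i) for i, v in enumerate(logits)))]

logits = [0.1, 2.3, -1.0, 2.2, 0.9]
print(exact_top_k(logits, 2))  # [1, 3]
```

On accelerators the exact version costs more than an approximate kernel, which is why the trade-off existed in the first place.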
The AMD report that forced the issue
Stella Laurenzo, Senior Director at AMD's AI Group, instrumented her team's IREE compiler workflows and compared two identical 14-day windows:
- Reads per edit: 6.6x (Jan 30 – Feb 12) → 2.0x (Mar 8 – Mar 23). A 70% drop. Claude stopped reading related files before editing.
- Blind edits (edits to files Claude never read): 6.2% → 33.7%. One in three edits.
- Ownership-dodging stop hook: 0 fires in the good period, 173 fires in 17 days of the degraded period.
- API cost: $345/month → $42,121 (compound of intentional agent scaling from 1–3 to 5–10 concurrent, cache bugs, and retries).
The strongest signal wasn't vibes — it was behavioral. Tool-call traces showed Claude shifted from research-first to edit-first starting February 9, the day adaptive thinking was deployed as default on Opus 4.6.
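Metrics like reads-per-edit and blind-edit rate are cheap to derive once you log tool calls. A sketch, using a hypothetical trace format; real instrumentation would parse Claude Code's actual logs:

```python
# Sketch: deriving Laurenzo-style metrics from a tool-call trace.
# Each event is (tool, file); the format is hypothetical.
def session_metrics(events):
    reads = [f for tool, f in events if tool == "read"]
    edits = [f for tool, f in events if tool == "edit"]
    if not edits:
        return 0.0, 0.0
    seen = set()
    blind = 0
    for tool, f in events:
        if tool == "read":
            seen.add(f)
        elif tool == "edit" and f not in seen:
            blind += 1  # edit to a file the model never read first
    return len(reads) / len(edits), blind / len(edits)

trace = [("read", "a.py"), ("read", "b.py"), ("edit", "a.py"), ("edit", "c.py")]
print(session_metrics(trace))  # (1.0, 0.5): 2 reads per 2 edits, 1 of 2 blind
```

Tracking these two numbers per session is enough to catch a research-first → edit-first shift the moment it happens.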
Comparison with the September 2025 postmortem
| Dimension | Sept 2025 | April 2026 (v2.1.116+) |
|---|---|---|
| Layer | Inference infra | Claude Code client + product defaults |
| Peak impact | 16% of Sonnet 4 traffic at worst hour | ~7% of users during peak throttling + all `npx` users on cache bug |
| Trigger | Routing + TPU + XLA compiler | Cache + thinking-token + effort defaults |
| Detection | Anthropic's internal investigation | External data audit (AMD, 6,852 sessions) |
| Compensation | None | Usage limits reset for all subscribers |
Who benefits most
If you fit any of these, upgrade to v2.1.116+ immediately:
- Heavy Claude Code users on Pro who felt session quotas evaporate on weekday mornings.
- Long agentic coding sessions — cache fixes alone should cut token spend 10–20x for session-resume workflows.
- Extended Thinking power users — the zero-token bug was silently charging for depth you never got.
- Anyone running via `npx` — the Bun string-replace bug is finally dead. Standalone `claude` binary recommended regardless.
Limitations & what's still open
Laurenzo's team reports the behavioral degradation — reads-per-edit, blind edits — persisted even after all published workarounds were applied. That suggests v2.1.116+ addresses the bill-shock and throttling issues cleanly, but the deeper "adaptive thinking" behavioral change (Feb 9) remains a product design question, not a bug. Boris Cherny, Claude Code's creator, confirmed on GitHub that adaptive thinking is intentional. If you run complex engineering work, set effort explicitly rather than trusting the default.
Workarounds still worth pinning to your shell config:
```shell
CLAUDE_CODE_EFFORT_LEVEL=max
CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
```

Also: avoid `--resume` and `--continue` for unrelated tasks, use `/clear` between contexts, and run heavy work off-peak.
What's next
Anthropic's commitments from the September 2025 post-mortem still apply, and the company says it is doubling down on them:
- Continuous quality evals on true production traffic, not just synthetic benchmarks.
- Evaluations sensitive enough to separate "working" from "broken" — not just move an average.
- Privacy-aware debugging tooling so engineers can triage community reports without reading raw user data.
- Tighter communication loop with the Claude Code GitHub community.
The deeper lesson is architectural. Laurenzo's team had an entire compiler workflow running through a single model with no fallback. When that model's behavior shifted — bugs or product decisions, doesn't matter — their system broke. Build your own 50-case eval suite. Abstract over providers. Measure before you complain, and you will get heard.
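A minimal version of that advice — a small eval suite abstracted over providers — can be sketched in a few lines. The provider interface and test cases here are hypothetical stand-ins; in practice each provider would wrap a real API client.

```python
from typing import Callable

# A provider is just a callable prompt -> completion; each eval case is
# (prompt, checker). Swapping providers means swapping one callable.
Provider = Callable[[str], str]

def run_suite(provider: Provider, cases) -> float:
    """Return the fraction of cases the provider passes."""
    passed = sum(1 for prompt, check in cases if check(provider(prompt)))
    return passed / len(cases)

# Stand-in provider for illustration only.
def provider_a(prompt: str) -> str:
    return "4" if "2+2" in prompt else "unknown"

cases = [
    ("What is 2+2?", lambda out: "4" in out),
    ("Capital of France?", lambda out: "Paris" in out),
]

print(f"pass rate: {run_suite(provider_a, cases):.0%}")  # 50% for this stand-in
```

Run the same suite nightly against every provider you depend on, and a regression like February's shows up as a dropped pass rate instead of a five-figure surprise.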
Sources: @ClaudeDevs announcement, Anthropic engineering post-mortem, AMD report analysis, The Register.

