TL;DR

A new paper from Ido Galil (NVIDIA), Moshe Kimhi (Technion, IBM Research) and Ran El-Yaniv (Technion, NVIDIA) shows that flipping just one or two sign bits in a deep network's weights is enough to catastrophically destroy it. ResNet-50 on ImageNet loses 99.8% of its accuracy from two flips. Qwen3-30B-A3B-Thinking drops from 78% math accuracy to 0% from two flips into two experts. Their method, Deep Neural Lesion (DNL), needs no training data and no gradient optimization at all.

What's new

Bit-flip attacks against neural networks aren't new — but every prior method (BFA, DeepHammer, ZeBRA) needed iterative gradient search and at least synthetic data. DNL is the first attack that is simultaneously data-agnostic AND optimization-free. It picks targets by a single magnitude-based heuristic (rank weights by |w|, flip the sign bit of the largest), then surgically targets early layers where damage compounds.
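
A minimal sketch of that selection heuristic, assuming a PyTorch model (the function name, the `early_layers` cutoff, and the scan depth are my illustrative choices, not the authors' released code; the one-flip-per-kernel constraint is motivated under "One bit per kernel" below):

```python
# Illustrative DNL-style target selection: rank weights by |w| in the
# earliest conv layers, taking at most one candidate per kernel.
import torch
import torchvision.models as models

def select_targets(model, k=2, early_layers=2):
    """Return the k largest-magnitude weights from the first few conv
    layers, at most one per kernel (output channel)."""
    candidates = []
    convs = [(n, m) for n, m in model.named_modules()
             if isinstance(m, torch.nn.Conv2d)][:early_layers]
    for name, conv in convs:
        w = conv.weight.detach()
        per_kernel = w.view(w.shape[0], -1)       # one row per kernel
        vals, idxs = per_kernel.abs().max(dim=1)  # strongest weight in each kernel
        for ch in range(w.shape[0]):
            candidates.append((vals[ch].item(), name, ch, idxs[ch].item()))
    candidates.sort(reverse=True)                 # rank by |w|, largest first
    return candidates[:k]

model = models.resnet50(weights="IMAGENET1K_V2")
for mag, layer, kernel, offset in select_targets(model):
    print(f"{layer} kernel {kernel} offset {offset}: |w| = {mag:.3f}")
```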

An enhanced variant, 1P-DNL, runs a single forward and backward pass on random inputs to refine the ranking, amplifying the damage further. The arXiv v2 (16 Apr 2026) extends the method beyond image classification to object detection, instance segmentation, and Mixture-of-Experts (MoE) reasoning LLMs.
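
The exact refinement score isn't spelled out in this summary, so the scoring rule in this sketch (|w · ∂L/∂w| on random inputs, a first-order proxy for damage) is an assumption on my part:

```python
# Hedged sketch of the 1P-DNL idea: one forward+backward pass on random
# inputs, no real data, to refine the magnitude ranking. The |w * grad|
# score is an illustrative proxy, not taken verbatim from the paper.
import torch

def one_pass_scores(model, input_shape=(8, 3, 224, 224)):
    model.zero_grad()
    x = torch.randn(input_shape)   # random inputs; no training data needed
    loss = model(x).norm()         # any scalar objective works for a sketch
    loss.backward()
    return {name: (p.detach() * p.grad).abs()
            for name, p in model.named_parameters() if p.grad is not None}
```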

Why it matters

If an attacker has any write primitive into stored model parameters — Rowhammer on shared DRAM, a kernel rootkit, a compromised GPU driver, firmware exploit, or DMA from a malicious peripheral — they can now bring down a production model with the smallest possible footprint. No training data exfiltration, no gradient computations to log, no observable inference traffic. Two bits. That's the entire payload.

For autonomous driving, medical imaging, content moderation, and on-prem LLM serving, this collapses the threat model assumption that attackers need either data access or sustained compute on the victim machine.

Technical facts

| Model | Domain | Flips | Accuracy reduction |
| --- | --- | --- | --- |
| ResNet-50 (ImageNet) | Image classification | 2 | 99.8% |
| MobileNet-V2 | Image classification | 2 | 99.9% |
| VGG-11 | Image classification | 2 | 99.8% |
| Inception-V3 | Image classification | 2 | 99.8% |
| ViT-B/16 | Image classification | 5 | 99.3% |
| Mask R-CNN (COCO, backbone only) | Object detection | 1 | box AP → ~0.01, mask AP → 0.00 |
| YOLOv8-seg (COCO) | Detection + segmentation | 1–2 | >77% |
| Qwen3-30B-A3B-Thinking | MoE reasoning LLM | 2 | 78% → 0% (math) |

Why sign bits? In FP32, the sign is the most significant bit, which makes it easy to localize in memory. Flipping it instantly negates the weight, drastically warping the learned feature. Hardware attacks like Rowhammer are also more reliable at flipping a fixed bit position than an arbitrary one, so sign bits align naturally with what attackers can actually do.
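
The byte-level mechanics are easy to verify with plain NumPy (nothing here is paper-specific):

```python
# In IEEE-754 binary32 the sign occupies bit 31, the most significant
# bit, so XOR-ing that single bit negates the value without touching
# the exponent or mantissa.
import numpy as np

w = np.array([0.8317], dtype=np.float32)
bits = w.view(np.uint32)            # reinterpret the same 4 bytes as an integer
bits ^= np.uint32(0x80000000)       # flip only the sign bit, in place
print(w)                            # [-0.8317]
```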

Why early layers? Counterintuitively, early conv filters (the Sobel/Gabor-like edge detectors) are far more devastating to flip than late classifier layers. A corrupted edge filter cascades wrong signals through every downstream layer — the paper draws an explicit neuroscience analogy to optic-nerve lesions causing total blindness.
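
You can poke at the cascade yourself with a hedged software simulation (negating the weight in place rather than flipping the stored bit; this illustrates the effect and is not the paper's evaluation protocol):

```python
# Simulate a first-layer sign flip and watch the outputs move.
import torch
import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V2").eval()
x = torch.randn(4, 3, 224, 224)
with torch.no_grad():
    before = model(x)
    w = model.conv1.weight
    idx = w.abs().argmax()      # largest-magnitude weight in the first conv
    w.view(-1)[idx] *= -1.0     # simulate the sign-bit flip
    after = model(x)
print("mean |Δ logit|:", (after - before).abs().mean().item())
```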

One bit per kernel. The authors prove that for CNNs, two flips inside the same kernel can partially cancel each other when nearby patch pixels are correlated. So DNL spreads flips across distinct kernels to maximize compounding damage.

Comparison vs prior bit-flip attacks

| Method | Optimization-free | Data-free | Flips on ResNet-50 | Accuracy drop |
| --- | --- | --- | --- | --- |
| DeepHammer (2020) | No | No | 23 | 75.4% |
| BFA (2019) | No | No | 11 | ~99.7% |
| ZeBRA (2021) | No | Synthetic only | 5 | 99.7% |
| DNL | Yes | Yes | 8 | 99.7% |
| 1P-DNL | Yes | Yes | 1–2 | 99.4–99.8% |

Computational complexity drops from O(k · B · θ · m) for prior gradient-based methods (with θ the parameter count, k the number of flips, and B and m the batch size and iteration count of the gradient search) to just O(θ) + O(k) for DNL: essentially a single linear scan over the weights plus the flips themselves.

Use cases & who's most exposed

  • MoE language models (Qwen3-30B-A3B, Mixtral-style routing) are uniquely fragile: corrupting a single expert's down-projection poisons latent token representations that then propagate through attention, contaminating even tokens that never route through that expert.
  • Autonomous driving perception stacks built on ResNet/MobileNet backbones: two bit flips in shared DRAM via Rowhammer are a realistic kill-switch.
  • Cloud GPU multi-tenant inference: GPU cache tampering vectors are less monitored than CPU caches and provide stealthy parameter corruption.
  • Defenders can use DNL itself as a targeting oracle: apply error-correcting codes (ECC) or bit replication selectively to just the top ~1–20% most vulnerable sign bits and recover most of the robustness at a fraction of full-ECC memory cost (a minimal sketch follows this list).
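
A minimal sketch of that defensive use, with software shadow copies standing in for real ECC or bit replication (the `frac` threshold and the shadow scheme are my assumptions):

```python
# Reuse the DNL-style magnitude ranking defensively: record the signs of
# the top-fraction largest weights, then verify them periodically.
import torch

def protect_critical_signs(model, frac=0.05):
    shadows = {}
    for name, p in model.named_parameters():
        flat = p.detach().flatten()
        k = max(1, int(frac * flat.numel()))
        idx = flat.abs().topk(k).indices        # most vulnerable positions
        shadows[name] = (idx, torch.sign(flat[idx]))
    return shadows

def verify_signs(model, shadows):
    """Names of parameters whose protected sign bits have changed."""
    params = dict(model.named_parameters())
    return [name for name, (idx, signs) in shadows.items()
            if not torch.equal(torch.sign(params[name].detach().flatten()[idx]),
                               signs)]
```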

Limitations & pricing

DNL assumes the attacker has global visibility into the full parameter set in order to rank weights by magnitude. If parameters are sharded across nodes, held in encrypted memory or a TEE, or only partially exposed to a compromised process, the global ranking can't be computed — and attack effectiveness drops accordingly.

This is academic research, not a product. The authors say the code will be released publicly upon acceptance.

What's next

The authors flag three follow-up directions: partial-access threat models (sharded / TEE deployments), numeric formats and architectures resistant to sign-bit flips, and training procedures that flatten the magnitude distribution so no single weight is critical. The selective-ECC defense — protect only the bits DNL flags as critical — is already a deployable mitigation today.

Sources: arXiv 2502.07408, HTML preprint, HuggingPapers on X.