WinDbg Gặp LLM: Extension Mới Biến Disassembly Thành Pseudocode Được Verify Ngay Trong Debugger

TL;DR

kernullist vừa công bố windbg-decompile-ext (tên project: WindbgLlmDecomp) — một WinDbg extension x64 viết bằng C++, MIT license, gọi LLM in-process từ debugger để biến live disassembly thành pseudocode. Điểm khác biệt so với các tool AI-debugger khác: có verification pass cross-check output LLM với dữ liệu từ analyzer gốc (call graph, unwind metadata, branch structure), và deterministic mock fallback khi không có endpoint. Chạy lệnh đơn giản: !decomp module!FunctionName.

What's new

Hầu hết các tích hợp LLM + WinDbg gần đây (VibeDbg, WinDbg-ext-MCP của NadavLor) đi theo hướng bắc cầu — spawn một sidecar Python/MCP, forward lệnh ra Cursor hoặc Claude bên ngoài. Cách đó linh hoạt nhưng tăng surface: IPC, auth, drift giữa state debugger và state chat.

windbg-decompile-ext đi hướng ngược lại: nhét luôn HTTP client OpenAI-compatible vào trong extension DLL. Khi bạn gõ !decomp, DbgEng chạy phân tích deterministic → DLL đóng gói prompt → gọi thẳng endpoint (cloud hoặc local Ollama/LM Studio/vLLM) → verify output → in ra WinDbg console. Không sidecar, không MCP, không context drift.

Why it matters

LLM decompilation chính xác đến đâu? Câu trả lời ngắn: không đủ để tin mù. Pseudocode do LLM sinh ra có thể trông đúng nhưng bịa ra branch, bỏ sót early return, hoặc dịch sai convention gọi hàm. Trong reverse engineering — đặc biệt là triage malware hoặc debug kernel driver — một hallucinated if-else có thể tốn hàng giờ lạc hướng.

Điểm hay nhất của extension này là nó coi LLM như một candidate, không phải ground truth. Analyzer deterministic chạy trước và giữ fact; LLM viết pseudocode đẹp; rồi một pass verification riêng kiểm tra pseudocode có mâu thuẫn với fact gốc không. Đó mới là ý nghĩa thực sự của chữ "verified" trong tiêu đề.

Technical facts

Live memory analysis qua DbgEng cho x64: symbol-region recovery, parse unwind metadata, heuristic function-range identification.
In-process HTTP client: endpoint OpenAI-compatible nhúng thẳng trong DLL — không sidecar, không IPC.
Deterministic mock fallback: khi không config endpoint, extension vẫn sinh pseudocode từ riêng analyzer. Tiện cho offline reversing và CI.
Chunked multi-pass analysis cho hàm lớn:
- Trigger khi hàm vượt ~512 instructions hoặc ~24 basic blocks.
- Mỗi chunk tối đa 14 blocks, hard cap 20 chunks/hàm.
- Token budget: chunk_completion_tokens=3500, merge_completion_tokens=9000.
Verification pass cross-check pseudocode LLM với call graph, branch structure, unwind info từ analyzer — flag drift thay vì im lặng accept.
Config: decomp.llm.json với 17+ tham số, tất cả override được bằng env var. timeout_ms=120000 khuyến nghị cho cloud model.

Commands

Sau khi .load decomp.dll trong WinDbg:

!decomp module!FunctionName
!decomp /deep module!LargeFunction
!decomp /huge module!VeryLargeFunction
!decomp /json 0x140123450
!decomp /no-llm game.exe!CheckIntegrity

Flag /no-llm đặc biệt hữu ích: chạy analyzer-only, không gọi endpoint, dùng để verify structural output hoặc xử lý binary nhạy cảm không muốn upload lên cloud.

Comparison

Tool	Integration	Vai trò LLM	Verification
windbg-decompile-ext	In-process DLL	Pseudocode candidate	Cross-check analyzer facts
VibeDbg	External AI assistant	Chat/NL interface	Không
WinDbg-ext-MCP	MCP sidecar (Python)	External client điều khiển	Không
LLM4Decompile	Standalone model	Ground-truth generator	Không (offline)
Hex-Rays / IDA	Commercial	Không dùng LLM	SSA-based decomp

Use cases

Kernel & driver RE: live session trên driver crash, !decomp nt!SomeFn ra pseudocode đọc được mà không phải switch sang IDA.
Malware triage: dưới breakpoint, lấy pseudocode nhanh cho routine obfuscated, có verification flag branch ảo.
Anti-cheat / integrity check: dùng /no-llm để lấy structural summary mà không upload binary lên cloud.
CI offline: mock fallback cho test suite chạy deterministic.
Local-model RE: trỏ endpoint về Ollama / LM Studio / vLLM, giữ binary nhạy cảm ở trong mạng nội bộ.

Limitations & pricing

x64 Windows-only — chưa hỗ trợ x86, ARM64, hay lldb/gdb.
Pre-1.0: mới 4 stars, 2 forks, 2 commits trên main, chưa có release tag.
Build cần Visual Studio Dev env + CMake + Windows Debuggers SDK (dbgeng.h, dbgeng.lib).
Hàm cực lớn vẫn có thể overflow kể cả khi chạy 20 chunks × 3500 tokens; /huge tăng giới hạn nhưng latency tăng theo.
Miễn phí (MIT) — nhưng LLM cost do user trả, không có hosted service, không telemetry.

What's next

Cấu trúc repo có src/extension/ tách bạch khỏi src/shared/ — gợi ý tác giả tính tái sử dụng analyzer + verification layer ra ngoài WinDbg (CLI tool? port lldb?). Chưa có roadmap chính thức, nhưng pattern "analyzer facts + LLM candidate + verification pass" đáng để các tool RE khác học theo: thay vì tin LLM, dùng LLM như generator và giữ ground truth ở chỗ deterministic.

Nguồn: kernullist/windbg-decompile-ext, thông báo trên X.

WinDbg Gặp LLM: Extension Mới Biến Disassembly Thành Pseudocode Được Verify Ngay Trong Debugger

TL;DR

What's new

Why it matters

Technical facts

Commands

Comparison

Use cases

Limitations & pricing

What's next

Tiếp tục lướt

Bên trong rdmsr: researcher đang mổ xẻ microcode Cannon Lake và tìm thấy CREGPLA

Hermes Agent v0.11.0: Nous Research ships biggest update yet with 761 PRs, TUI v2, and QQBot

Hermes Agent v0.11.0: Bản cập nhật lớn nhất với 761 PR, TUI React/Ink mới và 17 messaging platform

Reverse API Engineer: biến traffic trình duyệt thành Python API client chỉ trong vài click

Exa Highlights: cắt 96% input token cho web agent, 500 token đủ thay 10K token nguyên trang