Kimi K2.6 lands on Perplexity Pro & Max: open-weight SOTA is now one toggle away

TL;DR

Kimi K2.6 launched on 20 April 2026: MoE with 1T total / 32B active parameters, 256K context, natively multimodal (text + image + video input).
Ranks #1 among open-weight models on the Artificial Analysis Intelligence Index (54), close behind the closed-source frontier (Anthropic, Google, and OpenAI all at 57).
SWE-Bench Pro 58.6, ahead of GPT-5.4 (57.7) and Opus 4.6 (53.4). HLE-Full with tools 54.0, the highest among the compared models.
Agent swarm of 300 sub-agents × 4,000 steps; Moonshot demoed a continuous 13-hour coding run.
Perplexity has just shipped K2.6 to Pro & Max: pick it from the model picker, no API key needed, no vLLM hosting needed.

What's new

On 20 April 2026, Moonshot AI announced Kimi K2.6 on its official blog and on Hugging Face under a Modified MIT License. It is a major upgrade to the K2 line: still a 1T-parameter Mixture-of-Experts model, but with the expert count raised to 384 (8 routed + 1 shared), MLA attention, and INT4 support so it can run on smaller infrastructure.
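
To make the "384 experts, 8 routed + 1 shared" figure concrete, here is a minimal sketch of generic softmax top-k gating in plain Python. It is not Moonshot's implementation: the function name `route_token`, the random router logits, and the reading that the shared expert runs for every token on top of the 8 routed ones are all assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 384  # total expert count quoted in the article
TOP_K = 8          # routed experts per token quoted in the article


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and return (index, weight) pairs.

    Generic softmax top-k gating: weights are normalised over the selected
    experts only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:top_k]
    max_logit = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token (random, for illustration only).
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

# Assumption: the shared expert is applied to every token in addition
# to the routed experts, so each token touches TOP_K + 1 experts.
experts_per_token = len(selected) + 1
print(f"experts used for this token: {experts_per_token} of {NUM_EXPERTS}")
```

Only a small subset of experts runs per token, which is why the active parameter count (32B) can be far smaller than the total (1T).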

It was available on day 0 on vLLM, OpenRouter, Cloudflare Workers AI, Baseten, Fireworks, Novita, Parasail, MLX, and OpenCode. Three days later, Perplexity confirmed in a tweet that K2.6 is open to Pro and Max subscribers, which is the easiest way to try it without touching an API.

Why it matters

For the first time, an open-weight model sits close behind the top three closed models (GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro) on an aggregate benchmark and beats them on several important coding/agentic tasks. That gives indie devs and small teams a no-lock-in option: self-host for low cost, switch it on in Perplexity for convenience, or run it on-prem for enterprise use, all with the same model and the same quality.

Just as important: the hallucination rate dropped to 39% (from 65% in K2.5). That is still higher than Claude Sonnet 4.6 (36%), but low enough to hand long-running work to an agent without a babysitter.

Technical facts

Figure: Kimi K2.6 vs GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro benchmark comparison.

Property	Kimi K2.6
Total params	1T (MoE)
Active params/token	32B
Experts	384 (8 routed + 1 shared)
Attention	MLA
Context window	256K tokens
Multimodal input	Text + Image + Video
Quantization	INT4 supported
Modes	Thinking + Non-thinking, Dialogue + Agent
License	Modified MIT
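
The numbers in the table allow a rough back-of-the-envelope estimate of why INT4 matters for hosting. The sketch below assumes 2 bytes per weight for a 16-bit baseline and 0.5 bytes per weight for INT4, and ignores quantization scales, the KV cache, and activations; which layers are actually quantized is not stated in the source, so treat the result as an order-of-magnitude figure only.

```python
TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters (from the table)
ACTIVE_PARAMS = 32_000_000_000     # 32B active parameters per token (from the table)

# Assumed storage cost per weight; real checkpoints add overhead for scales etc.
BYTES_PER_PARAM = {"bf16": 2.0, "int4": 0.5}


def weight_footprint_gb(num_params, dtype):
    """Approximate weight storage in GB (10^9 bytes) for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_footprint_gb(TOTAL_PARAMS, dtype):,.0f} GB of weights")
print(f"active per token: {active_fraction:.1%} of all parameters")
```

Under these assumptions the weights alone shrink by a factor of four going from 16-bit to INT4, while only about 3.2% of the parameters take part in any single token's forward pass.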

Comparison

Benchmark	K2.6	GPT-5.4	Opus 4.6	Gemini 3.1 Pro	K2.5
SWE-Bench Pro	58.6	57.7	53.4	54.2	50.7
SWE-Bench Verified	80.2	—	—	—	—
Terminal-Bench 2.0	66.7	65.4	65.4	68.5	—
HLE-Full (with tools)	54.0	52.1	53.0	51.4	—
DeepSearchQA (F1)	92.5	78.6	—	—	—
τ²-Bench Telecom	96%	—	—	—	—
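
To read the table as margins rather than raw scores, the following sketch computes K2.6's lead (or deficit) against the best-scoring rival on each row where at least one rival score is reported. The scores are copied from the table above; the helper name `margin_over_best_rival` is hypothetical.

```python
# Scores copied from the comparison table; rows with no rival score are omitted.
SCORES = {
    "SWE-Bench Pro": {"K2.6": 58.6, "GPT-5.4": 57.7, "Opus 4.6": 53.4, "Gemini 3.1 Pro": 54.2},
    "Terminal-Bench 2.0": {"K2.6": 66.7, "GPT-5.4": 65.4, "Opus 4.6": 65.4, "Gemini 3.1 Pro": 68.5},
    "HLE-Full (with tools)": {"K2.6": 54.0, "GPT-5.4": 52.1, "Opus 4.6": 53.0, "Gemini 3.1 Pro": 51.4},
    "DeepSearchQA (F1)": {"K2.6": 92.5, "GPT-5.4": 78.6},
}


def margin_over_best_rival(row, model="K2.6"):
    """Return (best rival name, model score minus best rival score)."""
    rivals = {name: score for name, score in row.items() if name != model}
    best = max(rivals, key=rivals.get)
    return best, round(row[model] - rivals[best], 1)


for bench, row in SCORES.items():
    rival, margin = margin_over_best_rival(row)
    print(f"{bench}: {margin:+.1f} vs {rival}")
```

Seen this way, the SWE-Bench Pro lead is under one point, the DeepSearchQA lead is large, and Terminal-Bench 2.0 is the one row where a rival (Gemini 3.1 Pro) is ahead.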

K2.6 used ~160M reasoning tokens to run the full Intelligence Index: about 16% fewer than Claude Sonnet 4.6 (~190M) but about 45% more than GPT-5.4 (~110M). That is an acceptable trade-off for an open-weight model that can be run at hosting cost.

Figure: Reasoning token usage across frontier models.

Use cases

Long-horizon coding: 4,000+ tool calls, running continuously for 12+ hours. Moonshot's demo: an agent optimized a financial engine on its own over 13 hours, raising throughput by 185%.
Motion-rich frontend: generates WebGL, GSAP, Framer Motion, and Three.js code from natural-language descriptions, not just static UI.
Agent swarm of 300 sub-agents: 100 sub-agents customize 100 CVs in parallel; 300 sub-agents research and write a 40-page report with a 20,000+ entry dataset.
Document → Skill: PDFs, slides, and spreadsheets are converted into reusable Skills that preserve structure and style for later autonomous runs.
Deep research: 92.5 F1 on DeepSearchQA, currently the strongest open-weight option for web research agents.

Limitations & pricing

Text-only output. Multimodality is input-only: the model does not generate images or video.
Hallucination rate of 39%: sharply reduced but still higher than Claude Sonnet 4.6 (36%) and MiniMax-M2.7 (34%). Use a verifier when delegating critical work.
Moonshot platform pricing has not been published in detail in $/1M tokens; there is a promotional recharge bonus on platform.moonshot.ai. On Perplexity Pro & Max it is included in the subscription, with no separate charge.
Modified MIT License: commercial use is allowed, attribution is required.
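
The advice to use a verifier for critical work can be sketched as a simple acceptance gate: run independent checks on the agent's output and accept it only if none fail. Everything here is hypothetical (the `verify` helper, the check names, and the sample output string); real checks would be things like running a test suite or validating cited sources.

```python
from typing import Callable, Iterable, List, Tuple

# A check is a (name, predicate) pair; the predicate returns True when the output passes.
Check = Tuple[str, Callable[[str], bool]]


def verify(output: str, checks: Iterable[Check]) -> List[str]:
    """Run every check and return the names of the ones that failed."""
    return [name for name, check in checks if not check(output)]


# Illustrative checks only; replace with domain-specific validation.
checks: List[Check] = [
    ("non_empty", lambda text: bool(text.strip())),
    ("cites_source", lambda text: "http" in text),
]

agent_output = "Throughput rose 185% (see https://example.com/report)"
failed = verify(agent_output, checks)
accepted = not failed
print("accepted" if accepted else f"rejected: {failed}")
```

Keeping the checks independent of the model that produced the output is the point of the design: a failed check sends the task back for a retry or a human review instead of letting a hallucinated result through.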

What's next

Roadmap signals: a thinking-turbo variant to cut reasoning cost, deeper integration with Hermes Agent and OpenClaw, and pressure on DeepSeek to release v4 sooner. For Perplexity users the next step is simple: open the model picker, choose Kimi K2.6, hand it a long coding task, and see how it manages on its own.

Sources: Moonshot blog, Artificial Analysis, Hugging Face, Perplexity.
