Future AGI mở source toàn bộ nền tảng agent tự cải thiện: simulate, eval, guardrail, gateway, optimize trong 1 platform

TL;DR

Future AGI, Inc. vừa open-source (Apache 2.0, self-hostable) toàn bộ nền tảng engineering cho AI agent. Thay vì stitch Langfuse + Braintrust + Helicone + Guardrails AI, giờ chỉ 1 platform bao 6 trụ: Simulate → Evaluate → Protect → Monitor → Gateway → Optimize, data chạy thành 1 feedback loop để agent tự cải thiện từ production traces.

Con số đáng để ý: gateway Go-based ~29k req/s với P99 ≤ 21ms (guardrails on), 72+ eval metrics chạy local zero-network, 18 guardrail scanners + 15 vendor adapter (Lakera, Presidio, Llama Guard), OpenTelemetry tracing cho 50+ framework (LangChain, CrewAI, LlamaIndex, DSPy…), và 6 thuật toán tối ưu prompt gồm GEPA, PromptWizard, ProTeGi, Bayesian, Meta-Prompt, Random Search. Pro plan $50/month flat — không tính per-seat.

Cái gì mới

Trước đây Future AGI có SaaS cloud + SDK eval. Đợt này họ release full codebase platform dưới Apache 2.0: repo future-agi/future-agi đã public với docker-compose stack, Python/TypeScript SDK, Go gateway. Core README ghi thẳng: "Nightly release for early testing. Expect rough edges. Stable coming soon."

SDK ai-evaluation vừa bump Python 1.1.0 / TypeScript 0.2.0 kèm cookbooks runnable. Ecosystem split thành nhiều repo độc lập (mỗi SDK Apache/MIT tự đứng): traceAI cho OTel instrumentation, ai-evaluation cho 72+ metric, agent-opt cho 6 thuật toán tối ưu, simulate-sdk cho voice agent qua LiveKit + Silero VAD, agentcc là client gateway.

Tại sao việc này quan trọng

LLM agent hay "pass hết eval trong dev, rồi hallucinate chính sách refund không tồn tại trong production". Team hiện stitch nhiều vendor — eval ở chỗ này, tracing chỗ khác, guardrail chỗ khác nữa — và feedback loop giữa chúng không bao giờ đóng. Future AGI đánh đúng vào khe hở đó: mọi trace prod quay lại làm training data cho prompt optimizer, mọi vi phạm guardrail thành case simulation cho release kế tiếp. Đó là lý do họ dùng từ self-improving thay vì observability.

Thêm nữa: toàn bộ interface mở — OpenTelemetry OTLP cho trace, OpenAI-compatible HTTP cho gateway, Postgres/ClickHouse SQL cho storage. Team nào đã có phần nào trong stack thì drop-in thay thế từng layer được.

Số liệu kỹ thuật

Thành phần	Chỉ số
Gateway throughput	~29,000 req/s trên t3.xlarge
Gateway P99 latency (guardrails on)	≤ 21 ms
Weighted routing	~9.9 ns
LLM providers	100+ (OpenAI, Anthropic, Gemini, Bedrock, Mistral, Groq, xAI, self-hosted Ollama/vLLM…)
Routing strategies	15 (load balance, semantic cache, virtual keys, MCP, A2A)
Local eval metrics	72+ (23 string, 14 JSON, 5 hallucination, 19 RAG, 11 agent/function)
Cloud eval templates	100+
Guardrail scanners	18 built-in + 15 vendor adapters + 14 guard models
Guardrail latency	< 10 ms (sub-100ms end-to-end)
OTel framework instrumentors	50+ (LangChain, LangGraph, LlamaIndex, CrewAI, DSPy, AutoGen, PydanticAI, Claude SDK, LiteLLM, Haystack, Instructor, Smol-agents)
Prompt optimization algorithms	6 (GEPA, PromptWizard, ProTeGi, Bayesian, Meta-Prompt, Random Search)
Vector DBs	6 (Pinecone, Weaviate, Chroma, Milvus, Qdrant, pgvector)
Voice platforms	4 (VAPI, Retell, LiveKit, Pipecat)

GEPA (Genetic Pareto) là thuật toán đáng chú ý — evolutionary, được accept ở ICLR 2026 — evolve prompt qua generations bằng reflection + mutation. ProTeGi thì apply "textual gradients": sinh critique từ failure, patch prompt dựa trên critique.

So với Langfuse, Braintrust, Helicone, LangSmith

Capability	Future AGI	Langfuse	Phoenix	Braintrust	Helicone
Open source	✅	✅	✅	❌	✅
Self-host	✅	✅	✅	❌	✅
Agent simulation	✅	❌	❌	❌	❌
Voice agent eval	✅	❌	⚠	❌	❌
LLM gateway built-in	✅	❌	❌	✅	✅
Guardrails built-in	✅	❌	❌	❌	❌
Prompt optimization	✅	❌	❌	❌	❌

Về pricing, team 10 người dùng LangSmith trả $390/tháng ($39/user), Braintrust $249/tháng, Arize custom enterprise. Future AGI Pro flat $50/tháng — và Startup tier có $10K credits + 6 tháng Pro free.

Use case thực tế

Customer support: simulate hàng nghìn kịch bản refund/escalation trước launch. Khi bot draft "We offer full refund within 90 days, no questions asked", guardrail block và auto-correct về "30 days" theo policy PDF §3.1.

RAG pipelines: 42k queries, 99.1% grounded. Stress-test bằng adversarial & multi-hop, verify từng citation với source doc; fabricated claim bị remove trước khi trả. Case study: retrieval recall +8%, hallucination rate -67%.

Voice agents: đánh giá STT / LLM / TTS độc lập. Intercept SSN read-aloud trước synthesis (8ms), detect escalation tone và reroute về human. TTS name accuracy +9%, p99 latency -63% sau auto-optimization.

Code agents / PR review: AST-based vulnerability detection (15 detectors, multi-language), block SQL injection + hardcoded JWT secret trước merge. 0 CVEs qua 8.4k PR.

Internal copilots: test role-bypass (sales rep hỏi internal margin của Acme Corp → block + log). 24k actions, 0 data leak, 3 teams cover.

Customers đã deploy: Whatfix (500+ enterprise teams), Ottimate (10M+ API calls/day), Milestone Internet, Micron Technology.

Limitations & pricing

Cẩn trọng: stack đang ở nightly, stable version "coming soon" theo README. Helm chart v1 vẫn đang làm — hiện tại Kubernetes phải deploy bằng plain manifests trong deploy/. AWS Marketplace listing: "coming soon". Một số UI preview (agent simulation dashboard, optimize dashboard) trên landing page đang để note "coming soon".

Pricing hiện tại: Free cho small team, Startups $10K credits + 6 tháng Pro, Pro $50/tháng flat, Enterprise custom SLA. Compliance: SOC 2 Type II, GDPR, HIPAA, ISO 27001; zero data retention cho self-host; air-gapped / on-prem deploy available (không phone-home).

Sắp tới

Roadmap công khai trong README. Đáng chú ý:

Agent Changelog & Diff view — diff agent giữa version
Full Execution Tracing cho autonomous agent
Multi-modal agent support
Simulate CUA (Computer-Use Agents) và coding agents
Scheduled Simulations — chạy regression theo cron
Native CI/CD plugin (Jenkins, GitLab CI, CircleCI)
Session-level multi-turn tracing
Evaluation marketplace — community contribute metric
Fine-tuned judge models từ feedback data tích luỹ
On-premise deployment toolkit hoàn chỉnh

Repo chính: future-agi/future-agi (249⭐ tại thời điểm xem), ai-evaluation (93⭐), futureagi-sdk (44⭐). Team active trên Discord + GitHub Discussions.

Nguồn: future-agi/future-agi, futureagi.com, ai-evaluation SDK, futureagi-sdk, @hasantoxr breakdown.

Future AGI mở source toàn bộ nền tảng agent tự cải thiện: simulate, eval, guardrail, gateway, optimize trong 1 platform

TL;DR

Cái gì mới

Tại sao việc này quan trọng

Số liệu kỹ thuật

So với Langfuse, Braintrust, Helicone, LangSmith

Use case thực tế

Limitations & pricing

Sắp tới

Bài liên quan

Hermes TUI HUD: keyboard-first operator console cho Hermes Agent

SuperLevels: Pieter Levels gộp 14 extension Chrome thành 1 file MIT, ai cũng audit được trước khi cài

Hermes Video Agent: pipeline tự động "1 URL vào — clip dịch & đăng X ra", vừa open-source MIT