FutureAGI open-source toàn bộ nền tảng AI agent: simulate, eval, guardrail, gateway trong một feedback loop

TL;DR

FutureAGI vừa open-source toàn bộ nền tảng AI agent của họ dưới giấy phép Apache 2.0 - không phải bản rút gọn. Cả UI, backend Django, gateway Go, simulation engine, evals, optimization loop, observability OTel, guardrails, docs đều nằm chung một repo. Điểm đáng chú ý không phải ở độ rộng, mà ở kiến trúc: nó gộp các mảnh rời rạc của agent reliability (tracing một tool, eval tool khác, guardrail tool khác nữa) vào một feedback loop đóng kín - simulate → eval → detect → learn → fix → validate → redeploy → monitor — chạy lại mỗi khi có sự cố mới, không cần keo dán tay.

What's new

Trước đây FutureAGI đã có từng SDK riêng - ai-evaluation, futureagi-sdk, agent-opt - sống rải rác trên nhiều repo. Bản release này gộp tất cả lại sau một UI thống nhất + một feedback loop, tagline "AI Agents hallucinate. Fix it faster."

Monorepo chứa đủ 6 trụ cột:

Simulate - hàng nghìn hội thoại đa lượt, persona thực tế, adversarial input (text + voice).
Evaluate - 50+ metrics trong một evaluate(): groundedness, hallucination, tool-use correctness, PII, tone, rubric tùy biến.
Protect - 18 scanner sẵn (PII, jailbreak, prompt injection…) + 15 adapter vendor (Lakera, Presidio, Llama Guard…).
Monitor - tracing OpenTelemetry-native cross 50+ framework (LangChain, LlamaIndex, CrewAI, DSPy…).
Command Center / Gateway - OpenAI-compatible, 100+ provider, 15 chiến lược routing, semantic caching, virtual keys.
Optimize - 6 thuật toán prompt optimization bundled (GEPA, PromptWizard, ProTeGi, Bayesian, Meta-Prompt, Random).

Why it matters

Hầu hết stack "agent reliability" hiện tại bị phân mảnh: tracing nằm ở Langfuse/LangSmith, evals ở Ragas/DeepEval, guardrails ở NeMo hoặc Guardrails AI. Đội ops phải tự kéo dây - log chỗ này, correlate chỗ kia, và agent không thực sự cải thiện: bạn chỉ vá prompt rồi hy vọng lần sau đỡ hơn.

FutureAGI lật mô hình đó: mỗi lần agent fail, hệ thống tự sinh fix, tự validate fix với traffic thật, tự check regression, rồi redeploy. Khi có failure mới, loop chạy lại. Lớp "optimization loop" chính là layer mà các công cụ hiện tại đang thiếu - chúng nói cho bạn biết cái gì gãy, nhưng không đóng vòng lại để sửa.

Technical facts

Stack bên dưới không phải đồ chơi research:

Layer	Tech
Backend	Python 3.11+ / Django 4.2 + Channels
Gateway	Go 1.23+
Frontend	React 18 + Vite
Data	PostgreSQL (metadata) · ClickHouse (spans + time-series) · Redis (state)
Jobs	RabbitMQ + Temporal

Số liệu hiệu năng đáng chú ý (trên t3.xlarge, theo benchmark harness đã commit trong repo):

Gateway: ~29k req/s, P99 ≤ 21ms - có guardrails bật.
Weighted routing: ~9.9 ns.
Evals: sub-100ms across modalities - dùng trained classifiers, không phải LLM-as-judge (nhanh hơn và deterministic hơn).
Deploy: git clone + docker compose up -d → mở tại http://localhost:3031. Kubernetes manifest có sẵn.

Comparison

So với các lựa chọn closed-SaaS hiện tại, FutureAGI là số ít nền tảng bundle đủ 6 trụ cột và fully self-hostable:

Tool	Trace	Eval	Simulate	Guardrail	Optimize loop	Self-host OSS
FutureAGI	✓	✓	✓	✓	✓	✓ (Apache 2.0)
LangSmith	✓	✓	—	—	—	—
Langfuse	✓	✓	—	—	—	✓
Arize / Galileo	✓	✓	partial	partial	—	—
Guardrails AI	—	—	—	✓	—	✓

Use cases

Số liệu trên homepage cho thấy nền tảng đã chạy với nhiều dạng agent khác nhau:

Voice agent customer-support - vòng lặp sim/eval kiểm soát hallucination real-time.
Computer-use UI agent - 6.2k session, 99.4% safe rate.
Coding agent - 8.4k PR shipped, 0 CVE.
RAG / search pipeline - 42k query, 99.1% grounded.
Autonomous multi-step agent và internal copilot - tracing step-level (reasoning, cost, latency, quality).

Phần simulation là nổi bật nhất với đội chạy production: thay vì test case tĩnh, engine sinh adversarial multi-turn conversation dựa trên behavior thực của agent — săn đúng các kịch bản mà hệ thống "fail confidently". Vài nghìn sim là ra được những failure mode mà QA tay bỏ sót.

Limitations & pricing

License: Apache 2.0 core - tự host, inspect, modify, ship commercial. Mỗi SDK đóng gói độc lập (Apache/MIT).
Managed cloud: có tier start-for-free tại app.futureagi.com; chưa công bố bảng giá paid tier chi tiết.
Infra footprint: full stack cần PostgreSQL + ClickHouse + Redis + RabbitMQ + Temporal - nặng hơn một eval SDK đơn lẻ. Docker compose xử lý local ổn; production self-host cần kỷ luật ops.
Cộng đồng: ~258 star tại thời điểm viết - còn sớm so với các closed-SaaS đã cắm rễ. Số lượng integration vẫn đang mở rộng.
Không phải training framework: đây là eval/ops/optimization layer - bạn vẫn cần cắm model provider riêng qua gateway.

What's next

Repo vừa release được đánh tag "nightly release for early testing" - stable version đang được build. Roadmap công khai có thêm: voice-agent testing mở rộng, thêm thuật toán optimization, OTel integration sâu hơn, và thêm vendor guardrail adapter.

Nếu bạn đang duct-tape hạ tầng quanh AI agent, đây là một trong những candidate gần nhất với "unified system" thực sự. Thử cloud miễn phí trước để xem loop, rồi self-host khi workload đủ nặng.

Via: github.com/future-agi/future-agi, futureagi.com, docs.futureagi.com.