TL;DR

  • When you deploy an LLM to real users, looking at the system's overall input and output is not enough - you need to see every step inside.
  • A trace is the container that records a request's entire journey from start to finish, tied together by a single unique Trace ID.
  • A span is an individual operation inside the trace - embedding, retrieval, context assembly, generation - each with its own timing, latency, and metadata.
  • Without span-level tracing, you only know that a response was wrong, not whether the cause was poor retrieval, an overlong context, or an LLM hallucination.
  • 94% of teams with AI agents in production use observability. Teams that do not are often still debugging last week's incident.

Figure: Layers of Observability in AI Systems - a diagram showing the trace and its spans in a RAG pipeline

The problem: the AI pipeline is a black box

Your dashboards are all green. No alerts, no errors. Yet the AI has been giving users wrong answers for the past two weeks.

Here is the fundamental difference: a broken button returns HTTP 500, and traditional APM catches it immediately. A hallucinating LLM returns HTTP 200 OK, and traditional monitoring counts it as a success.

The problem is that you are only looking at the input and output of the whole pipeline. Inside is a chain of steps - query processing, embedding, vector search, context assembly, LLM call - and each one can fail in its own way. If one step runs slowly or produces poor output, every later step is affected, yet you have no idea where to start debugging.

This is why traces and spans exist.

What traces and spans are

Think of a RAG pipeline as a chain of steps. The user asks a question, it passes through several components, and finally a response comes back. Every step takes time, every step can fail, and every step has its own cost.

A trace is the container that covers that entire journey. When a user sends a query, a unique Trace ID is created. Every operation that happens while processing that query carries the same Trace ID, even if the operations run in different services or at different times. If the system handles 1,000 queries, you have 1,000 traces.

A span is an individual operation inside that trace. Unlike a log entry, which records a single point in time, a span records an interval: start time, end time, duration, status (success or error), and attributes, the metadata describing in detail what happened. Spans can nest into a hierarchy: a root span covers the whole request, and child spans handle each sub-operation.
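To make the two definitions concrete, here is a minimal sketch of a span as a data structure. The `Span` class and its field names are hypothetical and written for illustration only; real tracing libraries define their own span types.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed operation inside a trace (hypothetical minimal model)."""
    trace_id: str                      # shared by every span of one request
    name: str
    parent_id: Optional[str] = None    # None marks the root span
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.perf_counter)
    end: Optional[float] = None
    status: str = "unset"
    attributes: dict = field(default_factory=dict)

    def finish(self, status: str = "ok") -> None:
        """Close the interval and record the outcome."""
        self.end = time.perf_counter()
        self.status = status

    @property
    def duration(self) -> float:
        if self.end is None:
            raise ValueError("span has not finished")
        return self.end - self.start


# One query produces one trace ID; every span for that query carries it.
trace_id = uuid.uuid4().hex
root = Span(trace_id, "rag_request")
child = Span(trace_id, "retrieval", parent_id=root.span_id)
child.attributes["top_k"] = 5
child.finish()
root.finish()
```

The child points at its parent through `parent_id`, which is how a flat list of spans can be rebuilt into the tree described above.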

Anatomy of each span in RAG

Each span in a typical RAG pipeline catches a different class of problem:

  • Query span: The user submits a question. This is where the trace starts. You record the raw input, a timestamp, and session info. If something goes wrong here, such as truncated input or a lost session, you know from the very first span.
  • Embedding span: The query goes to the embedding model and becomes a vector. This span tracks token count and latency. If the embedding API is slow or being rate limited, you catch it here, before it affects retrieval results.
  • Retrieval span: The vector goes to the database for similarity search. This is where most RAG bugs hide: bad chunks, low relevance scores, a wrong top-k value, an overloaded vector DB. A 2025 academic study confirms that retrieval failure is one of the two main sources of hallucination in RAG. This span exposes all of it.
  • Context span: The retrieved chunks are combined with the system prompt. This span shows exactly what will be fed to the LLM. If the context is too long (over max tokens), you see it here, before the LLM receives a prompt with an important part cut off.
  • Generation span: The LLM produces the response. This is usually the longest and most expensive span. Input tokens, output tokens, latency, and reasoning tokens (if any) are all logged so you can track cost and debug hallucinations.
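The five spans above can be sketched with a small context manager that times each step and collects its attributes. This is an illustration under stated assumptions, not a real tracing library: the `span` helper, the `answer` function, and the stand-in embedding, retrieval, and generation logic are all hypothetical.

```python
import time
import uuid
from contextlib import contextmanager

spans = []  # finished spans, kept in memory for this sketch


@contextmanager
def span(trace_id, name, **attributes):
    """Record duration, status, and attributes for one pipeline step."""
    record = {"trace_id": trace_id, "name": name, "status": "ok",
              "attributes": dict(attributes)}
    start = time.perf_counter()
    try:
        yield record["attributes"]
    except Exception as exc:
        record["status"] = "error"
        record["attributes"]["error"] = repr(exc)
        raise
    finally:
        record["duration_s"] = time.perf_counter() - start
        spans.append(record)


def answer(question, max_context_chars=200):
    trace_id = uuid.uuid4().hex  # one trace per user query
    with span(trace_id, "query", raw_input=question):
        pass  # input validation would go here
    with span(trace_id, "embedding") as attrs:
        vector = [float(len(w)) for w in question.split()]  # stand-in for a model
        attrs["token_count"] = len(vector)
    with span(trace_id, "retrieval", top_k=2) as attrs:
        chunks = [("Spans record a time interval.", 0.91),
                  ("Traces group spans by ID.", 0.84)]  # stand-in for vector search
        attrs["scores"] = [score for _, score in chunks]
    with span(trace_id, "context") as attrs:
        context = "\n".join(text for text, _ in chunks)
        attrs["context_chars"] = len(context)
        attrs["truncated"] = len(context) > max_context_chars
    with span(trace_id, "generation") as attrs:
        response = f"Based on {len(chunks)} chunks: {chunks[0][0]}"  # stand-in for an LLM
        attrs["input_tokens"] = len(context.split())
        attrs["output_tokens"] = len(response.split())
    return trace_id, response
```

Calling `answer("What is a span?")` appends five records to `spans`, all sharing one trace ID, so a wrong response can be traced back to the step whose attributes look off.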

Three reasons to do this now

1. Debug at light speed. Before observability, a wrong response meant rerunning the pipeline by hand, tweaking the prompt, and hoping the problem reproduced. With a full trace, you open that trace, read the failing span, and fix it without guessing. The retrieval span shows irrelevant documents? The problem is in the chunking strategy. The generation span shows a reasonable context but the LLM still answered wrong? The problem is in the model. Two cases, two completely different fixes.
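The decision described in this first reason can be sketched as a small triage function. The span dictionary layout, the `triage` name, and the 0.5 relevance threshold are assumptions made for illustration; tune them to whatever your tracing backend actually records.

```python
def triage(trace, min_relevance=0.5):
    """Suggest where to look first in a trace whose answer was flagged as wrong.

    `trace` is a list of span dicts with "name", "status", and "attributes".
    """
    for s in trace:
        if s["status"] == "error":
            return f"{s['name']}: span failed, inspect its error first"
    by_name = {s["name"]: s["attributes"] for s in trace}
    scores = by_name["retrieval"]["scores"]
    if not scores or max(scores) < min_relevance:
        return "retrieval: weak matches, review chunking strategy and top_k"
    if by_name["context"].get("truncated", False):
        return "context: prompt was truncated, check the token budget"
    return "generation: retrieval and context look reasonable, examine model and prompt"


bad_retrieval = [
    {"name": "retrieval", "status": "ok", "attributes": {"scores": [0.21, 0.18]}},
    {"name": "context", "status": "ok", "attributes": {"truncated": False}},
    {"name": "generation", "status": "ok", "attributes": {}},
]
good_retrieval = [
    {"name": "retrieval", "status": "ok", "attributes": {"scores": [0.92, 0.88]}},
    {"name": "context", "status": "ok", "attributes": {"truncated": False}},
    {"name": "generation", "status": "ok", "attributes": {}},
]

print(triage(bad_retrieval))
print(triage(good_retrieval))
```

The first trace points at retrieval and the second at generation, matching the two cases in the paragraph above.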

2. Know where the money goes. LLMs bill per API call based on tokens. If you only look at the cloud invoice at the end of the month, you cannot tell which feature, which user, or which prompt version is burning the budget. Span-level tracking shows exactly which feature's generation spans cost the most, so you can optimize before costs climb.
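As a rough sketch of that attribution, the function below sums token cost per feature from generation spans. The prices, the `feature` attribute, and the span layout are hypothetical; substitute your provider's real rates and whatever tags your spans carry.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; replace with your provider's rates.
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}


def cost_by_feature(generation_spans):
    """Sum token cost per feature, most expensive feature first."""
    totals = defaultdict(float)
    for s in generation_spans:
        a = s["attributes"]
        cost = (a["input_tokens"] / 1000 * PRICE_PER_1K["input"]
                + a["output_tokens"] / 1000 * PRICE_PER_1K["output"])
        totals[a["feature"]] += cost
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


generation_spans = [
    {"attributes": {"feature": "search", "input_tokens": 2000, "output_tokens": 1000}},
    {"attributes": {"feature": "summarize", "input_tokens": 8000, "output_tokens": 2000}},
    {"attributes": {"feature": "search", "input_tokens": 2000, "output_tokens": 0}},
]
report = cost_by_feature(generation_spans)
```

Grouping by a different attribute, such as a user ID or prompt version, only requires changing the key used for `totals`.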

3. Catch drift before users complain. AI systems degrade over time. Something that worked well last month may not work well today: the embedding model is updated, the data distribution shifts, the knowledge base goes stale. Span-level metrics let you set thresholds on hallucination rate, retrieval relevance score, and latency, and get alerted when they worsen instead of finding out through support tickets.
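One simple way to express such thresholds is to compare a recent window of span metrics against a baseline window. The sketch below is an assumption-laden illustration: the `drift_alerts` function, the metric names, and the tolerance values are hypothetical, and it assumes every baseline mean is non-zero.

```python
from statistics import mean


def drift_alerts(baseline, recent, tolerances):
    """Return metrics whose recent mean worsened past the allowed relative change.

    `tolerances` maps metric name -> (direction, max relative change), where
    direction is "higher_is_worse" or "lower_is_worse".
    Assumes each baseline mean is non-zero.
    """
    alerts = []
    for metric, (direction, max_change) in tolerances.items():
        base, now = mean(baseline[metric]), mean(recent[metric])
        change = (now - base) / base
        worse = change if direction == "higher_is_worse" else -change
        if worse > max_change:
            alerts.append((metric, round(change, 3)))
    return alerts


baseline = {"retrieval_relevance": [0.80, 0.82, 0.78], "latency_s": [1.0, 1.2, 1.1]}
recent = {"retrieval_relevance": [0.60, 0.62, 0.58], "latency_s": [1.1, 1.2, 1.15]}
tolerances = {
    "retrieval_relevance": ("lower_is_worse", 0.10),
    "latency_s": ("higher_is_worse", 0.20),
}
alerts = drift_alerts(baseline, recent, tolerances)
```

Here relevance dropped by about a quarter and triggers an alert, while latency rose only slightly and stays within tolerance.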

Tools to get started

The technical standard is OpenTelemetry (OTel): free, open-source, and vendor-neutral. You write instrumentation code once, and the data can flow to any backend. The gen_ai namespace is becoming the standard for LLM attributes (gen_ai.usage.prompt_tokens, which newer versions of the semantic conventions rename to gen_ai.usage.input_tokens, gen_ai.request.model...).

On top of OTel sit specialized platforms:

  • LangWatch, Langfuse, Agenta - open-source, with a free tier, suited to small and mid-sized teams
  • Arize Phoenix - strong on drift detection and RAG monitoring
  • Traceloop / OpenLLMetry - auto-instruments LangChain and LlamaIndex with minimal code
  • DeepEval - if you prioritize CI/CD testing over production monitoring

Rule of thumb: start with the main LLM call (prompt, response, latency, tokens), add a retrieval span right after, and measure a baseline before an incident happens.

What's next: observability will become mandatory

Gartner forecasts that LLM observability will reach 50% of all GenAI deployments by 2028, up from 15% today. OpenTelemetry is formally expanding into AI agent observability with dedicated span kinds (agent, workflow, tool). Alongside this, voice observability is emerging as its own field for catching audio-layer errors that text traces cannot see.

One thing does not change: without visibility into each step of the pipeline, you are flying blind in production. Span-level tracing is a basic requirement for running a reliable AI system.

Sources: @_avichawla, Traceloop, Agenta, Cekura.