- Simon Budziak (CTO, Lubu Labs) shows how to pin LLM cost to the right user, request, and LangGraph node using UsageMetadataCallbackHandler, get_usage_metadata_callback(), and a tiny state-merge trick.
- Stop guessing which node burned 80% of your budget.
TL;DR
LangChain has two built-in primitives for token attribution: UsageMetadataCallbackHandler (session roll-up) and get_usage_metadata_callback() (per-request guard). Pair them with LangGraph's Annotated[dict, operator.or_] state reducer and you get per-node, per-user, per-session token breakdowns — without a third-party observability SaaS. Lubu Labs documents the pattern after a real $1,400 → $4,200 cost incident traced to one misconfigured batch job.
What's new
The LangChain Community Spotlight just highlighted a guide by Simon Budziak, CTO at Lubu Labs, on attributing token cost across multi-model pipelines. The core APIs aren't brand new — they shipped in langchain-core 0.3.49 — but the guide is the first end-to-end recipe that ties callbacks, request metadata, and LangGraph state merging into one production attribution pattern.
Why it matters
Provider invoices aggregate by model only. A single user request hitting GPT-4o-mini, Claude Haiku, and GPT-4o leaves your finance team with three line items across two invoices and no join key. The article opens with a concrete failure mode: “Forecast: $1,400. Invoice: $4,200” — a 3× overage caused by one customer's misconfigured batch job that consumed 60% of the monthly token budget. The team burned three engineer-days correlating traces to find it. Per-node + per-tenant attribution turns that into a day-one alert.
Technical facts
UsageMetadataCallbackHandler — session aggregation
```python
from langchain.chat_models import init_chat_model
from langchain_core.callbacks import UsageMetadataCallbackHandler

callback = UsageMetadataCallbackHandler()

llm_1 = init_chat_model(model="openai:gpt-4o-mini")
llm_2 = init_chat_model(model="anthropic:claude-haiku-4-5-20251001")

llm_1.invoke("Classify this document", config={"callbacks": [callback]})
llm_2.invoke("Extract obligations", config={"callbacks": [callback]})

print(callback.usage_metadata)
# {model_name: {input_tokens, output_tokens, total_tokens,
#               input_token_details, output_token_details}}
```
Use one fresh handler per request. Sharing a handler across concurrent requests corrupts attribution; there is no `.reset()` method.
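A minimal per-request sketch of that rule, folding in the tenant tagging described under Use cases below; `record_usage` is a hypothetical stand-in for your billing sink, not a LangChain API:

```python
from langchain.chat_models import init_chat_model
from langchain_core.callbacks import UsageMetadataCallbackHandler

llm = init_chat_model(model="openai:gpt-4o-mini")

def record_usage(user_id: str, usage: dict) -> None:
    # Stand-in for your billing sink (DB insert, metrics emit, ...).
    print(f"user={user_id} usage={usage}")

def handle_request(user_prompt: str, user_id: str, session_id: str) -> str:
    # One fresh handler per request: nothing shared across concurrent calls.
    callback = UsageMetadataCallbackHandler()
    result = llm.invoke(
        user_prompt,
        config={
            "callbacks": [callback],
            # metadata propagates to LangSmith traces for later forensics
            "metadata": {"user_id": user_id, "session_id": session_id},
        },
    )
    record_usage(user_id, callback.usage_metadata)
    return result.content
```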
get_usage_metadata_callback() — per-request guard
```python
import logging

from langchain_core.callbacks import get_usage_metadata_callback

logger = logging.getLogger(__name__)
BUDGET_TOKENS = 5_000

with get_usage_metadata_callback() as cb:
    result = pipeline.invoke({"input": user_prompt})  # any chain/runnable

total = sum(m["total_tokens"] for m in cb.usage_metadata.values())
if total > BUDGET_TOKENS:
    logger.warning("Budget exceeded", extra={"tokens": total})
```
Two gotchas that silently break your dashboard
- OpenAI streaming: you must set `ChatOpenAI(model=..., stream_usage=True)`. Otherwise streaming calls report zero tokens. Anthropic includes usage by default.
- Reasoning tokens: for o3 and Claude extended thinking, reasoning tokens land in `output_token_details['reasoning']`, not in `output_tokens`. Sum only `output_tokens` and you systematically undercount cost. Both fixes appear in the sketch below.
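A minimal sketch of both fixes, assuming `langchain-openai` is installed; the model name and prompt are illustrative, and the reasoning-token arithmetic follows the article's description:

```python
from langchain_openai import ChatOpenAI

# Gotcha 1: without stream_usage=True, streamed responses report zero tokens.
llm = ChatOpenAI(model="gpt-4o-mini", stream_usage=True)

full = None
for chunk in llm.stream("Summarize this contract"):
    full = chunk if full is None else full + chunk  # accumulate AIMessageChunks

usage = full.usage_metadata  # populated only because stream_usage=True

# Gotcha 2: count reasoning tokens explicitly; per the article they live in
# output_token_details, not in output_tokens.
reasoning = usage.get("output_token_details", {}).get("reasoning", 0)
billable_output = usage["output_tokens"] + reasoning
```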
LangGraph per-node tracking
The trick is a state field with a merge reducer:
```python
from typing import TypedDict, Annotated
import operator

from langchain_core.callbacks import get_usage_metadata_callback

class AgentState(TypedDict):
    messages: list
    node_costs: Annotated[dict, operator.or_]  # dict-union merge across nodes

def research_node(state: AgentState, config) -> dict:
    with get_usage_metadata_callback() as cb:
        result = research_llm.invoke(state["messages"], config=config)
    return {
        "messages": [result],
        "node_costs": {"research": cb.usage_metadata},
    }
```
When the graph finishes, state["node_costs"] holds {research: {...}, writer: {...}, reviewer: {...}}. Budziak's strategic observation: “In a researcher-writer-reviewer graph, the researcher node running against a large-context model is typically responsible for 75–80% of total cost.” That's where prompt caching and RAG tuning earn their keep.
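A minimal wiring sketch that builds on the block above (reusing `AgentState` and `research_node`), assuming `langgraph` is installed; `writer_node`, the prompt, and the model choices are illustrative:

```python
from langgraph.graph import StateGraph, START, END
from langchain.chat_models import init_chat_model

research_llm = init_chat_model(model="openai:gpt-4o")
writer_llm = init_chat_model(model="openai:gpt-4o-mini")

def writer_node(state: AgentState, config) -> dict:
    with get_usage_metadata_callback() as cb:
        result = writer_llm.invoke(state["messages"], config=config)
    return {"messages": [result], "node_costs": {"writer": cb.usage_metadata}}

builder = StateGraph(AgentState)
builder.add_node("research", research_node)
builder.add_node("writer", writer_node)
builder.add_edge(START, "research")
builder.add_edge("research", "writer")
builder.add_edge("writer", END)
graph = builder.compile()

final = graph.invoke(
    {"messages": [("user", "Brief me on GDPR retention")], "node_costs": {}}
)
print(final["node_costs"])  # {"research": {...}, "writer": {...}}
```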
Comparison
| Tool | Scope | Best for |
|---|---|---|
| `UsageMetadataCallbackHandler` | Multi-invoke session | End-of-session billing roll-up |
| `get_usage_metadata_callback()` | Single `with` block | Budget guards, per-node tracking |
| LangSmith cost tracking | Whole project | $ dashboards, cross-trace aggregation |
| Provider invoice | Per model / month | Audit only — no user/workflow join |
vs. third-party tools (Langfuse, MLflow tracing): the LangChain-native callbacks need no extra service, run inside your code path, and write straight to your billing DB. LangSmith sits a layer up — it auto-derives dollar cost from token counts using a built-in pricing table for OpenAI, Anthropic, and Gemini, with manual override hooks for non-linear pricing models like Gemini 3.1 Pro.
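If you stay callback-native and derive dollars yourself, the conversion is a small lookup table; a minimal sketch, with placeholder per-million-token prices you would replace with your providers' current rates:

```python
# Placeholder prices per 1M tokens; substitute real, current rates.
PRICE_PER_M = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "claude-haiku-4-5-20251001": {"input": 1.00, "output": 5.00},
}

def dollars(usage_metadata: dict) -> float:
    total = 0.0
    for model, m in usage_metadata.items():
        prices = PRICE_PER_M.get(model)
        if prices is None:
            continue  # unknown model: alert instead of silently dropping
        total += m["input_tokens"] / 1e6 * prices["input"]
        total += m["output_tokens"] / 1e6 * prices["output"]
    return total
```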
Use cases
- Multi-tenant SaaS billing. Fresh handler per request, tag `config["metadata"] = {"user_id": ..., "session_id": ...}`, persist `callback.usage_metadata` to a usage table (see the per-request sketch under Technical facts). The metadata also surfaces in LangSmith traces for forensics.
- Cost localization in agents. Per-node tracking tells you which step (research vs. writing vs. review) is burning tokens, so optimization effort goes to the 80% node, not the 5% one.
- Budget circuit breakers. Wrap a request in `get_usage_metadata_callback()`, abort or downgrade to a cheaper model when the running total crosses a threshold, as in the sketch after this list.
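A minimal circuit-breaker sketch; the model pairing, step structure, and `BUDGET_TOKENS` threshold are illustrative:

```python
from langchain.chat_models import init_chat_model
from langchain_core.callbacks import get_usage_metadata_callback

primary = init_chat_model(model="openai:gpt-4o")
fallback = init_chat_model(model="openai:gpt-4o-mini")
BUDGET_TOKENS = 5_000

def run_steps(steps: list[str]) -> list:
    results, spent = [], 0
    for step in steps:
        # Downgrade once the running total crosses the budget.
        llm = primary if spent <= BUDGET_TOKENS else fallback
        with get_usage_metadata_callback() as cb:
            results.append(llm.invoke(step))
        spent += sum(m["total_tokens"] for m in cb.usage_metadata.values())
    return results
```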
Limitations & pricing
- Requires `langchain-core >= 0.3.49`.
- No `.reset()` on `UsageMetadataCallbackHandler`; use one fresh instance per session/request.
- OpenAI streaming requires explicit `stream_usage=True`; forgetting it gives silently-wrong dashboards.
- Callbacks don't own attribution: user/tenant tagging is your app layer's job (use `config["metadata"]`).
- LangChain core is free and open source; LangSmith dashboards are a separate paid tier on the LangChain platform.
What's next
Token attribution is becoming a baseline FinOps requirement for any LLM app with more than one model or more than one tenant. Expect deeper LangSmith dashboards (per-tenant, per-node), better reasoning-token pricing surfaces as o3-class models go mainstream, and more libraries adopting the same usage_metadata shape so cross-framework cost reporting stops being a custom integration project.
Source: Lubu Labs — Token Cost Attribution in Multi-Model LangChain Pipelines, LangChain_OSS Community Spotlight, LangSmith cost tracking docs.