- Simon Budziak (CTO, Lubu Labs) shows how to pin LLM cost to the right user, request, and LangGraph node using UsageMetadataCallbackHandler, get_usage_metadata_callback(), and a tiny state-merge trick.
- Stop guessing which node burned 80% of your budget.
TL;DR
LangChain has two built-in primitives for token attribution: UsageMetadataCallbackHandler (session roll-up) and get_usage_metadata_callback() (per-request guard). Pair them with LangGraph's Annotated[dict, operator.or_] state reducer and you get per-node, per-user, per-session token breakdowns — without a third-party observability SaaS. Lubu Labs documents the pattern after a real $1,400 → $4,200 cost incident traced to one misconfigured batch job.
What's new
The LangChain Community Spotlight just highlighted a guide by Simon Budziak, CTO at Lubu Labs, on attributing token cost across multi-model pipelines. The core APIs aren't brand new — they shipped in langchain-core 0.3.49 — but the guide is the first end-to-end recipe that ties callbacks, request metadata, and LangGraph state merging into one production attribution pattern.
Why it matters
Provider invoices aggregate by model only. A single user request hitting GPT-4o-mini, Claude Haiku, and GPT-4o leaves your finance team with three line items across two invoices and no join key. The article opens with a concrete failure mode: “Forecast: $1,400. Invoice: $4,200” — a 3× overage caused by one customer's misconfigured batch job that consumed 60% of the monthly token budget. The team burned three engineer-days correlating traces to find it. Per-node + per-tenant attribution turns that into a day-one alert.
Technical facts
UsageMetadataCallbackHandler — session aggregation
```python
from langchain.chat_models import init_chat_model
from langchain_core.callbacks import UsageMetadataCallbackHandler

callback = UsageMetadataCallbackHandler()

llm_1 = init_chat_model(model="openai:gpt-4o-mini")
llm_2 = init_chat_model(model="anthropic:claude-haiku-4-5-20251001")

llm_1.invoke("Classify this document", config={"callbacks": [callback]})
llm_2.invoke("Extract obligations", config={"callbacks": [callback]})

print(callback.usage_metadata)
# {model_name: {input_tokens, output_tokens, total_tokens,
#               input_token_details, output_token_details}}
```
Use one fresh handler per request. Sharing a handler across concurrent requests corrupts attribution; there is no `.reset()` method.
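A minimal per-request sketch of that rule, folding in the tenant tagging described under Use cases below; `record_usage` is a hypothetical stand-in for your billing sink, not a LangChain API:

```python
from langchain.chat_models import init_chat_model
from langchain_core.callbacks import UsageMetadataCallbackHandler

llm = init_chat_model(model="openai:gpt-4o-mini")

def record_usage(user_id: str, usage: dict) -> None:
    # Stand-in for your billing sink (DB insert, metrics emit, ...).
    print(f"user={user_id} usage={usage}")

def handle_request(user_prompt: str, user_id: str, session_id: str) -> str:
    # One fresh handler per request: nothing shared across concurrent calls.
    callback = UsageMetadataCallbackHandler()
    result = llm.invoke(
        user_prompt,
        config={
            "callbacks": [callback],
            # metadata propagates to LangSmith traces for later forensics
            "metadata": {"user_id": user_id, "session_id": session_id},
        },
    )
    record_usage(user_id, callback.usage_metadata)
    return result.content
```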
get_usage_metadata_callback() — per-request guard
```python
import logging

from langchain_core.callbacks import get_usage_metadata_callback

logger = logging.getLogger(__name__)
BUDGET_TOKENS = 5_000

with get_usage_metadata_callback() as cb:
    result = pipeline.invoke({"input": user_prompt})  # any chain/runnable

total = sum(m["total_tokens"] for m in cb.usage_metadata.values())
if total > BUDGET_TOKENS:
    logger.warning("Budget exceeded", extra={"tokens": total})
```
Two gotchas that silently break your dashboard
- OpenAI streaming: you must set `ChatOpenAI(model=..., stream_usage=True)`. Otherwise streaming calls report zero tokens. Anthropic includes usage by default.
- Reasoning tokens: for o3 and Claude extended thinking, reasoning tokens land in `output_token_details['reasoning']`, not in `output_tokens`. Sum only `output_tokens` and you systematically undercount cost. Both fixes appear in the sketch below.
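A minimal sketch of both fixes, assuming `langchain-openai` is installed; the model name and prompt are illustrative, and the reasoning-token arithmetic follows the article's description:

```python
from langchain_openai import ChatOpenAI

# Gotcha 1: without stream_usage=True, streamed responses report zero tokens.
llm = ChatOpenAI(model="gpt-4o-mini", stream_usage=True)

full = None
for chunk in llm.stream("Summarize this contract"):
    full = chunk if full is None else full + chunk  # accumulate AIMessageChunks

usage = full.usage_metadata  # populated only because stream_usage=True

# Gotcha 2: count reasoning tokens explicitly; per the article they live in
# output_token_details, not in output_tokens.
reasoning = usage.get("output_token_details", {}).get("reasoning", 0)
billable_output = usage["output_tokens"] + reasoning
```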
LangGraph per-node tracking
The trick is a state field with a merge reducer:
```python
from typing import TypedDict, Annotated
import operator

from langchain_core.callbacks import get_usage_metadata_callback

class AgentState(TypedDict):
    messages: list
    node_costs: Annotated[dict, operator.or_]  # dict-union merge across nodes

def research_node(state: AgentState, config) -> dict:
    with get_usage_metadata_callback() as cb:
        result = research_llm.invoke(state["messages"], config=config)
    return {
        "messages": [result],
        "node_costs": {"research": cb.usage_metadata},
    }
```
When the graph finishes, state["node_costs"] holds {research: {...}, writer: {...}, reviewer: {...}}. Budziak's strategic observation: “In a researcher-writer-reviewer graph, the researcher node running against a large-context model is typically responsible for 75–80% of total cost.” That's where prompt caching and RAG tuning earn their keep.
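A minimal wiring sketch that builds on the block above (reusing `AgentState` and `research_node`), assuming `langgraph` is installed; `writer_node`, the prompt, and the model choices are illustrative:

```python
from langgraph.graph import StateGraph, START, END
from langchain.chat_models import init_chat_model

research_llm = init_chat_model(model="openai:gpt-4o")
writer_llm = init_chat_model(model="openai:gpt-4o-mini")

def writer_node(state: AgentState, config) -> dict:
    with get_usage_metadata_callback() as cb:
        result = writer_llm.invoke(state["messages"], config=config)
    return {"messages": [result], "node_costs": {"writer": cb.usage_metadata}}

builder = StateGraph(AgentState)
builder.add_node("research", research_node)
builder.add_node("writer", writer_node)
builder.add_edge(START, "research")
builder.add_edge("research", "writer")
builder.add_edge("writer", END)
graph = builder.compile()

final = graph.invoke(
    {"messages": [("user", "Brief me on GDPR retention")], "node_costs": {}}
)
print(final["node_costs"])  # {"research": {...}, "writer": {...}}
```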
Comparison
| Tool | Scope | Best for |
|---|---|---|
| `UsageMetadataCallbackHandler` | Multi-invoke session | End-of-session billing roll-up |
| `get_usage_metadata_callback()` | Single `with` block | Budget guards, per-node tracking |
| LangSmith cost tracking | Whole project | $ dashboards, cross-trace aggregation |
| Provider invoice | Per model / month | Audit only — no user/workflow join |
vs. third-party tools (Langfuse, MLflow tracing): the LangChain-native callbacks need no extra service, run inside your code path, and write straight to your billing DB. LangSmith sits a layer up — it auto-derives dollar cost from token counts using a built-in pricing table for OpenAI, Anthropic, and Gemini, with manual override hooks for non-linear pricing models like Gemini 3.1 Pro.
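If you stay callback-native and derive dollars yourself, the conversion is a small lookup table; a minimal sketch, with placeholder per-million-token prices you would replace with your providers' current rates:

```python
# Placeholder prices per 1M tokens; substitute real, current rates.
PRICE_PER_M = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "claude-haiku-4-5-20251001": {"input": 1.00, "output": 5.00},
}

def dollars(usage_metadata: dict) -> float:
    total = 0.0
    for model, m in usage_metadata.items():
        prices = PRICE_PER_M.get(model)
        if prices is None:
            continue  # unknown model: alert instead of silently dropping
        total += m["input_tokens"] / 1e6 * prices["input"]
        total += m["output_tokens"] / 1e6 * prices["output"]
    return total
```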
Use cases
- Multi-tenant SaaS billing. Fresh handler per request, tag `config["metadata"] = {"user_id": ..., "session_id": ...}`, persist `callback.usage_metadata` to a usage table (see the per-request sketch under Technical facts). The metadata also surfaces in LangSmith traces for forensics.
- Cost localization in agents. Per-node tracking tells you which step (research vs. writing vs. review) is burning tokens, so optimization effort goes to the 80% node, not the 5% one.
- Budget circuit breakers. Wrap a request in `get_usage_metadata_callback()`, abort or downgrade to a cheaper model when the running total crosses a threshold, as in the sketch after this list.
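A minimal circuit-breaker sketch; the model pairing, step structure, and `BUDGET_TOKENS` threshold are illustrative:

```python
from langchain.chat_models import init_chat_model
from langchain_core.callbacks import get_usage_metadata_callback

primary = init_chat_model(model="openai:gpt-4o")
fallback = init_chat_model(model="openai:gpt-4o-mini")
BUDGET_TOKENS = 5_000

def run_steps(steps: list[str]) -> list:
    results, spent = [], 0
    for step in steps:
        # Downgrade once the running total crosses the budget.
        llm = primary if spent <= BUDGET_TOKENS else fallback
        with get_usage_metadata_callback() as cb:
            results.append(llm.invoke(step))
        spent += sum(m["total_tokens"] for m in cb.usage_metadata.values())
    return results
```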
Limitations & pricing
- Requires `langchain-core >= 0.3.49`.
- No `.reset()` on `UsageMetadataCallbackHandler`; use one fresh instance per session/request.
- OpenAI streaming requires explicit `stream_usage=True`; forgetting it gives silently-wrong dashboards.
- Callbacks don't own attribution: user/tenant tagging is your app layer's job (use `config["metadata"]`).
- LangChain core is free and open source; LangSmith dashboards are a separate paid tier on the LangChain platform.
What's next
Token attribution is becoming a baseline FinOps requirement for any LLM app with more than one model or more than one tenant. Expect deeper LangSmith dashboards (per-tenant, per-node), better reasoning-token pricing surfaces as o3-class models go mainstream, and more libraries adopting the same usage_metadata shape so cross-framework cost reporting stops being a custom integration project.
Source: Lubu Labs — Token Cost Attribution in Multi-Model LangChain Pipelines, LangChain_OSS Community Spotlight, LangSmith cost tracking docs.