LLM observability
The practice of monitoring, tracing, and debugging LLM-powered systems in production — capturing prompts, completions, latency, cost, and errors across every call.
LLM observability is what makes production agents debuggable. Every LLM call gets traced: the prompt, the response, the tool calls, the token cost, the latency. When something breaks — and at scale, it always does — you replay the trace instead of guessing.
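As a rough illustration of what tracing a single call involves, the sketch below wraps a model call and records the fields listed above. Everything here is an assumption made for the example: the `LLMTrace` schema, the in-memory `TRACES` list, and the `fake_model` stand-in are hypothetical, and a real setup would send records to one of the observability tools rather than a Python list.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class LLMTrace:
    """One record per LLM call (hypothetical schema, not any vendor's format)."""
    trace_id: str
    prompt: str
    response: str = ""
    tool_calls: list = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0
    error: Optional[str] = None


TRACES: list = []  # stand-in for a real trace store


def traced_call(call_model: Callable[[str], dict], prompt: str) -> LLMTrace:
    """Run call_model(prompt) and record the outcome, including failures."""
    trace = LLMTrace(trace_id=uuid.uuid4().hex, prompt=prompt)
    start = time.perf_counter()
    try:
        result = call_model(prompt)
        trace.response = result["text"]
        trace.tool_calls = result.get("tool_calls", [])
        trace.input_tokens = result["input_tokens"]
        trace.output_tokens = result["output_tokens"]
    except Exception as exc:
        # Keep the failure in the trace so it can be replayed later.
        trace.error = repr(exc)
    finally:
        trace.latency_ms = (time.perf_counter() - start) * 1000
        TRACES.append(trace)
    return trace


def fake_model(prompt: str) -> dict:
    """Hypothetical model client used only to make the sketch runnable."""
    return {
        "text": prompt.upper(),
        "input_tokens": 3,
        "output_tokens": 3,
        "tool_calls": [{"name": "search"}],
    }


trace = traced_call(fake_model, "summarize this ticket")
print(trace.trace_id, trace.latency_ms, trace.error)
```

Recording the error inside the trace, instead of letting the exception escape unlogged, is what makes the failed call replayable afterwards.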
The 2026 stack is mature: Helicone, LangSmith, Braintrust, Arize, and OpenLLMetry cover everything from simple completion logging to full distributed agent traces with span hierarchies. The bar is no longer "do we log?" but "can we slice cost by user, prompt version, and tool call type?"
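Slicing cost by several dimensions comes down to a group-by over trace records. The sketch below assumes each record already carries a user, a prompt version, a tool name, and a cost in USD; the field names and the sample rows are invented for illustration and do not reflect any particular tool's export format.

```python
from collections import defaultdict

# Hypothetical trace rows; field names and values are illustrative only.
ROWS = [
    {"user": "u1", "prompt_version": "v1", "tool": "search", "cost_usd": 0.5},
    {"user": "u1", "prompt_version": "v2", "tool": "search", "cost_usd": 0.25},
    {"user": "u2", "prompt_version": "v1", "tool": "sql", "cost_usd": 1.0},
    {"user": "u2", "prompt_version": "v2", "tool": "search", "cost_usd": 0.25},
]


def cost_by(rows, *dims):
    """Sum cost_usd grouped by any combination of dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["cost_usd"]
    return dict(totals)


print(cost_by(ROWS, "user"))
print(cost_by(ROWS, "prompt_version", "tool"))
```

The same function answers "cost per user" and "cost per prompt version and tool" because the grouping key is built from whichever dimensions are passed in.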
For teams shipping agents to customers, observability is non-negotiable. The cost of a single bad customer outcome caused by a bug nobody can debug can easily exceed a year of observability spend.
Frequently asked
How is LLM observability different from regular APM?
Regular APM tracks latency, errors, and traces between services. LLM observability adds prompt and completion capture, token cost per request, a record of which model version served each request, and LLM-specific evals. General-purpose APM tools were not built to capture any of these.
What metrics should I track first?
Start with per-request cost (tokens × price, with input and output tokens usually priced separately), p95 latency, error rate, and a simple "user satisfied?" flag. Add prompt-version cohorting once you have multiple versions in production.
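A minimal sketch of the first three metrics, under stated assumptions: the prices in `PRICE_PER_1K` are made-up placeholders (real prices vary by model and provider), the p95 uses the nearest-rank method (other percentile definitions give slightly different values), and the request records are hypothetical dicts with an `error` field.

```python
# Hypothetical prices in USD per 1,000 tokens; not real provider pricing.
PRICE_PER_1K = {"input": 0.5, "output": 1.5}


def request_cost(input_tokens, output_tokens):
    """Per-request cost: tokens times price, summed over input and output."""
    return (
        input_tokens * PRICE_PER_1K["input"]
        + output_tokens * PRICE_PER_1K["output"]
    ) / 1000


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def error_rate(requests):
    """Fraction of requests whose 'error' field is set."""
    return sum(1 for r in requests if r["error"]) / len(requests)


latencies_ms = list(range(1, 101))  # illustrative latencies: 1..100 ms
requests = [{"error": None}, {"error": "timeout"}, {"error": None}, {"error": None}]

print(request_cost(1000, 2000))
print(p95(latencies_ms))
print(error_rate(requests))
```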