Reasoning model
A class of LLM (o3, Claude Sonnet 4.6, Gemini 2.5 reasoning) that produces a long internal chain of thought before responding, trading latency and cost for accuracy on hard problems.
Reasoning models extend the standard inference pattern: instead of emitting the final answer directly, they first generate a long internal trace ("thinking") that the user typically doesn't see, and then a concise final answer. Both the trace and the answer are still generated token by token.
They are typically much more accurate on math, code, and multi-step reasoning, but also slower and more expensive, because every request generates the hidden trace in addition to the answer. Most 2026 agent stacks use reasoning models for planning and verification steps and faster non-reasoning models for the bulk of tool calls.
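As a rough illustration of why stacks mix the two kinds of model, the sketch below estimates the output cost of one agent run. It is a minimal sketch, not real pricing: `ModelProfile`, `cost_per_call`, `run_cost`, and every number in it are hypothetical placeholders, and it assumes hidden thinking tokens are billed like visible output tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model figures; substitute your provider's real numbers."""
    price_per_1k_output_tokens: float  # placeholder currency units
    thinking_tokens_per_call: int      # hidden trace length (0 for non-reasoning models)
    answer_tokens_per_call: int        # visible answer length


def cost_per_call(profile: ModelProfile) -> float:
    """Output cost of one call, counting the hidden trace as billed output (assumption)."""
    total_tokens = profile.thinking_tokens_per_call + profile.answer_tokens_per_call
    return total_tokens / 1000 * profile.price_per_1k_output_tokens


def run_cost(reasoning_calls: int, fast_calls: int,
             reasoning: ModelProfile, fast: ModelProfile) -> float:
    """Total output cost of a run that mixes reasoning and non-reasoning calls."""
    return (reasoning_calls * cost_per_call(reasoning)
            + fast_calls * cost_per_call(fast))


# Made-up illustrative numbers, not measurements or real prices.
REASONING = ModelProfile(price_per_1k_output_tokens=0.04,
                         thinking_tokens_per_call=4000,
                         answer_tokens_per_call=500)
FAST = ModelProfile(price_per_1k_output_tokens=0.01,
                    thinking_tokens_per_call=0,
                    answer_tokens_per_call=500)

# A 20-step run: every step on the reasoning model vs. 2 reasoning steps + 18 fast steps.
all_reasoning = run_cost(20, 0, REASONING, FAST)
mixed = run_cost(2, 18, REASONING, FAST)
print(f"all reasoning: {all_reasoning:.2f}, mixed: {mixed:.2f}")
```

With these placeholder figures the hidden trace dominates the per-call cost, which is why reserving the reasoning model for a few planning and verification steps changes the total so much. The same structure could be extended with latency per token if response time matters more than spend.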
The cost gap is narrowing as inference optimizations land. By the end of 2026, "reasoning by default" is expected to be the norm across most agent stacks.
Frequently asked
When should I use a reasoning model in my agent?
For the planning step, for evaluation/verification of agent output, and for any tool call where correctness matters more than latency. Use a non-reasoning model for routine tool calls and conversational turns.
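The rule above can be sketched as a small routing function. This is only one possible shape, under stated assumptions: `StepKind`, `pick_model`, the `correctness_critical` flag, and the two model identifiers are hypothetical names introduced for illustration, not part of any real SDK.

```python
from enum import Enum


class StepKind(Enum):
    """Hypothetical categories of agent step."""
    PLANNING = "planning"
    VERIFICATION = "verification"
    TOOL_CALL = "tool_call"
    CONVERSATION = "conversation"


# Placeholder identifiers; substitute the model names your provider uses.
REASONING_MODEL = "reasoning-model"
FAST_MODEL = "fast-model"


def pick_model(kind: StepKind, correctness_critical: bool = False) -> str:
    """Choose a model tier for one agent step.

    Planning and verification always go to the reasoning model. A tool call
    goes there only when the caller flags correctness as more important than
    latency. Everything else uses the faster non-reasoning model.
    """
    if kind in (StepKind.PLANNING, StepKind.VERIFICATION):
        return REASONING_MODEL
    if kind is StepKind.TOOL_CALL and correctness_critical:
        return REASONING_MODEL
    return FAST_MODEL


if __name__ == "__main__":
    print(pick_model(StepKind.PLANNING))
    print(pick_model(StepKind.TOOL_CALL))
    print(pick_model(StepKind.TOOL_CALL, correctness_critical=True))
```

Keeping the decision in one function makes the policy easy to change later, for example if reasoning models become cheap enough to use for every step.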