🏗️ Architecture
Also: temperature parameter, LLM temperature, sampling temperature

Temperature: definition and how it works in 2026

Temperature: The LLM sampling parameter that controls randomness. Low values stay near the most likely next token; high values explore more of the distribution. The typical range is 0 to 2.

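Temperature divides the model's logits by a value T before the softmax that produces next-token probabilities. At temperature 0, decoding is greedy: the model always picks the highest-probability token, so output is deterministic in principle (hosted APIs can still show small run-to-run differences). At temperature 1, it samples from the model's unscaled distribution. Above 1, the distribution flattens and the model picks low-probability tokens more often. Below 1, it sharpens toward the most likely tokens.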
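A minimal sketch of the scaling step, using only the Python standard library. The function name `softmax_with_temperature` and the three logit values are made up for illustration; real inference stacks do this on tensors, and temperature 0 is handled here as the greedy limit rather than a literal division by zero.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return next-token probabilities after dividing logits by temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy limit: all probability mass goes to the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's share shrinking as the temperature rises, which is the flattening described above.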

In practice, agent code paths run at 0–0.3 (deterministic tool use, classification, structured outputs). Creative writing runs at 0.7–1.2. Brainstorming or "give me 5 variations" runs at 1.0–1.5. Above 1.5 the output gets erratic; few production workflows benefit.

Most agent frameworks default to temperature 0 or 0.2 for reliability. Set it higher only when you specifically want variety in outputs and have a verification step downstream.
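One way to pair a higher temperature with a downstream check is to sample, validate, and retry. The sketch below assumes a caller-supplied `generate` function that takes a temperature and returns text; `first_valid`, `is_json_object`, and the canned outputs are hypothetical names for illustration, not part of any framework.

```python
import json
from typing import Callable, Optional

def first_valid(generate: Callable[[float], str],
                validate: Callable[[str], bool],
                temperature: float,
                max_attempts: int = 3) -> Optional[str]:
    """Return the first generated candidate that passes validation, else None."""
    for _ in range(max_attempts):
        candidate = generate(temperature)
        if validate(candidate):
            return candidate
    return None

def is_json_object(text: str) -> bool:
    """Example verification step: accept only text that parses to a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False

# Canned outputs stand in for a real model call (assumption for this demo).
canned = iter(["not json", '{"title": "Draft A"}'])
result = first_valid(lambda t: next(canned), is_json_object, temperature=1.0)
print(result)
```

The validator is deliberately separate from the generator, so the same check can run no matter what temperature produced the candidate.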

Frequently asked

What temperature should I use for production agents?

For tool use, classification, structured outputs: 0–0.2. For creative content generation: 0.7–1.0. For brainstorming or variation generation: 1.0–1.5. Above 1.5 is rarely useful and often degrades quality.
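These ranges could be kept in one place as a small lookup table. The sketch below copies the numbers from the answer above; the task labels, the `pick_temperature` helper, and the choice to default to the low end of each range are illustrative assumptions.

```python
from typing import Optional

# Ranges taken from the guidance above; the keys are hypothetical task labels.
TEMPERATURE_RANGES = {
    "tool_use": (0.0, 0.2),
    "classification": (0.0, 0.2),
    "structured_output": (0.0, 0.2),
    "creative": (0.7, 1.0),
    "brainstorming": (1.0, 1.5),
}

def pick_temperature(task: str, requested: Optional[float] = None) -> float:
    """Clamp a requested temperature into the task's range; default to the low end."""
    low, high = TEMPERATURE_RANGES[task]
    if requested is None:
        return low
    return min(max(requested, low), high)

print(pick_temperature("tool_use"))
print(pick_temperature("creative", requested=1.8))
```

Clamping instead of raising an error is a design choice: it keeps a misconfigured caller inside the recommended range rather than failing the request.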

Does temperature 0 mean no hallucinations?

No. Temperature 0 makes the model's output deterministic, but hallucinations stem from what the model learned in training, not from how it samples. Low temperature reduces some kinds of factual drift but doesn't eliminate hallucination.

How does temperature relate to top-p?

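They're complementary sampling controls. Temperature rescales the whole distribution; top-p (nucleus sampling) keeps only the smallest set of top tokens whose cumulative probability reaches p and discards the long tail. Most teams pick one or the other rather than tuning both, and temperature is the simpler default.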
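For contrast with temperature scaling, here is a minimal sketch of top-p truncation on an already-computed probability list. The function name `top_p_filter` and the example probabilities are made up for illustration; production samplers work on tensors and may break ties differently.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break  # everything after this point is the discarded tail
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token probabilities
print(top_p_filter(probs, 0.9))
```

With `top_p=0.9`, the last token is dropped and the remaining three are renormalized, whereas temperature alone would leave every token with some nonzero probability.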

