Context window
The maximum number of tokens a model can consider at once — covers the system prompt, conversation, tool results, and the answer being generated.
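As a rough illustration of that definition, the sketch below treats the window as a single budget shared by every part of the request. The class name, field names, and token counts are hypothetical; real counts would come from the model provider's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Hypothetical token accounting for a single model call."""
    window: int               # hard limit of the model, in tokens
    system_prompt: int        # tokens used by the system prompt
    conversation: int         # tokens used by prior turns
    tool_results: int         # tokens used by tool output
    reserved_for_answer: int  # tokens set aside for the generated answer

    def used(self) -> int:
        # Everything the model sees or produces counts against the window.
        return (self.system_prompt + self.conversation
                + self.tool_results + self.reserved_for_answer)

    def remaining(self) -> int:
        return self.window - self.used()

    def fits(self) -> bool:
        return self.remaining() >= 0


# Illustrative numbers only, not measurements from any real model.
budget = ContextBudget(window=200_000, system_prompt=2_000,
                       conversation=150_000, tool_results=40_000,
                       reserved_for_answer=4_000)
print(budget.used(), budget.remaining(), budget.fits())
```

Reserving space for the answer up front is a design choice in this sketch: it makes the check fail before the call rather than truncating the output partway through.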
In 2026, frontier models support context windows of 200k to 2M tokens. The advertised size and the usable window often diverge: accuracy on retrieval and reasoning tasks degrades well before the hard limit.
Larger context windows changed how agents are built. Where 2023-era agents had to lean heavily on retrieval-augmented generation (RAG) to fit anything, modern agents often load the full repo, the full doc, and the full conversation history into the window and let the model pick out what matters.
The trade-off is cost: tokens are charged linearly, while latency grows super-linearly as the window fills. Stuff sparingly.
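The linear half of that trade-off can be sketched with simple arithmetic. The price below is a made-up placeholder, not a quote from any provider, and the function name is hypothetical. Latency is left out because it depends on the model and serving stack.

```python
def input_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Linear pricing: cost scales in direct proportion to token count."""
    return tokens * usd_per_million_tokens / 1_000_000


PRICE = 3.0  # hypothetical USD per million input tokens, not a real quote

for window in (200_000, 2_000_000):
    cost = input_cost_usd(window, PRICE)
    print(f"{window:>9,} tokens -> ${cost:.2f} per request")
```

Under linear pricing, filling a window ten times larger costs ten times as much on every request, so the bill compounds quickly for agents that resend the same context on each call.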
Frequently asked
Does a larger context window always help?
No. Most models exhibit "lost in the middle": retrieval accuracy on a fact dips when it is buried in the middle of a long context. Use RAG to put the most relevant parts at the top of the window.
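One way to put the most relevant parts at the top is to rank retrieved chunks by score and pack them greedily into a token budget. This is a minimal sketch, not a full RAG pipeline: the function name, the (text, score) input shape, and the whitespace-based token count are all assumptions, and a real system would use the model's own tokenizer.

```python
from typing import Callable, List, Tuple


def pack_context(
    chunks: List[Tuple[str, float]],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Return chunk texts ordered most relevant first, within the budget.

    `chunks` holds (text, relevance_score) pairs from a hypothetical
    retriever. The default `count_tokens` is a crude whitespace
    approximation, used only to keep the example self-contained.
    """
    ranked = sorted(chunks, key=lambda c: c[1], reverse=True)
    picked: List[str] = []
    used = 0
    for text, _score in ranked:
        cost = count_tokens(text)
        if used + cost > budget_tokens:
            continue  # skip chunks that no longer fit, keep trying smaller ones
        picked.append(text)
        used += cost
    return picked


retrieved = [
    ("alpha beta", 0.2),
    ("gamma delta epsilon", 0.9),
    ("zeta", 0.5),
]
print(pack_context(retrieved, budget_tokens=4))
```

Skipping an oversized chunk instead of stopping at it is a design choice: it lets lower-ranked but smaller chunks use the leftover budget.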