Model router
A component that selects which LLM to use for each request — based on cost, latency, capability, or content classification. Sits inside an AI gateway or as a standalone routing layer.
Model routing is the practical answer to "frontier models are expensive; small models are cheap; how do I use both intelligently." A router classifies the incoming request (easy vs hard, code vs writing, sensitive vs not) and sends it to the appropriate model.
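The classify-then-dispatch step can be sketched as follows. This is a minimal illustration, assuming a keyword heuristic stands in for a real classifier; the model names (`small-model`, `code-model`, `frontier-model`), the hint list, and the length threshold are hypothetical placeholders rather than any vendor's API.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your provider exposes.
ROUTES = {
    "code": "code-model",
    "hard": "frontier-model",
    "easy": "small-model",
}

# Assumed substrings that suggest a programming question.
CODE_HINTS = ("def ", "class ", "traceback", "stack trace", "compile", "```")


@dataclass
class Decision:
    label: str
    model: str


def classify(prompt: str) -> str:
    """Crude heuristic classifier: code vs hard vs easy."""
    text = prompt.lower()
    if any(hint in text for hint in CODE_HINTS):
        return "code"
    # Assumption: long or multi-question prompts are treated as hard.
    if len(text.split()) > 60 or text.count("?") > 1:
        return "hard"
    return "easy"


def route(prompt: str) -> Decision:
    """Classify the prompt, then look up the model for that label."""
    label = classify(prompt)
    return Decision(label=label, model=ROUTES[label])
```

In practice the `classify` function is the part worth replacing with a trained model; the dispatch table around it can stay this simple.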
Common routing strategies include classifying by query difficulty (cheaper models for easy queries), routing by task type (for example, Claude for code and GPT-5 for general queries), and cascading fallbacks (try a small model first, escalate on uncertainty). Well-tuned production stacks report saving 50–90% on token costs.
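A cascading fallback might look like the sketch below. It assumes each model call returns an answer plus a confidence score between 0 and 1; how that score is obtained (log-probabilities, a self-check prompt, a verifier) is left open, and `small_stub`, `large_stub`, and the 0.7 threshold are hypothetical stand-ins for illustration.

```python
from typing import Callable, Tuple

# A model call returns (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


def cascade(prompt: str, small: ModelFn, large: ModelFn,
            threshold: float = 0.7) -> Tuple[str, str]:
    """Try the small model first; escalate when its confidence is low.

    Returns (answer, tier_used).
    """
    answer, confidence = small(prompt)
    if confidence >= threshold:
        return answer, "small"
    # Low confidence: discard the small answer and pay for the larger model.
    answer, _ = large(prompt)
    return answer, "large"


# Stub models for illustration only; real ones would call an API.
def small_stub(prompt: str) -> Tuple[str, float]:
    confident = len(prompt.split()) <= 8
    return "small answer", (0.9 if confident else 0.3)


def large_stub(prompt: str) -> Tuple[str, float]:
    return "large answer", 0.95
```

Note that an escalated request pays for both calls, so a cascade only saves money when most traffic stops at the first tier.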
In 2026 the easiest path is to use an LLM gateway with built-in routing (Portkey, LiteLLM). For custom routing logic, train a small classifier on your traffic. Sophisticated stacks use semantic routing (embed the query, route by similarity to historical examples).
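Semantic routing can be illustrated with a small self-contained sketch. The `embed` function here is a toy bag-of-words counter standing in for a real embedding model, and the `HISTORY` examples, model names, and `min_similarity` threshold are all hypothetical; the point is only the shape of the technique: embed the query, find the most similar historical example, and reuse its route.

```python
import math
import re
from collections import Counter

# Hypothetical labelled history: (past query, model that handled it well).
HISTORY = [
    ("fix this python traceback", "code-model"),
    ("write a sql query joining two tables", "code-model"),
    ("summarize this meeting in two sentences", "small-model"),
    ("translate this sentence to french", "small-model"),
    ("design a migration plan for a sharded database", "frontier-model"),
]


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_route(query: str, default: str = "frontier-model",
                   min_similarity: float = 0.2) -> str:
    """Route to the model of the most similar historical query."""
    q = embed(query)
    best_score, best_model = max(
        (cosine(q, embed(past)), model) for past, model in HISTORY
    )
    # Assumption: unfamiliar queries fall back to the strongest model.
    return best_model if best_score >= min_similarity else default
```

A production version would precompute and index the historical embeddings rather than re-embedding them on every request.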
Frequently asked
How much does model routing save?
Production teams routinely report 50–90% cost reduction. The exact savings depend on traffic mix — if 80% of queries are easy, routing saves more than if 80% are hard.
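The dependence on traffic mix is simple arithmetic, sketched below. The prices are hypothetical (a cheap model at 1 unit and a frontier model at 10 units per million tokens), and the sketch assumes perfect routing and equal token volume per query, so real savings will differ.

```python
def blended_cost(easy_fraction: float, cheap_price: float,
                 frontier_price: float) -> float:
    """Average price per million tokens when easy traffic goes to the cheap model."""
    return easy_fraction * cheap_price + (1 - easy_fraction) * frontier_price


def savings(easy_fraction: float, cheap_price: float,
            frontier_price: float) -> float:
    """Fractional saving versus sending everything to the frontier model."""
    routed = blended_cost(easy_fraction, cheap_price, frontier_price)
    return 1 - routed / frontier_price


# Hypothetical prices: cheap = 1, frontier = 10 (same unit).
mostly_easy = savings(0.8, cheap_price=1.0, frontier_price=10.0)
mostly_hard = savings(0.2, cheap_price=1.0, frontier_price=10.0)
```

Under these assumed prices, a mix that is 80% easy saves about 72%, while a mix that is 80% hard saves about 18%.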
Does model routing hurt quality?
Done badly, yes: the router misclassifies a hard query and sends it to a cheap model that answers it poorly. Done well, no: modern routing classifiers can reach 95% accuracy or higher. Measure quality against a baseline before and after deploying routing.