🏗️ Architecture
Also: scaling law, chinchilla scaling, neural scaling laws

Scaling laws

Empirical power-law relationships between model size, training data, and compute that predict loss and capability — the basis for every major frontier-model training plan since 2020.

Scaling-law studies (Kaplan et al. 2020; Hoffmann et al. 2022, the "Chinchilla" paper) showed that loss decreases predictably, following a power law, as you increase parameters, data, and compute together. The Chinchilla version gave the modern recipe: for a given compute budget, the optimal ratio is roughly 20 training tokens per parameter.
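As a rough illustration of the 20-tokens-per-parameter recipe, the sketch below splits a compute budget into a parameter count and a token count. It assumes the common approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D), which is not stated above. The function name and constants are hypothetical, chosen only for this example.

```python
import math

# Assumption: training compute C ≈ 6 * N * D FLOPs
# (N = parameters, D = training tokens). A widely used approximation,
# not an exact cost model.
FLOPS_PER_PARAM_TOKEN = 6.0

# Rule of thumb from the text: roughly 20 training tokens per parameter.
TOKENS_PER_PARAM = 20.0


def compute_optimal_allocation(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Return (n_params, n_tokens) for a FLOP budget under the assumptions above.

    Solving C = 6 * N * D with D = r * N gives N = sqrt(C / (6 * r)).
    """
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):
        n, d = compute_optimal_allocation(budget)
        print(f"{budget:.0e} FLOPs: {n / 1e9:.1f}B params, {d / 1e9:.0f}B tokens")
```

Under these assumptions, a budget of about 5.88e23 FLOPs works out to roughly 70B parameters and 1.4T tokens, which is the configuration Chinchilla itself used. Because parameters and tokens both grow with the square root of compute, a 100x larger budget implies a model only about 10x larger, trained on about 10x more data.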

These laws are why labs can commit $100M+ to a single training run with reasonable confidence in the outcome: a curve fitted on small runs is extrapolated to the full budget. They also explain why 2026 frontier models are both larger and trained on more data, not just larger, and why the open-source community has been able to match 2022-era frontier models with much smaller ones.

For agent builders, scaling laws are mostly background context. The practical takeaway: pick your model tier by measured capability on your tasks, not by parameter count. Modern 70B models often beat 175B-class models from a few years earlier.

Frequently asked

Are scaling laws breaking down?

Pretraining-only scaling has shown signs of slowing, but the field has shifted toward scaling test-time compute (longer reasoning), which has its own scaling curves and is producing the biggest 2025–2026 capability gains.
