Research Background
Prior to 2020, progress in AI was widely seen as unpredictable: tweak an architecture, run the experiment, and hope for a better result.
In "Scaling Laws for Neural Language Models" (2020), Jared Kaplan and the OpenAI team demonstrated a power-law relationship: Loss (error rate) decreases linearly on a log-log plot as compute increases. This gave companies the confidence to invest 100M+ in single training runs (GPT-4), knowing exactly how smart the model would be before training finished.
Core Technical Explanation
The core finding is that performance, measured as loss (L), depends on three factors, each following a power law (sketched after the list):
1. Parameters (N): The size of the model's "brain," i.e., the number of learned weights.
2. Dataset Size (D): The amount of text read during training, measured in tokens.
3. Compute (C): The total processing used for training, measured in FLOPs.
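In the Kaplan et al. formulation, each factor obeys its own power law when the other two are not the bottleneck; taking logarithms of both sides turns each into a straight line, which is why the relationship looks linear on a log-log plot. A sketch of the functional form, with the paper's approximate fitted exponents (N_c, D_c, C_c are empirical constants from the fit):

```latex
% Kaplan et al. (2020): loss as a power law in each factor,
% valid when the other two factors are not the bottleneck.
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}, \quad \alpha_C \approx 0.050
```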
The Chinchilla Correction
In 2022, DeepMind's "Chinchilla" paper (Hoffmann et al.) revised OpenAI's original laws, showing that most existing models (including GPT-3) were significantly undertrained for their size.
- Old Belief: Make the model huge (175B parameters) and train it on a moderate amount of data.
- New Reality: For every doubling of model size, the training data should double as well; at the compute-optimal point this works out to roughly 20 training tokens per parameter. Compute-optimal training requires far more data than previously thought (see the sketch below).
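As a back-of-envelope sketch of this allocation rule, the snippet below combines two widely cited approximations from the Chinchilla paper: training compute C ≈ 6·N·D FLOPs, and roughly 20 training tokens per parameter at the compute-optimal point. The function is my own illustration, not code from either paper:

```python
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training compute budget into a compute-optimal
    model size N (parameters) and dataset size D (tokens).

    Uses the approximation C ~= 6 * N * D together with the
    Chinchilla rule of thumb D ~= 20 * N, so:
        C = 6 * N * (20 * N)  =>  N = sqrt(C / 120)
    """
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own budget, ~6 * 70e9 * 1.4e12 = 5.88e23 FLOPs,
# recovers roughly 70B parameters and 1.4T tokens.
n, d = chinchilla_optimal(5.88e23)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.1f}T")
```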
What the Data Shows
The "bitter lesson" of AI is that scale beats cleverness.
| Model | Parameters | Training Tokens | MMLU Score (5-shot) |
|---|---|---|---|
| GPT-3 (2020) | 175B | 300B | 43.9% |
| Gopher (2021) | 280B | 300B | 60.0% |
| Chinchilla (2022) | 70B | 1.4T | 67.5% |
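The tokens-per-parameter ratio makes the table's pattern explicit. A quick check (my own illustration, using the table's figures) against the roughly 20:1 compute-optimal ratio:

```python
# (parameters, training tokens) from the table above
models = {
    "GPT-3 (2020)":      (175e9, 300e9),
    "Gopher (2021)":     (280e9, 300e9),
    "Chinchilla (2022)": (70e9, 1.4e12),
}
for name, (params, tokens) in models.items():
    # Chinchilla's compute-optimal rule of thumb is ~20 tokens/parameter.
    print(f"{name:>17}: {tokens / params:5.1f} tokens per parameter")
# GPT-3 and Gopher land near 1-2 tokens per parameter: heavily
# undertrained by the Chinchilla standard, despite their size.
```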
Limitations & Open Problems
1. Data Wall: High-quality human-written text is a finite resource, while the scaling laws implicitly assume the data supply keeps pace with compute. Training on synthetic (AI-generated) data can degrade models across generations ("model collapse").
2. Diminishing Returns: Because loss follows a power law, halving it from today's levels might require on the order of $100 billion in compute, which may be economically unviable (a back-of-envelope calculation follows).
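To see why the returns diminish so sharply, take the compute power law at face value. The arithmetic below is purely illustrative, using Kaplan et al.'s fitted compute exponent of roughly 0.05:

```latex
% If L \propto C^{-\alpha_C}, then halving the loss requires
% multiplying compute by 2^{1/\alpha_C}:
\frac{C_2}{C_1} = \left(\frac{L_1}{L_2}\right)^{1/\alpha_C}
                = 2^{1/0.05} = 2^{20} \approx 10^6
```

Under that fit, each halving of the loss costs about a million times more compute, which is why frontier training budgets escalate so quickly.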
Why This Matters
Scaling laws dictate the geopolitics of AI. Because performance scales with compute, only entities with massive data centers (US, China, Big Tech) can compete at the frontier. It turns AI from a software problem into a heavy industrial infrastructure problem.
---
Verified by Global AI News Editorial Board. Sources: Kaplan et al. (2020) "Scaling Laws for Neural Language Models"; Hoffmann et al. (2022) "Training Compute-Optimal Large Language Models"