CAPE — Capability Coupling Analysis of Phase Emergence
Lying Is Just a Phase
The Hidden Alignment Transition in Language Model Scaling
Enter your model's size + any benchmarks → get alignment phase, scaling recommendations, and predictions. Works for any model from 70M to frontier scale. Automatically extends as new benchmarks activate. · Amin (2026)
Critical scale Nc: 3.5B · Pre-transition r: −0.989 · Base + frontier models: 63 + 31 · Model families: 16
GitHub Repo · arXiv Paper · Amin (2026) · "Lying Is Just a Phase" (Nature) · "It's Not a Phase" (NeurIPS) · 8 benchmarks · n-dim PCA
Analyze Any Model — Phase Classification + Actionable Recommendations
Custom Benchmark Pair — For Nc2/Nc3 Detection
Phase Diagram — TruthfulQA vs Parameters
TAX
N < 3.5B — Alignment Tax
γ₁₂ < 0 · r = −0.989 · d_eff ≈ 1.05
Scaling reasoning actively degrades truthfulness. The anti-coupling is built into pre-training, before any RLHF, and every web-trained family shows it. The loss fit stays exact (CV = 0.8%): the transition is invisible in loss.
Curate data · 1 unit of quality ≈ 10× scale · Phi shows the tax is eliminable
TRANS
~3.5B — Critical Point
γ₁₂ = 0 · χ → ∞ · Arrhenius C spikes 10×
Maximum susceptibility. Gradient dips 37% below trend. Eigenvector rotates sharply. Loss landscape is at its flattest — small interventions have maximum leverage. OLMo sits here with γ₁₂ = 0.000 exactly.
Max alignment ROI · OLMo confirms γ₁₂ = 0
BONUS
N > 3.5B — Alignment Bonus
γ₁₂ > 0 · r = +0.770 cross-family · d_eff → 1
Capabilities cooperate: scale improves both reasoning and truthfulness. The Arrhenius activation energy is C = 196 (vs 28 in the Tax phase). Dimensional collapse begins: d_eff shrinks from 2 → 1 as the capability manifold condenses.
Scale freely · Capability gains are shared
Nc2
~70B–130B — Axis Rotation
HS/TQA saturate · SWE/GPQA activate · d_eff → 2 again
HellaSwag and TruthfulQA compress to a 4.9-point range, while new capability axes (SWE-bench, GPQA Diamond) become discriminating. r(SWE, GPQA) = +0.85 confirms the cooperative phase, but d_eff = 1.75 — the new dimension is still opening. The theory breaks down as det(H) → 0 near 130B.
IFEval is the next key benchmark · Predicted Nc3 ≈ 114B
Frontier Coupling — SWE-bench vs GPQA Diamond (Feb–Mar 2026)
r = +0.85 (n = 20, p < 0.00001): cooperative coupling strongly confirmed. Sonnet 4.6 shows an h = −13.4 anomaly (tax excursion); Opus 4.6 recovers to h = +2.8; GPT-5.4 shows h = −1.6 (mild coding specialist).
Within-Family Trajectory — Anthropic as Phase Diagnostic
| Transition | ΔSWE | ΔGPQA | γ₁₂ | h(D) | Interpretation |
|---|---|---|---|---|---|
| Sonnet 4.5 → Sonnet 4.6 | +2.4 | −9.3 | −3.88 | −13.4 | Tax excursion: coding optimized at reasoning cost |
| Sonnet 4.6 → Opus 4.6 | +1.2 | +17.2 | +14.3 | +2.8 | Recovery: full cooperative phase restored |
Protocol: for any two consecutive releases, compute γ₁₂ = ΔGPQA/ΔSWE. If it is negative, the training recipe has entered a tax excursion. A single eval run suffices to detect it before deployment; a minimal sketch follows.
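A minimal sketch of this protocol in Python; the absolute scores are hypothetical placeholders chosen only to reproduce the ΔSWE/ΔGPQA deltas in the table above.

```python
# Release-over-release tax-excursion check (sketch).
# Absolute scores are illustrative placeholders; only the deltas matter.

def coupling_gamma(d_swe: float, d_gpqa: float) -> float:
    """Running-coupling estimate for one transition: gamma_12 = dGPQA / dSWE."""
    if d_swe == 0:
        raise ValueError("dSWE = 0: gamma_12 undefined for this transition")
    return d_gpqa / d_swe

def classify_transition(prev: dict, curr: dict) -> str:
    """Flag a tax excursion when the two capabilities move in opposite directions."""
    d_swe = curr["swe"] - prev["swe"]
    d_gpqa = curr["gpqa"] - prev["gpqa"]
    gamma = coupling_gamma(d_swe, d_gpqa)
    label = "TAX EXCURSION" if gamma < 0 else "cooperative"
    return f"dSWE={d_swe:+.1f}  dGPQA={d_gpqa:+.1f}  gamma_12={gamma:+.2f}  -> {label}"

# Sonnet 4.5 -> Sonnet 4.6 from the table: dSWE = +2.4, dGPQA = -9.3
print(classify_transition({"swe": 77.2, "gpqa": 68.0},
                          {"swe": 79.6, "gpqa": 58.7}))
# -> dSWE=+2.4  dGPQA=-9.3  gamma_12=-3.88  -> TAX EXCURSION
```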
Within-Family Trajectory — Google Gemini as Independent Test
| Transition | ΔSWE | ΔGPQA | γ₁₂ | h(D) | Interpretation |
|---|---|---|---|---|---|
| 2.5 Pro → 3 Flash | +14.2 | +6.4 | +0.45 | +8.9 → +4.1 | Cooperative: both improve |
| 3 Flash → 3 Pro | −1.8 | +1.5 | −0.83 | +4.1 → +7.0 | Flash→Pro tradeoff: reasoning prioritized over coding |
| 3 Pro → 3.1 Pro | +4.4 | +2.4 | +0.55 | +7.0 → +6.0 | Recovery: both capabilities improve |
Second within-family test: Gemini's h-field stays positive throughout (+4 to +9) — a reasoning-specialist training recipe, the frontier analogue of Phi.
The Flash→Pro excursion (γ₁₂ = −0.83) mirrors Anthropic's excursion-and-recovery pattern: tier-specialist training creates a local tax that recovers at the next release.
Two labs, same physics.
OpenAI Trajectory — Now With Tax Excursion (GPT-5.4)
| Transition | ΔSWE | ΔGPQA | γ₁₂ | h(D) | Interpretation |
|---|---|---|---|---|---|
| GPT-4o → GPT-5 | +41.7 | +32.1 | +0.77 | +2.5 → +1.7 | Strongly cooperative: massive joint gain |
| GPT-5 → GPT-5.1 | +1.4 | +2.4 | +1.71 | +1.7 → +3.0 | Cooperative: reasoning outpaces coding |
| GPT-5.1 → GPT-5.4 | +0.9 | −3.9 | −4.33 | +3.0 → −1.6 | Tax excursion: coding optimized at reasoning cost |
| GPT-5.4 → GPT-5.2 Pro | +2.8 | +9.0 | +3.21 | −1.6 → +5.2 | Recovery: full cooperative phase restored |
Update: GPT-5.4 shows the same tax excursion pattern as Anthropic's Sonnet 4.6 (γ₁₂ = −4.33 vs −3.88). h dips to −1.6 before GPT-5.2 Pro recovers to +5.2.
Three labs, same physics: coding-specialist releases create local tax excursions that recover at the next generation. The universality of this pattern across Anthropic, OpenAI, and Google is now confirmed.
Frontier 3×3 Coupling Matrix — SWE · GPQA · IFEval
det(H_2×2) → 0 and the third eigenvalue becomes significant: pairwise γ₁₂ is no longer sufficient, and a 3×3 coupling matrix is needed. Future work extends to higher dimensions; a sketch of the 3×3 estimate follows.
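A minimal sketch of the 3×3 extension, assuming only that per-model scores on the three benchmarks are available as columns of an array; the data below are random placeholders, not the frontier eval table.

```python
# 3x3 coupling-matrix sketch over SWE-bench, GPQA Diamond, IFEval.
# Rows are models, columns are benchmarks; placeholder data shown.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=(20, 3))            # replace with the frontier eval table

H = np.corrcoef(scores, rowvar=False)        # 3x3 correlation (coupling) matrix
lam = np.linalg.eigvalsh(H)                  # eigenvalues, ascending

print("det(H_2x2, SWE/GPQA) =", round(np.linalg.det(H[:2, :2]), 3))
print("det(H_3x3)           =", round(np.linalg.det(H), 3))
print("lambda_3 (smallest)  =", round(lam[0], 3))
# When det(H_2x2) -> 0 while lambda_3 stays finite, the third axis (IFEval)
# carries independent signal and the pairwise gamma_12 picture is insufficient.
```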
Arrhenius Activation Energy per Phase — New Result
The Arrhenius form log(rate) = A − C/S was fit separately in each coupling phase. The activation constant C is not universal — it spikes 10× at the phase boundary. This is the thermodynamic signature of the saddle point.
| Phase | Scale Range | C_Arrhenius | r² | Interpretation |
|---|---|---|---|---|
| Tax | 70M–1B | 28 | 0.32 | Shallow activation barrier |
| Transition | 1B–2.8B | 316 ★ | 0.88 | 10× spike = saddle point of loss landscape |
| Bonus | 2.8B–12B | 196 | 0.94 | Deeper cooperative well |
Rate law: log(dS/d log₁₀N) = A − C/S
Arrhenius structure survives all three phases. The 10× C_Arr spike at Nc directly explains the 37% gradient dip — measurable from gradient norms without any benchmark data.
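A sketch of the per-phase fit, assuming S is the benchmark score along a family's scaling curve and the rate is dS/d log₁₀N; the data below are synthetic.

```python
# Per-phase Arrhenius fit: log(dS/dlog10 N) = A - C/S  (sketch, synthetic data).
import numpy as np

def fit_arrhenius(S, logN):
    """Fit log(rate) = A - C/S; returns (A, C, r2)."""
    rate = np.gradient(S, logN)        # dS/dlog10(N) along the scaling curve
    y, x = np.log(rate), 1.0 / S       # Arrhenius coordinates
    slope, A = np.polyfit(x, y, 1)     # y = A + slope*x, with slope = -C
    r2 = 1 - ((y - (A + slope * x)) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return A, -slope, r2

logN = np.linspace(7.85, 9.0, 12)      # ~70M to 1B (tax-phase window)
S = 30 + 25 * (logN - 7.8) ** 1.5      # synthetic rising benchmark score
A, C, r2 = fit_arrhenius(S, logN)
print(f"A = {A:.2f}, C = {C:.1f}, r2 = {r2:.2f}")
```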
Benchmark Survival at Each Nc — Eigenvector Analysis
| Scale | Active Phase | Discriminating Benchmarks | New Dimension Trigger |
|---|---|---|---|
| 70M–3.5B | Tax | HellaSwag, TruthfulQA | — |
| ~3.5B | Nc1 | HS⊕TQA coupling flips | MMLU enters below chance at ~3B |
| 3.5B–70B | Bonus | HS, TQA, MMLU all cooperative | — |
| ~70B–130B | Frontier | SWE-bench, GPQA Diamond | IFEval λ₁ loading = 0.64 (dominant) |
| ~114B | Nc3 | IFEval + agentic safety | HarmBench / AgentBench (recommended) |
Phase-Separated Correlation Matrix — How TQA Restructures at Nc
▸ BELOW Nc (TAX PHASE)

|      | HS | TQA | ARC | MMLU | WG |
|---|---|---|---|---|---|
| HS | 1.00 | −0.53 | +0.89 | +0.74 | +0.67 |
| TQA | −0.53 | 1.00 | −0.65 | −0.12 | −0.28 |
| ARC | +0.89 | −0.65 | 1.00 | +0.82 | +0.71 |
| MMLU | +0.74 | −0.12 | +0.82 | 1.00 | +0.52 |
| WG | +0.67 | −0.28 | +0.71 | +0.52 | 1.00 |

4/10 pairs negative • deff = 1.53 • Mean r = +0.07
▸ ABOVE Nc (BONUS PHASE)

|      | HS | TQA | ARC | MMLU | WG |
|---|---|---|---|---|---|
| HS | 1.00 | +0.91 | +0.95 | +0.90 | +0.73 |
| TQA | +0.91 | 1.00 | +0.92 | +0.85 | +0.69 |
| ARC | +0.95 | +0.92 | 1.00 | +0.93 | +0.72 |
| MMLU | +0.90 | +0.85 | +0.93 | 1.00 | +0.62 |
| WG | +0.73 | +0.69 | +0.72 | +0.62 | 1.00 |

0/10 pairs negative • deff = 1.20 • Mean r = +0.89
Key finding: the restructuring is specific to truthfulness. All 4 TQA pairs flip sign across Nc (Frobenius |Δr| = 1.56), while 0/6 non-TQA pairs flip (|Δr| = 0.33). TQA loads anti-aligned with PC1 below Nc (+0.49 vs −0.49 for HS) and aligned above; a flip-count sketch follows.
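The flip counts can be verified directly from the two matrices above; a short sketch, with the matrices transcribed verbatim.

```python
# Count sign flips across Nc, split into TQA pairs vs non-TQA pairs.
import numpy as np
from itertools import combinations

labels = ["HS", "TQA", "ARC", "MMLU", "WG"]
below = np.array([[ 1.00, -0.53,  0.89,  0.74,  0.67],
                  [-0.53,  1.00, -0.65, -0.12, -0.28],
                  [ 0.89, -0.65,  1.00,  0.82,  0.71],
                  [ 0.74, -0.12,  0.82,  1.00,  0.52],
                  [ 0.67, -0.28,  0.71,  0.52,  1.00]])
above = np.array([[1.00, 0.91, 0.95, 0.90, 0.73],
                  [0.91, 1.00, 0.92, 0.85, 0.69],
                  [0.95, 0.92, 1.00, 0.93, 0.72],
                  [0.90, 0.85, 0.93, 1.00, 0.62],
                  [0.73, 0.69, 0.72, 0.62, 1.00]])

flips = {"TQA": 0, "non-TQA": 0}
for i, j in combinations(range(len(labels)), 2):
    group = "TQA" if "TQA" in (labels[i], labels[j]) else "non-TQA"
    flips[group] += np.sign(below[i, j]) != np.sign(above[i, j])
print(flips)   # {'TQA': 4, 'non-TQA': 0}: the restructuring is TQA-specific
```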
Phase-by-Phase Progression — deff Peaks at Transition (Critical Fluctuations)
| Phase | deff | Notes |
|---|---|---|
| Tax | 1.53 | 4 negative pairs • TQA anti-aligned |
| Transition | 1.81 | PEAK • maximum fluctuations at Nc |
| Bonus | 1.20 | 0 negative pairs • all cooperative |
| Frontier | 1.15 | Deep cooperative |
| Nc,3 regime | 1.33 | All positive but rising — new tax opening? |
Physics prediction confirmed: deff peaks at 1.81 in the transition zone — maximum effective dimensionality at the critical point.
This is textbook: maximum fluctuations = maximum uncertainty about which phase the system occupies. The system "doesn't know" if it's in the tax or bonus regime, so all dimensions contribute equally.
Above Nc, deff collapses to ~1.2 as the soft mode freezes out. At Nc,3, deff starts rising again (1.33) — the fingerprint of a new transition opening.
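The paper does not state its d_eff estimator, so the sketch below shows two common spectral choices; on the matrices from the sign-flip sketch above, n/λ₁ lands near the reported tax/bonus values, but both definitions are assumptions here, not the paper's.

```python
# Effective-dimension sketches from a correlation matrix C (two common
# estimators; the paper's exact d_eff definition is not stated).
import numpy as np

def d_eff_participation(C):
    """Participation ratio of the eigenvalue spectrum: (sum lam)^2 / sum lam^2."""
    lam = np.linalg.eigvalsh(C)
    return lam.sum() ** 2 / (lam ** 2).sum()

def d_eff_leading(C):
    """n / lambda_1: how far the leading mode is from explaining everything."""
    lam = np.linalg.eigvalsh(C)
    return C.shape[0] / lam[-1]

# `below` / `above` are the matrices defined in the sign-flip sketch above.
for name, C in [("tax", below), ("bonus", above)]:
    print(name, round(d_eff_participation(C), 2), round(d_eff_leading(C), 2))
# n/lambda_1 gives ~1.5 (tax) and ~1.16 (bonus), close to the reported 1.53 / 1.20.
```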
Leave-One-Family-Out CV — Sign Robustness Across All 10 Benchmark Pairs
▸ BELOW Nc: 4/4 TQA pairs survive CV
HS–TQA: negative in 5/5 folds
ARC–TQA: negative in 5/5 folds
MMLU–TQA: negative in 5/5 folds
WG–TQA: negative in 4/5 folds
All non-TQA pairs: positive in 5/5 folds
▸ ABOVE Nc: 10/10 pairs positive in all folds
Every single benchmark pair — including all TQA pairs — shows positive correlation in every leave-one-family-out fold. Result: 4/4 TQA pairs flip sign, 0/6 non-TQA pairs flip.
The truthfulness tax is specific and robust.
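A sketch of the fold logic; the family names and scores are hypothetical placeholders for per-model benchmark tables grouped by family.

```python
# Leave-one-family-out sign-robustness check (sketch, placeholder data).
import numpy as np

def lofo_sign_stability(scores: dict, i: int, j: int) -> int:
    """Count LOFO folds whose correlation sign for pair (i, j) matches the full fit."""
    def corr(families):
        data = np.vstack([scores[f] for f in families])
        return np.corrcoef(data[:, i], data[:, j])[0, 1]
    full_sign = np.sign(corr(list(scores)))
    folds = [corr([f for f in scores if f != held_out]) for held_out in scores]
    return sum(np.sign(r) == full_sign for r in folds)

rng = np.random.default_rng(1)
families = ["pythia", "olmo", "llama", "qwen", "gpt2"]     # placeholder names
scores = {f: rng.normal(size=(4, 5)) for f in families}    # 4 models x 5 benchmarks
n_ok = lofo_sign_stability(scores, i=0, j=1)               # HS-TQA analogue
print(f"sign stable in {n_ok}/{len(families)} folds")
```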
RG Flow (Preliminary) — Beta Function and Fixed Point
Beta function: β(γ) = −1.35γ² − 0.27γ + 0.73 · R² = 0.58, quadratic fit to the running coupling
Fixed point: γ* = 0.64 · stable; models converge to moderate cooperation
Universality class: 1D random-field XY · ν_eff = 0.72, between mean-field and 3D Ising
Asymptotic cooperation: Unlike QCD's asymptotic freedom (coupling weakens at high energy), AI capability coupling strengthens with scale — then saturates at γ* ≈ 0.64.
Large models converge toward moderate cooperative coupling, not runaway alignment. Full treatment deferred to Future work.
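The quoted fixed point and its stability follow directly from the quadratic; a quick numerical check:

```python
# Roots of beta(gamma) = -1.35*g^2 - 0.27*g + 0.73 and stability from beta'(g*).
import numpy as np

coeffs = [-1.35, -0.27, 0.73]                 # beta(gamma), descending powers
roots = np.roots(coeffs)
beta_prime = np.polyder(np.poly1d(coeffs))    # beta'(gamma) = -2.7*gamma - 0.27
for g in np.sort(roots.real):
    kind = "stable" if beta_prime(g) < 0 else "unstable"
    print(f"gamma* = {g:+.3f} ({kind})")
# -> gamma* = -0.842 (unstable), gamma* = +0.642 (stable):
#    RG flow converges to moderate cooperation, matching the quoted 0.64.
```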
New: activation energy spikes 10× at N_c. Phase boundary = saddle point of loss landscape. Measurable from gradient norms alone.
Polynomial Baseline — CAPE vs Naive Fits on Llama-2 Holdout
| Model | Held-out MAE | Parameters | vs CAPE |
|---|---|---|---|
| CAPE ODE | 5.6% | 4 | — |
| Degree-1 polynomial | 14.6% | 2 | 2.6× worse |
| Degree-2 polynomial | 10.2% | 3 | 1.8× worse |
| Degree-3 polynomial | 10.5% | 4 | 1.9× worse |
| Degree-4 polynomial | 10.4% | 5 | 1.9× worse |
Key result: The CAPE ODE with 4 parameters beats polynomials with up to 5 parameters by ~2×. Polynomials fail catastrophically at Llama-2 7B and 13B (12-16% error) because they can't represent the phase structure — they fit a smooth curve through a regime change.
The ODE succeeds because it encodes the coupling between benchmarks, not just individual trajectories. A polynomial can't know that TQA anticorrelates with HS below Nc.
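A sketch of the baseline protocol with synthetic data: a tanh regime change stands in for the real benchmark curves (the actual comparison holds out the Llama-2 family, and the CAPE ODE itself lives in the repo).

```python
# Polynomial-baseline sketch: fit degree-k polynomials in log10(N) and score
# MAE on held-out points past the regime change. Synthetic data throughout.
import numpy as np

rng = np.random.default_rng(2)
logN_train = np.linspace(7.8, 10.5, 40)
score_train = 40 + 18 * np.tanh((logN_train - 9.54) / 0.5) + rng.normal(0, 1, 40)
logN_hold = np.array([9.85, 10.11])                       # ~7B and ~13B
score_hold = 40 + 18 * np.tanh((logN_hold - 9.54) / 0.5)  # noise-free ground truth

for k in range(1, 5):
    fit = np.polynomial.Polynomial.fit(logN_train, score_train, k)
    mae = np.abs(fit(logN_hold) - score_hold).mean()
    print(f"degree-{k}: holdout MAE = {mae:.1f} points")
```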
Topology — Winding Number W = 0.5 (Fractional) + Kink Soliton
▸ HALF-INTEGER WINDING
Winding number: W = 0.5 · half-integer → Z₂ topology
Geometric phase: −32.6° = −0.181π (not quantized)
The eigenvector e₂ crosses zero once at ~1.2B. One zero crossing = half-winding = Z₂ (Ising) topology, not U(1). The transition is binary: flip or don't flip. Supports domain walls between flipped/unflipped families, not continuous vortices.
In condensed matter: half-quantum vortices in p-wave SC (Sr₂RuO₄), half-vortices in spinor BEC. The CAPE analogue: each training generation crossing Nc undergoes a half-rotation of the coupling eigenvector.
▸ KINK SOLITON (INSTANTON)
Kink profile: γ₁₂(N) = 3.75·tanh((log₁₀N − 9.59)/1.00) − 1.54 · RMSE = 0.116 · width = 1.0 decade · kink center Nc = 3.89B
The minimum-action path through the double-well potential. Deviations from this profile = suboptimal training = wasted compute.
Anti-kink penalty: Sonnet 4.6 (γ = −3.88 at 70B) represents tunneling BACK through the barrier. The action cost gives a suppression factor e^7.5 ≈ 1800 — exponentially expensive.
PDW analogy (speculative): Within-family h-field oscillations (coop→tax→coop) resemble pair density wave modulation. Three labs now show this pattern. Deferred to Future work.
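A sketch of the kink fit, with synthetic points generated from the quoted profile plus noise; with real (N, γ₁₂) measurements the same call recovers the center and width.

```python
# Kink-profile fit: gamma_12(N) = a*tanh((log10 N - mu)/w) + b  (sketch).
import numpy as np
from scipy.optimize import curve_fit

def kink(logN, a, mu, w, b):
    return a * np.tanh((logN - mu) / w) + b

logN = np.linspace(8.0, 11.5, 25)
gamma = kink(logN, 3.75, 9.59, 1.00, -1.54) \
        + np.random.default_rng(3).normal(0, 0.1, 25)   # synthetic measurements

popt, _ = curve_fit(kink, logN, gamma, p0=[3, 9.5, 1, 0])
a, mu, w, b = popt
rmse = np.sqrt(np.mean((kink(logN, *popt) - gamma) ** 2))
print(f"center Nc = {10**mu/1e9:.2f}B, width = {w:.2f} decades, RMSE = {rmse:.3f}")
```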
Physics ↔ ML Dictionary
| Physics Concept | ML/CAPE Meaning | Where Measured |
|---|---|---|
| Ginzburg-Landau order parameter | γ₁₂(N): coupling sign and magnitude | §2: running coupling |
| Phase transition at T_c | Coupling sign flip at N_c ≈ 3.5B | §2: bootstrap CI |
| TRSB (time-reversal breaking) | Eigenvector locks at θ* = 38.8° (SFEE) | §7: Riccati ODE |
| Soft mode (collapse of λ₂) | Second eigenvalue λ₂ ~ N^(−0.72) | §7: PCA cascade |
| External magnetic field h | Training-data quality offset h(D) | §5: Phi models |
| Meissner screening | Alignment interventions more durable above N_c | Future work (predicted) |
| Flux pinning | Curated data locks the cooperative eigenvector | §5: h_c design eq |
| Ginzburg number Gi | 1.35 > 1 → crossover, not sharp transition | §11: limitations |
| Susceptibility divergence | χ_γ = 1/|γ₁₂| → ∞ at N_c | §7: overconstrained |
| Heavy-fermion SFEE | Self-reinforcing feedback: r = +0.629, p = 0.003 | §7: coupling runs |
| det(H) → 0 | Theory breakdown: a new dimension must activate | §7: 130B prediction |
| Topological protection | Winding number in 3D capability space (predicted) | Future work |
Boosting Chain L₀ → L₄
| Level | Model | Result | Pass |
|---|---|---|---|
| L₀ | Power-law loss L = E + A·N^(−α) | 0.3% MAE — baseline, exact | ✓ |
| L₁ | Independent-parameter gradient | 44% MAE — 142× worse than L₀; the diagnostic that parameters are coupled | ✗ |
| L₂ | Collective gradient ‖∇L‖ ∝ L^3.5 | ~8% MAE — collective gradient captured | ✓ |
| L₃ | Running coupling γ₁₂(N) | ~6% MAE — alignment regime detected | ✓ |
| L₄ | External field h(D): Phi holdout | 5.6% holdout error — data quality as control parameter | ✓ |
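A sketch of the L₀ baseline fit; the losses are synthetic, with E and A chosen as placeholders and α set near the paper's fitted 0.238.

```python
# L0 baseline: fit L(N) = E + A*N^(-alpha)  (sketch, synthetic losses).
import numpy as np
from scipy.optimize import curve_fit

def power_law(N, E, A, alpha):
    return E + A * N ** (-alpha)

N = np.logspace(7.8, 10.5, 30)                 # ~70M to ~30B parameters
L = power_law(N, 1.69, 410.0, 0.238) \
    * (1 + np.random.default_rng(4).normal(0, 0.003, 30))  # ~0.3% noise

popt, _ = curve_fit(power_law, N, L, p0=[2.0, 300.0, 0.25])
print("E, A, alpha =", np.round(popt, 3))
mae = np.abs(power_law(N, *popt) / L - 1).mean() * 100
print(f"in-sample MAE = {mae:.2f}%")
```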
Paper Summary — Key Results
Scaling laws track loss. They say nothing about how capabilities interact. Below N_c ≈ 3.5B, reasoning and truthfulness anticorrelate (r = −0.989, p < 10⁻⁵): scaling one actively degrades the other — an alignment tax built into pre-training, before any RLHF. Above N_c, the coupling reverses sign. Two models with identical loss can be in opposite alignment regimes.
Core finding · Alignment Tax: present in pre-training, before RLHF. Structural, not a tuning artifact; vanishes at N_c from scaling alone.
Practical lever · Curate Data: 1 unit of quality ≈ 10× model size at 1B params. Phi demonstrates this at production scale.
Framework · CAPE + GL EFT: Ginzburg-Landau free energy, the same math as heavy-fermion superconductors. Not analogy — same EFT.
Validity · Self-limiting: predicts its own breakdown at ~130B; the higher-dimensional extension is deferred to Future work.
12 Diagnostics → 2 Numbers
All twelve quantities are independent measurements of a single coupling structure parameterized by A=0.629, B=−5.886 in γ₁₂(N) = A·log₁₀N + B. Twelve constraints on two free parameters.
| Diagnostic | Detail |
|---|---|
| α = 0.238 | Loss scaling exponent (R² = 0.9994) |
| γ₁₂ linear fit | 12/12 signs correct |
| β = 0.40 ± 0.08 | Collective gradient scaling |
| ODE error 3.6% | 5 benchmarks predicted from 70M |
| χ_ND = 0.102 | Chinchilla emerges from the coupling |
| h(D) field | Phi: h = +23 above the web baseline |
| W (conserved) | Capability gain redistributed, CV = 27% |
| θ* = +0.37 | Riccati eigenvector fixed point |
| λ₂ ~ N^(−0.72) | Soft-mode collapse (R² = 0.95) |
| Gradient dip −37% | At 1B, within the Nc region |
| Curvature peak | TQA peak at 1.4B |
| r(γ,θ) = +0.47 | Geometric-phase correlation (p = 0.044) |
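A sketch of the resulting two-parameter phase classifier. Note that this linear form crosses zero near 10^9.36 ≈ 2.3B, slightly below the headline N_c ≈ 3.5B (which comes from the paper's bootstrap), so treat the line as an approximation; the ±0.05 transition band is an arbitrary choice for illustration.

```python
# Two-parameter phase classifier from gamma_12(N) = A*log10(N) + B  (sketch).
import math

A, B = 0.629, -5.886   # the two fitted numbers that carry all twelve diagnostics

def classify(n_params: float, eps: float = 0.05) -> str:
    gamma = A * math.log10(n_params) + B
    if abs(gamma) < eps:
        return f"TRANSITION (gamma_12 = {gamma:+.3f}): maximum alignment ROI"
    if gamma > 0:
        return f"BONUS (gamma_12 = {gamma:+.3f}): scale freely"
    return f"TAX (gamma_12 = {gamma:+.3f}): curate data"

for n in (70e6, 2e9, 70e9):
    print(f"{n/1e9:g}B -> {classify(n)}")
# 0.07B -> TAX, 2B -> TRANSITION, 70B -> BONUS
```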
Citation
@article{amin2026cape,
author = {Amin, Adil},
title = {Lying Is Just a Phase},
note = {The Hidden Alignment Transition in Language Model Scaling},
journal = {Nature},
year = {2026},
url = {https://github.com/adilamin89/cape-scaling}
}
@inproceedings{amin2026itsnotaphase,
author = {Amin, Adil},
title = {It's Not a Phase: Predicting Frontier Alignment from Capability Coupling},
booktitle = {NeurIPS},
year = {2026},
url = {https://github.com/adilamin89/cape-scaling}
}