CAPE — Capability Coupling Analysis of Phase Emergence
Lying Is Just a Phase
The Hidden Alignment Transition in Language Model Scaling
Enter your model's size + any benchmarks → get alignment phase, scaling recommendations, and predictions. Works for any model from 70M to frontier scale. Automatically extends as new benchmarks activate. · Amin (2026)
Critical scale Nc: 3.5B · Pre-transition r: −0.989 · Base + frontier models: 63 + 31 · Model families: 16
GitHub Repo · arXiv Paper · Amin (2026) · "Lying Is Just a Phase" (Nature) · "It's Not a Phase" (NeurIPS) · 8 benchmarks · n-dim PCA
Analyze Any Model — Phase Classification + Actionable Recommendations
Custom Benchmark Pair — For Nc2/Nc3 Detection
Phase Diagram — TruthfulQA vs Parameters
TAX
N < 3.5B — Alignment Tax
γ₁₂ < 0 · r = −0.989 · d_eff ≈ 1.05
Scaling reasoning actively degrades truthfulness. The anti-coupling is built into pre-training, before any RLHF, and every web-trained family shows it. The loss fit stays exact (CV = 0.8%): the transition is invisible in loss.
Curate data · 1 unit of quality ≈ 10× scale · Phi shows the tax is eliminable
TRANS
~3.5B — Critical Point
γ₁₂ = 0 · χ → ∞ · Arrhenius C spikes 10×
Maximum susceptibility. Gradient dips 37% below trend. Eigenvector rotates sharply. Loss landscape is at its flattest — small interventions have maximum leverage. OLMo sits here with γ₁₂ = 0.000 exactly.
Max alignment ROI · OLMo confirms γ₁₂ = 0
BONUS
N > 3.5B — Alignment Bonus
γ₁₂ > 0 · r = +0.770 cross-family · d_eff → 1
Capabilities cooperate: scale improves both reasoning and truthfulness. The Arrhenius activation energy is C = 196 (vs 28 in the Tax phase). Dimensional collapse begins: d_eff shrinks from 2 → 1 as the capability manifold condenses.
Scale freely · Capability gains are shared
Nc2
~70B–130B — Axis Rotation
HS/TQA saturate · SWE/GPQA activate · d_eff → 2 again
HellaSwag and TruthfulQA compress to a 4.9-point range, while new capability axes (SWE-bench, GPQA Diamond) become discriminating. r(SWE, GPQA) = +0.85 confirms the cooperative phase, but d_eff = 1.75 — the new dimension is still opening. The theory breaks down as det(H) → 0 near 130B.
IFEval is the next key benchmark · Predicted Nc3 ≈ 114B
Frontier Coupling — SWE-bench vs GPQA Diamond (Feb–Mar 2026)
r = +0.85 (n = 20, p < 0.00001): cooperative coupling strongly confirmed. Sonnet 4.6 shows an h = −13.4 anomaly (tax excursion); Opus 4.6 recovers to h = +2.8; GPT-5.4 shows h = −1.6 (mild coding specialist).
Within-Family Trajectory — Anthropic as Phase Diagnostic
| Transition | ΔSWE | ΔGPQA | γ₁₂ | h(D) | Interpretation |
|---|---|---|---|---|---|
| Sonnet 4.5 → Sonnet 4.6 | +2.4 | −9.3 | −3.88 | −13.4 | Tax excursion: coding optimized at reasoning cost |
| Sonnet 4.6 → Opus 4.6 | +1.2 | +17.2 | +14.3 | +2.8 | Recovery: full cooperative phase restored |
Protocol: for any two consecutive releases, compute γ₁₂ = ΔGPQA/ΔSWE. If it is negative, the training recipe has entered a tax excursion. A single eval run suffices to detect it before deployment; a minimal sketch follows.
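A minimal sketch of this protocol in Python; the absolute scores are hypothetical placeholders chosen only to reproduce the ΔSWE/ΔGPQA deltas in the table above.

```python
# Release-over-release tax-excursion check (sketch).
# Absolute scores are illustrative placeholders; only the deltas matter.

def coupling_gamma(d_swe: float, d_gpqa: float) -> float:
    """Running-coupling estimate for one transition: gamma_12 = dGPQA / dSWE."""
    if d_swe == 0:
        raise ValueError("dSWE = 0: gamma_12 undefined for this transition")
    return d_gpqa / d_swe

def classify_transition(prev: dict, curr: dict) -> str:
    """Flag a tax excursion when the two capabilities move in opposite directions."""
    d_swe = curr["swe"] - prev["swe"]
    d_gpqa = curr["gpqa"] - prev["gpqa"]
    gamma = coupling_gamma(d_swe, d_gpqa)
    label = "TAX EXCURSION" if gamma < 0 else "cooperative"
    return f"dSWE={d_swe:+.1f}  dGPQA={d_gpqa:+.1f}  gamma_12={gamma:+.2f}  -> {label}"

# Sonnet 4.5 -> Sonnet 4.6 from the table: dSWE = +2.4, dGPQA = -9.3
print(classify_transition({"swe": 77.2, "gpqa": 68.0},
                          {"swe": 79.6, "gpqa": 58.7}))
# -> dSWE=+2.4  dGPQA=-9.3  gamma_12=-3.88  -> TAX EXCURSION
```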
Within-Family Trajectory — Google Gemini as Independent Test
| Transition | ΔSWE | ΔGPQA | γ₁₂ | h(D) | Interpretation |
|---|---|---|---|---|---|
| 2.5 Pro → 3 Flash | +14.2 | +6.4 | +0.45 | +8.9 → +4.1 | Cooperative: both improve |
| 3 Flash → 3 Pro | −1.8 | +1.5 | −0.83 | +4.1 → +7.0 | Flash→Pro tradeoff: reasoning prioritized over coding |
| 3 Pro → 3.1 Pro | +4.4 | +2.4 | +0.55 | +7.0 → +6.0 | Recovery: both capabilities improve |
Second within-family test: Gemini's h-field stays positive throughout (+4 to +9) — a reasoning-specialist training recipe, the frontier analogue of Phi.
The Flash→Pro excursion (γ₁₂ = −0.83) mirrors Anthropic's excursion-and-recovery pattern: tier-specialist training creates a local tax that recovers at the next release.
Two labs, same physics.
OpenAI Trajectory — Now With Tax Excursion (GPT-5.4)
| Transition | ΔSWE | ΔGPQA | γ₁₂ | h(D) | Interpretation |
|---|---|---|---|---|---|
| GPT-4o → GPT-5 | +41.7 | +32.1 | +0.77 | +2.5 → +1.7 | Strongly cooperative: massive joint gain |
| GPT-5 → GPT-5.1 | +1.4 | +2.4 | +1.71 | +1.7 → +3.0 | Cooperative: reasoning outpaces coding |
| GPT-5.1 → GPT-5.4 | +0.9 | −3.9 | −4.33 | +3.0 → −1.6 | Tax excursion: coding optimized at reasoning cost |
| GPT-5.4 → GPT-5.2 Pro | +2.8 | +9.0 | +3.21 | −1.6 → +5.2 | Recovery: full cooperative phase restored |
Update: GPT-5.4 shows the same tax excursion pattern as Anthropic's Sonnet 4.6 (γ₁₂ = −4.33 vs −3.88). h dips to −1.6 before GPT-5.2 Pro recovers to +5.2.
Three labs, same physics: coding-specialist releases create local tax excursions that recover at the next generation. The universality of this pattern across Anthropic, OpenAI, and Google is now confirmed.
Frontier 3×3 Coupling Matrix — SWE · GPQA · IFEval
det(H_2×2) → 0 and the third eigenvalue becomes significant: pairwise γ₁₂ is no longer sufficient, and a 3×3 coupling matrix is needed. Future work extends to higher dimensions; a sketch of the 3×3 estimate follows.
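A minimal sketch of the 3×3 extension, assuming only that per-model scores on the three benchmarks are available as columns of an array; the data below are random placeholders, not the frontier eval table.

```python
# 3x3 coupling-matrix sketch over SWE-bench, GPQA Diamond, IFEval.
# Rows are models, columns are benchmarks; placeholder data shown.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=(20, 3))            # replace with the frontier eval table

H = np.corrcoef(scores, rowvar=False)        # 3x3 correlation (coupling) matrix
lam = np.linalg.eigvalsh(H)                  # eigenvalues, ascending

print("det(H_2x2, SWE/GPQA) =", round(np.linalg.det(H[:2, :2]), 3))
print("det(H_3x3)           =", round(np.linalg.det(H), 3))
print("lambda_3 (smallest)  =", round(lam[0], 3))
# When det(H_2x2) -> 0 while lambda_3 stays finite, the third axis (IFEval)
# carries independent signal and the pairwise gamma_12 picture is insufficient.
```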
Arrhenius Activation Energy per Phase — New Result
The Arrhenius form log(rate) = A − C/S was fit separately in each coupling phase. The activation constant C is not universal — it spikes 10× at the phase boundary. This is the thermodynamic signature of the saddle point.
| Phase | Scale Range | C_Arrhenius | r² | Interpretation |
|---|---|---|---|---|
| Tax | 70M–1B | 28 | 0.32 | Shallow activation barrier |
| Transition | 1B–2.8B | 316 ★ | 0.88 | 10× spike = saddle point of loss landscape |
| Bonus | 2.8B–12B | 196 | 0.94 | Deeper cooperative well |
Rate law: log(dS/d log₁₀N) = A − C/S
Arrhenius structure survives all three phases. The 10× C_Arr spike at Nc directly explains the 37% gradient dip — measurable from gradient norms without any benchmark data.
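A sketch of the per-phase fit, assuming S is the benchmark score along a family's scaling curve and the rate is dS/d log₁₀N; the data below are synthetic.

```python
# Per-phase Arrhenius fit: log(dS/dlog10 N) = A - C/S  (sketch, synthetic data).
import numpy as np

def fit_arrhenius(S, logN):
    """Fit log(rate) = A - C/S; returns (A, C, r2)."""
    rate = np.gradient(S, logN)        # dS/dlog10(N) along the scaling curve
    y, x = np.log(rate), 1.0 / S       # Arrhenius coordinates
    slope, A = np.polyfit(x, y, 1)     # y = A + slope*x, with slope = -C
    r2 = 1 - ((y - (A + slope * x)) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return A, -slope, r2

logN = np.linspace(7.85, 9.0, 12)      # ~70M to 1B (tax-phase window)
S = 30 + 25 * (logN - 7.8) ** 1.5      # synthetic rising benchmark score
A, C, r2 = fit_arrhenius(S, logN)
print(f"A = {A:.2f}, C = {C:.1f}, r2 = {r2:.2f}")
```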
Benchmark Survival at Each Nc — Eigenvector Analysis
| Scale | Active Phase | Discriminating Benchmarks | New Dimension Trigger |
|---|---|---|---|
| 70M–3.5B | Tax | HellaSwag, TruthfulQA | — |
| ~3.5B | Nc1 | HS⊕TQA coupling flips | MMLU enters below chance at ~3B |
| 3.5B–70B | Bonus | HS, TQA, MMLU all cooperative | — |
| ~70B–130B | Frontier | SWE-bench, GPQA Diamond | IFEval λ₁ loading = 0.64 (dominant) |
| ~114B | Nc3 | IFEval + agentic safety | HarmBench / AgentBench (recommended) |
Phase-Separated Correlation Matrix — How TQA Restructures at Nc
▸ BELOW Nc (TAX PHASE)

|      | HS | TQA | ARC | MMLU | WG |
|---|---|---|---|---|---|
| HS | 1.00 | −0.53 | +0.89 | +0.74 | +0.67 |
| TQA | −0.53 | 1.00 | −0.65 | −0.12 | −0.28 |
| ARC | +0.89 | −0.65 | 1.00 | +0.82 | +0.71 |
| MMLU | +0.74 | −0.12 | +0.82 | 1.00 | +0.52 |
| WG | +0.67 | −0.28 | +0.71 | +0.52 | 1.00 |

4/10 pairs negative • deff = 1.53 • Mean r = +0.07
▸ ABOVE Nc (BONUS PHASE)

|      | HS | TQA | ARC | MMLU | WG |
|---|---|---|---|---|---|
| HS | 1.00 | +0.91 | +0.95 | +0.90 | +0.73 |
| TQA | +0.91 | 1.00 | +0.92 | +0.85 | +0.69 |
| ARC | +0.95 | +0.92 | 1.00 | +0.93 | +0.72 |
| MMLU | +0.90 | +0.85 | +0.93 | 1.00 | +0.62 |
| WG | +0.73 | +0.69 | +0.72 | +0.62 | 1.00 |

0/10 pairs negative • deff = 1.20 • Mean r = +0.89
Key finding: the restructuring is specific to truthfulness. All 4 TQA pairs flip sign across Nc (Frobenius |Δr| = 1.56), while 0/6 non-TQA pairs flip (|Δr| = 0.33). TQA loads anti-aligned with PC1 below Nc (+0.49 vs −0.49 for HS) and aligned above; a flip-count sketch follows.
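The flip counts can be verified directly from the two matrices above; a short sketch, with the matrices transcribed verbatim.

```python
# Count sign flips across Nc, split into TQA pairs vs non-TQA pairs.
import numpy as np
from itertools import combinations

labels = ["HS", "TQA", "ARC", "MMLU", "WG"]
below = np.array([[ 1.00, -0.53,  0.89,  0.74,  0.67],
                  [-0.53,  1.00, -0.65, -0.12, -0.28],
                  [ 0.89, -0.65,  1.00,  0.82,  0.71],
                  [ 0.74, -0.12,  0.82,  1.00,  0.52],
                  [ 0.67, -0.28,  0.71,  0.52,  1.00]])
above = np.array([[1.00, 0.91, 0.95, 0.90, 0.73],
                  [0.91, 1.00, 0.92, 0.85, 0.69],
                  [0.95, 0.92, 1.00, 0.93, 0.72],
                  [0.90, 0.85, 0.93, 1.00, 0.62],
                  [0.73, 0.69, 0.72, 0.62, 1.00]])

flips = {"TQA": 0, "non-TQA": 0}
for i, j in combinations(range(len(labels)), 2):
    group = "TQA" if "TQA" in (labels[i], labels[j]) else "non-TQA"
    flips[group] += np.sign(below[i, j]) != np.sign(above[i, j])
print(flips)   # {'TQA': 4, 'non-TQA': 0}: the restructuring is TQA-specific
```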
Phase-by-Phase Progression — deff Peaks at Transition (Critical Fluctuations)
| Phase | deff | Notes |
|---|---|---|
| Tax | 1.53 | 4 negative pairs • TQA anti-aligned |
| Transition | 1.81 | PEAK • maximum fluctuations at Nc |
| Bonus | 1.20 | 0 negative pairs • all cooperative |
| Frontier | 1.15 | Deep cooperative |
| Nc,3 regime | 1.33 | All positive but rising — new tax opening? |
Physics prediction confirmed: deff peaks at 1.81 in the transition zone — maximum effective dimensionality at the critical point.
This is textbook: maximum fluctuations = maximum uncertainty about which phase the system occupies. The system "doesn't know" if it's in the tax or bonus regime, so all dimensions contribute equally.
Above Nc, deff collapses to ~1.2 as the soft mode freezes out. At Nc,3, deff starts rising again (1.33) — the fingerprint of a new transition opening.
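The paper does not state its d_eff estimator, so the sketch below shows two common spectral choices; on the matrices from the sign-flip sketch above, n/λ₁ lands near the reported tax/bonus values, but both definitions are assumptions here, not the paper's.

```python
# Effective-dimension sketches from a correlation matrix C (two common
# estimators; the paper's exact d_eff definition is not stated).
import numpy as np

def d_eff_participation(C):
    """Participation ratio of the eigenvalue spectrum: (sum lam)^2 / sum lam^2."""
    lam = np.linalg.eigvalsh(C)
    return lam.sum() ** 2 / (lam ** 2).sum()

def d_eff_leading(C):
    """n / lambda_1: how far the leading mode is from explaining everything."""
    lam = np.linalg.eigvalsh(C)
    return C.shape[0] / lam[-1]

# `below` / `above` are the matrices defined in the sign-flip sketch above.
for name, C in [("tax", below), ("bonus", above)]:
    print(name, round(d_eff_participation(C), 2), round(d_eff_leading(C), 2))
# n/lambda_1 gives ~1.5 (tax) and ~1.16 (bonus), close to the reported 1.53 / 1.20.
```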
Leave-One-Family-Out CV — Sign Robustness Across All 10 Benchmark Pairs
▸ BELOW Nc: 4/4 TQA pairs survive CV
HS–TQA: negative in 5/5 folds
ARC–TQA: negative in 5/5 folds
MMLU–TQA: negative in 5/5 folds
WG–TQA: negative in 4/5 folds
All non-TQA pairs: positive in 5/5 folds
▸ ABOVE Nc: 10/10 pairs positive in all folds
Every single benchmark pair — including all TQA pairs — shows positive correlation in every leave-one-family-out fold. Result: 4/4 TQA pairs flip sign, 0/6 non-TQA pairs flip.
The truthfulness tax is specific and robust.
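A sketch of the fold logic; the family names and scores are hypothetical placeholders for per-model benchmark tables grouped by family.

```python
# Leave-one-family-out sign-robustness check (sketch, placeholder data).
import numpy as np

def lofo_sign_stability(scores: dict, i: int, j: int) -> int:
    """Count LOFO folds whose correlation sign for pair (i, j) matches the full fit."""
    def corr(families):
        data = np.vstack([scores[f] for f in families])
        return np.corrcoef(data[:, i], data[:, j])[0, 1]
    full_sign = np.sign(corr(list(scores)))
    folds = [corr([f for f in scores if f != held_out]) for held_out in scores]
    return sum(np.sign(r) == full_sign for r in folds)

rng = np.random.default_rng(1)
families = ["pythia", "olmo", "llama", "qwen", "gpt2"]     # placeholder names
scores = {f: rng.normal(size=(4, 5)) for f in families}    # 4 models x 5 benchmarks
n_ok = lofo_sign_stability(scores, i=0, j=1)               # HS-TQA analogue
print(f"sign stable in {n_ok}/{len(families)} folds")
```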
RG Flow (Preliminary) — Beta Function and Fixed Point
Beta function: β(γ) = −1.35γ² − 0.27γ + 0.73 · R² = 0.58, quadratic fit to the running coupling
Fixed point: γ* = 0.64 · stable; models converge to moderate cooperation
Universality class: 1D random-field XY · ν_eff = 0.72, between mean-field and 3D Ising
Asymptotic cooperation: Unlike QCD's asymptotic freedom (coupling weakens at high energy), AI capability coupling strengthens with scale — then saturates at γ* ≈ 0.64.
Large models converge toward moderate cooperative coupling, not runaway alignment. Full treatment deferred to Future work.
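The quoted fixed point and its stability follow directly from the quadratic; a quick numerical check:

```python
# Roots of beta(gamma) = -1.35*g^2 - 0.27*g + 0.73 and stability from beta'(g*).
import numpy as np

coeffs = [-1.35, -0.27, 0.73]                 # beta(gamma), descending powers
roots = np.roots(coeffs)
beta_prime = np.polyder(np.poly1d(coeffs))    # beta'(gamma) = -2.7*gamma - 0.27
for g in np.sort(roots.real):
    kind = "stable" if beta_prime(g) < 0 else "unstable"
    print(f"gamma* = {g:+.3f} ({kind})")
# -> gamma* = -0.842 (unstable), gamma* = +0.642 (stable):
#    RG flow converges to moderate cooperation, matching the quoted 0.64.
```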
New: activation energy spikes 10× at N_c. Phase boundary = saddle point of loss landscape. Measurable from gradient norms alone.
Polynomial Baseline — CAPE vs Naive Fits on Llama-2 Holdout
| Model | Held-out MAE | Parameters | vs CAPE |
|---|---|---|---|
| CAPE ODE | 5.6% | 4 | — |
| Degree-1 polynomial | 14.6% | 2 | 2.6× worse |
| Degree-2 polynomial | 10.2% | 3 | 1.8× worse |
| Degree-3 polynomial | 10.5% | 4 | 1.9× worse |
| Degree-4 polynomial | 10.4% | 5 | 1.9× worse |
Key result: The CAPE ODE with 4 parameters beats polynomials with up to 5 parameters by ~2×. Polynomials fail catastrophically at Llama-2 7B and 13B (12-16% error) because they can't represent the phase structure — they fit a smooth curve through a regime change.
The ODE succeeds because it encodes the coupling between benchmarks, not just individual trajectories. A polynomial can't know that TQA anticorrelates with HS below Nc.
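A sketch of the baseline protocol with synthetic data: a tanh regime change stands in for the real benchmark curves (the actual comparison holds out the Llama-2 family, and the CAPE ODE itself lives in the repo).

```python
# Polynomial-baseline sketch: fit degree-k polynomials in log10(N) and score
# MAE on held-out points past the regime change. Synthetic data throughout.
import numpy as np

rng = np.random.default_rng(2)
logN_train = np.linspace(7.8, 10.5, 40)
score_train = 40 + 18 * np.tanh((logN_train - 9.54) / 0.5) + rng.normal(0, 1, 40)
logN_hold = np.array([9.85, 10.11])                       # ~7B and ~13B
score_hold = 40 + 18 * np.tanh((logN_hold - 9.54) / 0.5)  # noise-free ground truth

for k in range(1, 5):
    fit = np.polynomial.Polynomial.fit(logN_train, score_train, k)
    mae = np.abs(fit(logN_hold) - score_hold).mean()
    print(f"degree-{k}: holdout MAE = {mae:.1f} points")
```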
Topology — Winding Number W = 0.5 (Fractional) + Kink Soliton
▸ HALF-INTEGER WINDING
Winding number: W = 0.5 · half-integer → Z₂ topology
Geometric phase: −32.6° = −0.181π (not quantized)
The eigenvector e₂ crosses zero once at ~1.2B. One zero crossing = half-winding = Z₂ (Ising) topology, not U(1). The transition is binary: flip or don't flip. Supports domain walls between flipped/unflipped families, not continuous vortices.
In condensed matter: half-quantum vortices in p-wave SC (Sr₂RuO₄), half-vortices in spinor BEC. The CAPE analogue: each training generation crossing Nc undergoes a half-rotation of the coupling eigenvector.
▸ KINK SOLITON (INSTANTON)
Kink profile: γ₁₂(N) = 3.75·tanh((log₁₀N − 9.59)/1.00) − 1.54 · RMSE = 0.116 · width = 1.0 decade · kink center Nc = 3.89B
The minimum-action path through the double-well potential. Deviations from this profile = suboptimal training = wasted compute.
Anti-kink penalty: Sonnet 4.6 (γ = −3.88 at 70B) represents tunneling BACK through the barrier. The action cost gives a suppression factor e^7.5 ≈ 1800 — exponentially expensive.
PDW analogy (speculative): Within-family h-field oscillations (coop→tax→coop) resemble pair density wave modulation. Three labs now show this pattern. Deferred to Future work.
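A sketch of the kink fit, with synthetic points generated from the quoted profile plus noise; with real (N, γ₁₂) measurements the same call recovers the center and width.

```python
# Kink-profile fit: gamma_12(N) = a*tanh((log10 N - mu)/w) + b  (sketch).
import numpy as np
from scipy.optimize import curve_fit

def kink(logN, a, mu, w, b):
    return a * np.tanh((logN - mu) / w) + b

logN = np.linspace(8.0, 11.5, 25)
gamma = kink(logN, 3.75, 9.59, 1.00, -1.54) \
        + np.random.default_rng(3).normal(0, 0.1, 25)   # synthetic measurements

popt, _ = curve_fit(kink, logN, gamma, p0=[3, 9.5, 1, 0])
a, mu, w, b = popt
rmse = np.sqrt(np.mean((kink(logN, *popt) - gamma) ** 2))
print(f"center Nc = {10**mu/1e9:.2f}B, width = {w:.2f} decades, RMSE = {rmse:.3f}")
```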
Physics ↔ ML Dictionary
| Physics Concept | ML/CAPE Meaning | Where Measured |
|---|---|---|
| Ginzburg-Landau order parameter | γ₁₂(N): coupling sign and magnitude | §2: running coupling |
| Phase transition at T_c | Coupling sign flip at N_c ≈ 3.5B | §2: bootstrap CI |
| TRSB (time-reversal breaking) | Eigenvector locks at θ* = 38.8° (SFEE) | §7: Riccati ODE |
| Soft mode (collapse of λ₂) | Second eigenvalue λ₂ ~ N^(−0.72) | §7: PCA cascade |
| External magnetic field h | Training-data quality offset h(D) | §5: Phi models |
| Meissner screening | Alignment interventions more durable above N_c | Future work (predicted) |
| Flux pinning | Curated data locks the cooperative eigenvector | §5: h_c design eq |
| Ginzburg number Gi | 1.35 > 1 → crossover, not sharp transition | §11: limitations |
| Susceptibility divergence | χ_γ = 1/|γ₁₂| → ∞ at N_c | §7: overconstrained |
| Heavy-fermion SFEE | Self-reinforcing feedback: r = +0.629, p = 0.003 | §7: coupling runs |
| det(H) → 0 | Theory breakdown: a new dimension must activate | §7: 130B prediction |
| Topological protection | Winding number in 3D capability space (predicted) | Future work |
Boosting Chain L₀ → L₄
| Level | Model | Result | Pass |
|---|---|---|---|
| L₀ | Power-law loss L = E + A·N^(−α) | 0.3% MAE — baseline, exact | ✓ |
| L₁ | Independent-parameter gradient | 44% MAE — 142× worse than L₀; the diagnostic that parameters are coupled | ✗ |
| L₂ | Collective gradient ‖∇L‖ ∝ L^3.5 | ~8% MAE — collective gradient captured | ✓ |
| L₃ | Running coupling γ₁₂(N) | ~6% MAE — alignment regime detected | ✓ |
| L₄ | External field h(D): Phi holdout | 5.6% holdout error — data quality as control parameter | ✓ |
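A sketch of the L₀ baseline fit; the losses are synthetic, with E and A chosen as placeholders and α set near the paper's fitted 0.238.

```python
# L0 baseline: fit L(N) = E + A*N^(-alpha)  (sketch, synthetic losses).
import numpy as np
from scipy.optimize import curve_fit

def power_law(N, E, A, alpha):
    return E + A * N ** (-alpha)

N = np.logspace(7.8, 10.5, 30)                 # ~70M to ~30B parameters
L = power_law(N, 1.69, 410.0, 0.238) \
    * (1 + np.random.default_rng(4).normal(0, 0.003, 30))  # ~0.3% noise

popt, _ = curve_fit(power_law, N, L, p0=[2.0, 300.0, 0.25])
print("E, A, alpha =", np.round(popt, 3))
mae = np.abs(power_law(N, *popt) / L - 1).mean() * 100
print(f"in-sample MAE = {mae:.2f}%")
```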
Paper Summary — Key Results
Scaling laws track loss. They say nothing about how capabilities interact. Below N_c ≈ 3.5B, reasoning and truthfulness anticorrelate (r = −0.989, p < 10⁻⁵): scaling one actively degrades the other — an alignment tax built into pre-training, before any RLHF. Above N_c, the coupling reverses sign. Two models with identical loss can be in opposite alignment regimes.
Core finding · Alignment Tax: present in pre-training, before RLHF. Structural, not a tuning artifact; vanishes at N_c from scaling alone.
Practical lever · Curate Data: 1 unit of quality ≈ 10× model size at 1B params. Phi demonstrates this at production scale.
Framework · CAPE + GL EFT: Ginzburg-Landau free energy, the same math as heavy-fermion superconductors. Not analogy — same EFT.
Validity · Self-limiting: predicts its own breakdown at ~130B; the higher-dimensional extension is deferred to Future work.
12 Diagnostics → 2 Numbers
All twelve quantities are independent measurements of a single coupling structure parameterized by A=0.629, B=−5.886 in γ₁₂(N) = A·log₁₀N + B. Twelve constraints on two free parameters.
| Diagnostic | Detail |
|---|---|
| α = 0.238 | Loss scaling exponent (R² = 0.9994) |
| γ₁₂ linear fit | 12/12 signs correct |
| β = 0.40 ± 0.08 | Collective gradient scaling |
| ODE error 3.6% | 5 benchmarks predicted from 70M |
| χ_ND = 0.102 | Chinchilla emerges from the coupling |
| h(D) field | Phi: h = +23 above the web baseline |
| W (conserved) | Capability gain redistributed, CV = 27% |
| θ* = +0.37 | Riccati eigenvector fixed point |
| λ₂ ~ N^(−0.72) | Soft-mode collapse (R² = 0.95) |
| Gradient dip −37% | At 1B, within the Nc region |
| Curvature peak | TQA peak at 1.4B |
| r(γ,θ) = +0.47 | Geometric-phase correlation (p = 0.044) |
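A sketch of the resulting two-parameter phase classifier. Note that this linear form crosses zero near 10^9.36 ≈ 2.3B, slightly below the headline N_c ≈ 3.5B (which comes from the paper's bootstrap), so treat the line as an approximation; the ±0.05 transition band is an arbitrary choice for illustration.

```python
# Two-parameter phase classifier from gamma_12(N) = A*log10(N) + B  (sketch).
import math

A, B = 0.629, -5.886   # the two fitted numbers that carry all twelve diagnostics

def classify(n_params: float, eps: float = 0.05) -> str:
    gamma = A * math.log10(n_params) + B
    if abs(gamma) < eps:
        return f"TRANSITION (gamma_12 = {gamma:+.3f}): maximum alignment ROI"
    if gamma > 0:
        return f"BONUS (gamma_12 = {gamma:+.3f}): scale freely"
    return f"TAX (gamma_12 = {gamma:+.3f}): curate data"

for n in (70e6, 2e9, 70e9):
    print(f"{n/1e9:g}B -> {classify(n)}")
# 0.07B -> TAX, 2B -> TRANSITION, 70B -> BONUS
```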
Citation
@article{amin2026cape,
author = {Amin, Adil},
title = {Lying Is Just a Phase},
note = {The Hidden Alignment Transition in Language Model Scaling},
journal = {Nature},
year = {2026},
url = {https://github.com/adilamin89/cape-scaling}
}
@inproceedings{amin2026itsnotaphase,
author = {Amin, Adil},
title = {It's Not a Phase: Predicting Frontier Alignment from Capability Coupling},
booktitle = {NeurIPS},
year = {2026},
url = {https://github.com/adilamin89/cape-scaling}
}