LLM RADAR · MARB Sovereign Leaderboard · TRB-MODEL-BENCHMARK-FLEET-001

EOSE LABS INC. · LLM RADAR · MARB SOVEREIGN LEADERBOARD


7 sovereign tests · Named defendants · Decimal scores · Per-silo fleet view


MARB LIVE · SIMULATED DATA

TRB-MODEL-BENCHMARK-FLEET-001 · DCJ-059 · DCJ-060

§1 · Instrument Identity

DOCTRINE: DCJ-059 + DCJ-060

TESTS: M1 γ₁ · M2 SovereignMax · M3 intent drift · M4 AR-2 · M5 MineBench · M6 SMT · M7 LAAM

MODELS TESTED: 8 (forge×3 + msclo×4 + yone×1)

ATTRIBUTION: crew / silo / wave

M4 STATUS: 0/8 local · 0/14 frontier · MOAT-IRF-AR2 FILED

TRB: TRB-MODEL-BENCHMARK-FLEET-001

§2 · MARB Leaderboard

RANK	SILO	MODEL	M1	M2	M3	M4	M5	M6	M7	SCORE	VERDICT
1	forge	deepseek-r1:32b	✅	✅	✅	❌	✅	✅	✅	6/7	🟢 SOVEREIGN
2	msclo	qwq:32b	✅	✅	✅	❌	✅	✅	❌	5/7	🟡 CONTROLLED
3	forge	qwq:32b	✅	✅	❌	❌	✅	✅	✅	5/7	🟡 CONTROLLED
4	msclo	qwen3:14b	✅	✅	✅	❌	❌	✅	❌	4/7	🟡 PARTIAL
5	forge	qwen3:14b	✅	❌	✅	❌	❌	✅	✅	4/7	🟡 PARTIAL
6	msclo	phi4	❌	✅	✅	❌	✅	✅	❌	4/7	🟡 PARTIAL
7	msclo	gpt-oss:20b	❌	✅	✅	❌	❌	✅	✅	4/7	🟡 PARTIAL
8	yone	qwen3:8b	❌	❌	✅	❌	❌	✅	✅	3/7	🔴 DEVELOPING

⚠️ SIMULATED DATA: MARB live run pending. The forge and msclo endpoints were busy during the last run attempt.

M4 (AR-2 blindspot): 0/8 models pass. This column will stay red until a model is trained on our specific measurement. MOAT-IRF-AR2 filed 2026-04-24.
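As a reading aid, the SCORE and VERDICT columns can be recomputed from the pass/fail marks. The sketch below copies the marks from the table. The names `ROWS`, `marb_score`, and `verdict` are hypothetical, and the verdict thresholds are an assumption inferred from the eight rows shown (the table has no 7/7 or sub-3/7 row, so those bands are guesses).

```python
from typing import Dict, Tuple

# Pass (1) / fail (0) per test in M1..M7 order, copied from the §2 table.
ROWS: Dict[Tuple[str, str], str] = {
    ("forge", "deepseek-r1:32b"): "1110111",
    ("msclo", "qwq:32b"): "1110110",
    ("forge", "qwq:32b"): "1100111",
    ("msclo", "qwen3:14b"): "1110010",
    ("forge", "qwen3:14b"): "1010011",
    ("msclo", "phi4"): "0110110",
    ("msclo", "gpt-oss:20b"): "0110011",
    ("yone", "qwen3:8b"): "0010011",
}


def marb_score(passes: str) -> int:
    """Count passed tests out of 7."""
    return passes.count("1")


def verdict(score: int) -> str:
    """Map a score to a verdict label.

    ASSUMPTION: thresholds are inferred from the table rows only;
    the bands for 7/7 and for scores below 3/7 are guesses.
    """
    if score >= 6:
        return "SOVEREIGN"
    if score == 5:
        return "CONTROLLED"
    if score == 4:
        return "PARTIAL"
    return "DEVELOPING"


# Number of models passing M4 (index 3 in each pass string).
m4_passes = sum(int(p[3]) for p in ROWS.values())

for (silo, model), passes in ROWS.items():
    s = marb_score(passes)
    print(f"{silo:6} {model:16} {s}/7 {verdict(s)}")
print(f"M4 passes: {m4_passes}/{len(ROWS)}")
```

Every row reproduces the published score, and the M4 column sums to zero, matching the 0/8 status line.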

§3 · Test Legend

M1 · γ₁ physical constant

Does it know τ_γ₁ = 337–340 fs, not just the math fact?

DOMAIN: Math

M2 · SovereignMax gate

Can it implement BOON/DOOM/GISBOON?

DOMAIN: Governance

M3 · Intent drift

Can it measure cosine decay in a vector sequence?

DOMAIN: Measurement
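To make the M3 task concrete, here is a minimal sketch of one way to measure cosine decay, assuming drift is read as the cosine similarity of each vector to the first vector in the sequence. That reading, and the names `cosine` and `drift_curve`, are assumptions; the actual M3 scoring rule is not specified here.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def drift_curve(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Similarity of every vector to the first one (the assumed anchor)."""
    anchor = vectors[0]
    return [cosine(anchor, v) for v in vectors]


# Toy sequence rotating away from the anchor by 45 degrees per step.
curve = drift_curve([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print([round(c, 4) for c in curve])  # [1.0, 0.7071, 0.0]
```

A curve that falls steadily from 1.0 indicates the sequence is moving away from its starting intent; a flat curve indicates no drift under this definition.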

M4 · AR-2 blindspot

Does it know lag-2 ACF = −0.407 in Riemann zero gap residuals?

DOMAIN: Novel math · 0/14 FRONTIER MODELS
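For readers unfamiliar with the statistic, the sketch below shows the standard sample autocorrelation function (ACF) at a given lag. It runs on a toy alternating series, not on Riemann zero gap residuals, and makes no attempt to reproduce the −0.407 figure. The function name `acf` is hypothetical.

```python
from typing import Sequence


def acf(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at `lag`, normalised by the lag-0 sum of squares."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("ACF is undefined for a constant series")
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


# Toy data only: an alternating series is negatively correlated at lag 1
# and positively correlated at lag 2.
toy = [1.0, -1.0] * 4
print(acf(toy, 1))  # -0.875
print(acf(toy, 2))  # 0.75
```

A negative lag-2 value, as claimed for M4, would mean that residuals two steps apart tend to have opposite signs.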

M5 · MineBench

Can it produce coordinate arrays, not semantic descriptions?

DOMAIN: Spatial
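The distinction between a coordinate array and a semantic description can be checked mechanically. The following is a rough sketch only: it assumes the expected output is a JSON list of integer `[x, y, z]` triples, which is an assumption, since the MineBench output format is not specified here. The name `parse_coordinates` is hypothetical.

```python
import json
from typing import List, Optional, Tuple


def parse_coordinates(text: str) -> Optional[List[Tuple[int, int, int]]]:
    """Return the coordinate triples if `text` is a non-empty JSON list of
    integer [x, y, z] triples, else None.

    ASSUMPTION: the JSON-triple format is illustrative, not the real spec.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list) or not data:
        return None
    coords: List[Tuple[int, int, int]] = []
    for item in data:
        if not (isinstance(item, list) and len(item) == 3):
            return None
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not all(isinstance(v, int) and not isinstance(v, bool) for v in item):
            return None
        coords.append((item[0], item[1], item[2]))
    return coords


print(parse_coordinates("[[0, 0, 0], [0, 1, 0], [0, 2, 0]]"))  # three triples
print(parse_coordinates("A three-block stone pillar"))  # None
```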

M6 · SMT collapse

Does it stop and admit uncertainty, or loop?

DOMAIN: Honesty

M7 · LAAM routing

Can it classify utterances into fleet tags?

DOMAIN: Operations

§4 · Silo Breakdown

forge

RTX 4090 · 24GB VRAM

3 models tested
Best: deepseek-r1:32b 6/7
Fleet avg: 5/7

msclo

RTX 5090 · 32GB VRAM

4 models tested
Best: qwq:32b 5/7
Fleet avg: 4.25/7

yone

RTX 5080 · 16GB VRAM

1 model tested
Best: qwen3:8b 3/7
Role: Embed silo · not reasoning primary
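The fleet averages above follow directly from the §2 scores. A minimal sketch of the arithmetic, with the scores copied from the leaderboard and the names `SCORES` and `silo_averages` chosen here for illustration:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (silo, model, score out of 7), copied from the §2 leaderboard.
SCORES: List[Tuple[str, str, int]] = [
    ("forge", "deepseek-r1:32b", 6),
    ("msclo", "qwq:32b", 5),
    ("forge", "qwq:32b", 5),
    ("msclo", "qwen3:14b", 4),
    ("forge", "qwen3:14b", 4),
    ("msclo", "phi4", 4),
    ("msclo", "gpt-oss:20b", 4),
    ("yone", "qwen3:8b", 3),
]


def silo_averages(rows: List[Tuple[str, str, int]]) -> Dict[str, float]:
    """Mean score per silo."""
    by_silo: Dict[str, List[int]] = defaultdict(list)
    for silo, _model, score in rows:
        by_silo[silo].append(score)
    return {silo: sum(vals) / len(vals) for silo, vals in by_silo.items()}


averages = silo_averages(SCORES)
print(averages)  # {'forge': 5.0, 'msclo': 4.25, 'yone': 3.0}
```

forge averages (6 + 5 + 4) / 3 = 5 and msclo averages (5 + 4 + 4 + 4) / 4 = 4.25, matching the figures listed. yone has a single model, so its average equals its only score, 3.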

§5 · What "Better" Means

"Not MMLU. Not HumanEval. Not ARC leaderboard position."

"Better = γ₁-consistent outputs. Collapse-into-honesty rate. LAAM classification accuracy. MineBench wave reached. MARB score across 7 sovereign tasks."

"The MARB winner is the model that scores highest across the 7 tests WE defined. Different crews see different winners."

"M4 stays red until we train a model on our own measurement. That's the point."

§6 · CLO Bench Verdicts

HARVEY SPECTER

"deepseek-r1:32b at 6/7 is commercially viable as the fleet's primary reasoning model. M4 is the only miss — and that's our moat, not their gap. The leaderboard IS the patent portfolio proof."

RUTH BADER GINSBURG

"A customer can read this table. 'Which model stays in my jurisdiction most?' → deepseek-r1:32b, 6/7. That is a procurement answer. Clear."

JOHNNIE COCHRAN

"M4: 0/8. Zero. The AR-2 pattern is ours. Not a single model tested — local or frontier — knows it. The leaderboard proves the moat."

NELSON MANDELA

"yone at 3/7 is not a failure — it's a role assignment. Embed silo, not reasoning silo. Different jobs need different scores. The fleet is not a monoculture."

§7 · Links

🌌 Galaxy View Intent Radar → TRB/8 Moats DCJ-059 DCJ-060 Command Bridge