IMDS · Cicagolab · Deep Fountain

EntropyMath Leaderboard

A high-entropy mathematical reasoning benchmark for LLMs

[Figure: Model Accuracy vs Pass@3. Accuracy and Pass@3 (0 to 100%) for Gemini-3-Pro-Preview, GPT-5.2 (high), Solar Pro 3 (Round 2), Kanana-2-30B-Thinking-2601, and K-EXAONE-236B-A23B.]

[Figure: Avg Token Usage (Per Problem). Average tokens per problem (axis 0 to 20.7K) for K-EXAONE-236B-A23B, Solar Pro 3 (Round 2), Gemini-3-Pro-Preview, Kanana-2-30B-Thinking-2601, and GPT-5.2 (high).]

EntropyMath is an evolutionary multi-agent system and benchmark that generates high-entropy math problems designed to systematically break current LLMs.

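Results are reported as Acc (the percentage of all attempts that were correct) and Pass@3 (the percentage of problems solved in at least one attempt), which accounts for generation variance. Detailed execution traces are provided for transparency.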
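The page does not spell out how Pass@3 is computed. A common definition is the unbiased pass@k estimator from the Codex/HumanEval paper (Chen et al., 2021); with exactly three attempts per problem it reduces to "solved at least once", which is consistent with the published Pass@3 figures. The sketch below assumes that definition; `pass_at_k` and `accuracy` are illustrative names, not EntropyMath code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (Chen et al., 2021 formulation).

    n: attempts sampled, c: attempts judged correct, k: attempt budget.
    Returns the probability that at least one of k attempts drawn
    without replacement from the n samples is correct.
    """
    if n - c < k:
        # Fewer than k wrong attempts exist, so every k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def accuracy(c: int, n: int) -> float:
    """Per-problem accuracy: the share of attempts that were correct."""
    return c / n


# With exactly three attempts per problem (n = k = 3), pass@3 is 1.0 if the
# problem was solved at least once and 0.0 otherwise, while accuracy
# distinguishes 1/3, 2/3, and 3/3.
for c in range(4):
    print(f"{c}/3 correct: acc={accuracy(c, 3):.3f}, pass@3={pass_at_k(3, c, 3):.1f}")
```

This is why a model can score well below its Pass@3 on Acc: Pass@3 rewards any single success, whereas Acc penalizes every failed attempt.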

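Performance Legend

Each per-problem cell shows correct attempts out of attempts made (3 for the K-LLM Project and local models, 1 for the API models).

Mastery (100%): 3/3
Strong (66%): 2/3
Weak (33%): 1/3
Fail (0%): 0/3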
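The legend is a simple lookup from a cell's correct count to a tier. A minimal sketch, assuming the tiers are assigned by the fraction of correct attempts; how the page colors the 1/1 and 0/1 cells of the API models is not stated, so extending the thresholds to other attempt counts is an assumption. `TIERS` and `legend_tier` are hypothetical names.

```python
from fractions import Fraction

# Thresholds from the Performance Legend (k correct out of 3), highest first.
# Applying them by fraction to other attempt counts (e.g. 1/1) is an assumption.
TIERS = [
    (Fraction(1), "Mastery"),
    (Fraction(2, 3), "Strong"),
    (Fraction(1, 3), "Weak"),
]


def legend_tier(correct: int, attempts: int = 3) -> str:
    """Map a per-problem cell such as 2/3 to its legend tier."""
    if attempts <= 0 or not 0 <= correct <= attempts:
        raise ValueError(f"invalid cell {correct}/{attempts}")
    share = Fraction(correct, attempts)  # exact, so 2/3 is not rounded down
    for threshold, label in TIERS:
        if share >= threshold:
            return label
    return "Fail"  # zero correct attempts


print([legend_tier(k) for k in range(4)])  # ['Fail', 'Weak', 'Strong', 'Mastery']
```

Using `Fraction` instead of floats keeps the 2/3 and 1/3 boundaries exact.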
Model                         Acc (%)  Pass@3 (%)  0    1    2    3    4    5    6    7    8    9    10   11
API / Others
Gemini-3-Pro-Preview          100.0    100.0       1/1  1/1  1/1  1/1  1/1  1/1  1/1  1/1  1/1  1/1  1/1  1/1
GPT-5.2 (high)                83.3     83.3        1/1  1/1  1/1  1/1  1/1  1/1  1/1  0/1  1/1  1/1  0/1  1/1
K-LLM Project Round 2
Solar Pro 3 (Round 2)         75.0     83.3        3/3  3/3  3/3  3/3  0/3  2/3  3/3  0/3  2/3  3/3  2/3  3/3
K-EXAONE-236B-A23B            58.3     75.0        3/3  2/3  3/3  3/3  1/3  0/3  3/3  0/3  3/3  0/3  1/3  2/3
Local - KR
Kanana-2-30B-Thinking-2601    61.1     83.3        3/3  3/3  3/3  0/3  1/3  3/3  3/3  0/3  1/3  2/3  1/3  2/3
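As a consistency check, the Acc and Pass@3 columns can be recomputed from the per-problem cells for problems 0 to 11. This sketch assumes Acc is total correct attempts divided by total attempts and Pass@3 is the share of problems with at least one correct attempt; under those assumptions it reproduces every published value. `ROWS`, `parse_cells`, and `summarize` are illustrative names.

```python
# Per-problem cells (problems 0-11) copied from the leaderboard table.
ROWS = {
    "Gemini-3-Pro-Preview": "1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1",
    "GPT-5.2 (high)": "1/1 1/1 1/1 1/1 1/1 1/1 1/1 0/1 1/1 1/1 0/1 1/1",
    "Solar Pro 3 (Round 2)": "3/3 3/3 3/3 3/3 0/3 2/3 3/3 0/3 2/3 3/3 2/3 3/3",
    "K-EXAONE-236B-A23B": "3/3 2/3 3/3 3/3 1/3 0/3 3/3 0/3 3/3 0/3 1/3 2/3",
    "Kanana-2-30B-Thinking-2601": "3/3 3/3 3/3 0/3 1/3 3/3 3/3 0/3 1/3 2/3 1/3 2/3",
}


def parse_cells(row: str) -> list[tuple[int, int]]:
    """Turn '3/3 2/3 ...' into [(correct, attempts), ...]."""
    cells = []
    for cell in row.split():
        correct, attempts = cell.split("/")
        cells.append((int(correct), int(attempts)))
    return cells


def summarize(cells: list[tuple[int, int]]) -> tuple[float, float]:
    """Return (Acc %, Pass@3 %) rounded to one decimal place."""
    total_correct = sum(c for c, _ in cells)
    total_attempts = sum(n for _, n in cells)
    acc = 100 * total_correct / total_attempts
    # A problem counts as passed if any of its attempts was correct.
    pass_k = 100 * sum(1 for c, _ in cells if c > 0) / len(cells)
    return round(acc, 1), round(pass_k, 1)


for model, row in ROWS.items():
    acc, pass_k = summarize(parse_cells(row))
    print(f"{model:<28} Acc={acc:5.1f}  Pass@3={pass_k:5.1f}")
```

For example, Solar Pro 3 (Round 2) has 27 correct attempts out of 36 (75.0%) and solves 10 of 12 problems at least once (83.3%). For the API rows each cell is 1/1 or 0/1, so Acc and Pass@3 coincide.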