Korean CSAT 2025 (KOR)

Chart: Model Accuracy vs Pass@3 (%) for GPT-5.2 (high), Gemini-3-Pro-Preview, K-EXAONE-236B-A23B, EXAONE-4.0.1-32B (high), and Kanana-2-30B-Thinking-2601.
Chart: Avg Token Usage (Per Problem), in average tokens per problem, for the same five models.
EntropyMath is an evolutionary multi-agent system and benchmark that generates high-entropy math problems designed to systematically break current LLMs. The KOR_CSAT_25_KOR dataset contains the math problems from the 2025 Korean College Scholastic Ability Test (CSAT), a highly challenging benchmark for verifying mathematical reasoning in Korean.
Results are reported using the Pass@3 metric to account for generation variance, alongside detailed execution traces for transparency.
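
As a minimal sketch of how the Accuracy and Pass@3 columns can be reproduced from the per-problem correct-out-of-attempts counts in the results table below (the function names and data layout here are illustrative, not part of the benchmark code):

```python
def accuracy(results: list[tuple[int, int]]) -> float:
    """results: one (correct, attempts) pair per problem; overall % of correct runs."""
    correct = sum(c for c, _ in results)
    attempts = sum(n for _, n in results)
    return 100.0 * correct / attempts

def pass_at_n(results: list[tuple[int, int]]) -> float:
    """Percentage of problems solved at least once across the attempts."""
    solved = sum(1 for c, _ in results if c > 0)
    return 100.0 * solved / len(results)

# K-EXAONE-236B-A23B row from the table below (correct out of 3 runs, problems 0-12).
k_exaone = [(1, 3), (0, 3), (3, 3), (1, 3), (2, 3), (3, 3), (3, 3),
            (2, 3), (3, 3), (0, 3), (2, 3), (3, 3), (3, 3)]

print(f"Accuracy: {accuracy(k_exaone):.1f}")   # 66.7
print(f"Pass@3:   {pass_at_n(k_exaone):.1f}")  # 84.6
```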
Performance Legend

| Tier | Correct runs |
|---|---|
| Mastery (100%) | 3/3 |
| Strong (66%) | 2/3 |
| Weak (33%) | 1/3 |
| Fail (0%) | 0/3 |
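
A sketch of how a per-problem cell maps onto these tiers (the helper is hypothetical and the thresholds are assumed from the legend percentages):

```python
# Hypothetical helper: map a "correct out of attempts" cell to a legend tier.
def tier(correct: int, attempts: int = 3) -> str:
    rate = correct / attempts
    if rate >= 1.0:
        return "Mastery"  # 3/3
    if rate >= 2 / 3:
        return "Strong"   # 2/3
    if rate >= 1 / 3:
        return "Weak"     # 1/3
    return "Fail"         # 0/3

print(tier(2, 3))  # -> Strong
```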
| Model | Acc (%) | Pass@3 (%) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **API / Others** | | | | | | | | | | | | | | | | |
| GPT-5.2 (high) | 100.0 | 100.0 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 |
| Gemini-3-Pro-Preview | 100.0 | 100.0 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 |
| **K-LLM Project Round 2** | | | | | | | | | | | | | | | | |
| K-EXAONE-236B-A23B | 66.7 | 84.6 | 1/3 | 0/3 | 3/3 | 1/3 | 2/3 | 3/3 | 3/3 | 2/3 | 3/3 | 0/3 | 2/3 | 3/3 | 3/3 |
| **K-LLM Project Round 1** | | | | | | | | | | | | | | | | |
| EXAONE-4.0.1-32B (high) | 53.8 | 76.9 | 1/3 | 0/3 | 3/3 | 2/3 | 1/3 | 1/3 | 3/3 | 3/3 | 1/3 | 0/3 | 0/3 | 3/3 | 3/3 |
| **Local - KR** | | | | | | | | | | | | | | | | |
| Kanana-2-30B-Thinking-2601 | 53.8 | 69.2 | 0/3 | 0/3 | 3/3 | 0/3 | 3/3 | 2/3 | 3/3 | 3/3 | 3/3 | 0/3 | 1/3 | 2/3 | 1/3 |


