Reasoning
How well can each AI model solve complex logical and mathematical problems? Ranked by reasoning Arena Elo.
| Rank | Model | Model ID | Score (reasoning Arena Elo) |
|---|---|---|---|
| 1 | Gemini 3.5 Flash | google/gemini-3.5-flash | 1527.0 |
| 2 | Claude Opus 4.6 | anthropic/claude-opus-4-6-thinking | 1513.0 |
| 3 | GPT-5.4 | openai/gpt-5.4-high | 1509.0 |
| 4 | Qwen 3.7 Max | alibaba/qwen3.7-max-preview | 1498.0 |
| 5 | Gemini 3.1 Pro | google/gemini-3.1-pro-preview | 1497.0 |
| 6 | Claude Opus 4.7 | anthropic/claude-opus-4-7-thinking | 1494.0 |
| 7 | Xiaomi: MiMo V2.5 Pro | xiaomi/mimo-v2.5-pro | 1486.0 |
| 8 | Ernie 5.1 | baidu/ernie-5.1 | 1481.0 |
| 9 | GPT-5.5 | openai/gpt-5.5 | 1481.0 |
| 10 | qwen3.6 max | alibaba/qwen3.6-max-preview | 1479.0 |
| 11 | GLM 5.1 | zai/glm-5.1 | 1477.0 |
| 12 | Gemini 3 Pro | google/gemini-3-pro | 1476.0 |
| 13 | Qwen 3.5 Max | alibaba/qwen3.5-max-preview | 1476.0 |
| 14 | Gemini 3 Flash | google/gemini-3-flash | 1474.0 |
| 15 | kimi k2.6 | moonshot/kimi-k2.6 | 1472.0 |
| 16 | kimi k2.5 | moonshot/kimi-k2.5-thinking | 1471.0 |
| 17 | Gemma 4 26B A4B | google/gemma-4-26b-a4b | 1468.0 |
| 18 | DeepSeek V4 Pro | deepseek/deepseek-v4-pro-thinking | 1467.0 |
| 19 | Gemma 4 31B | google/gemma-4-31b | 1464.0 |
| 20 | Grok 4.20 | xai/grok-4.20-beta-0309-reasoning | 1461.0 |
| 21 | Claude Opus 4.5 | anthropic/claude-opus-4-5-20251101 | 1461.0 |
| 22 | Claude Sonnet 4.6 | anthropic/claude-sonnet-4-6 | 1455.0 |
| 23 | Muse Spark | meta-llama/muse-spark | 1451.0 |
| 24 | qwen3.6 plus | alibaba/qwen3.6-plus | 1451.0 |
| 25 | Gemini 2.5 Pro | google/gemini-2.5-pro | 1450.0 |
| 26 | gemini 3 flash | google/gemini-3-flash (thinking-minimal) | 1449.0 |
| 27 | Qwen 3 Max | alibaba/qwen3-max-preview | 1449.0 |
| 28 | qwen3.5 397b a17b | alibaba/qwen3.5-397b-a17b | 1448.0 |
| 29 | mimo v2 pro | xiaomi/mimo-v2-pro | 1448.0 |
| 30 | Claude Sonnet 4.5 | anthropic/claude-sonnet-4-5-20250929-thinking-32k | 1447.0 |
312 models tested
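To put the score gaps in perspective, the sketch below converts a difference in reasoning Arena Elo into an expected head-to-head preference rate. It assumes the scores follow the standard logistic Elo scale (base 10, 400-point divisor). This page does not state that scale explicitly. The function name `expected_win_rate` is hypothetical and used only for illustration. The ratings are copied from the leaderboard.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A is preferred over model B.

    Assumes the standard logistic Elo formula (base 10, 400-point scale);
    this is an illustrative assumption, not the leaderboard's documented method.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Reasoning Arena Elo scores copied from the leaderboard.
scores = {
    "Gemini 3.5 Flash": 1527.0,   # rank 1
    "Claude Opus 4.6": 1513.0,    # rank 2
    "Claude Sonnet 4.5": 1447.0,  # rank 30
}

for rival in ("Claude Opus 4.6", "Claude Sonnet 4.5"):
    p = expected_win_rate(scores["Gemini 3.5 Flash"], scores[rival])
    print(f"Gemini 3.5 Flash vs {rival}: {p:.1%}")
```

Under this assumption, the 14-point gap between ranks 1 and 2 corresponds to about a 52.0% expected preference rate. The 80-point gap between ranks 1 and 30 corresponds to about 61.3%. Models with equal scores, such as ranks 8 and 9 at 1481.0, come out at exactly 50%.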