Coding
How well can each AI model write, debug, and refactor code? Ranked by coding-specific Arena Elo score.
| Rank | Model | Model ID | Score |
|------|-------|----------|-------|
| 1 | Claude Opus 4.6 | anthropic/claude-opus-4-6-thinking | 1536.0 |
| 2 | Claude Opus 4.7 | anthropic/claude-opus-4-7-thinking | 1521.0 |
| 3 | Claude Opus 4.5 | anthropic/claude-opus-4-5-20251101-thinking-32k | 1503.0 |
| 4 | GLM 5.1 | zai/glm-5.1 | 1499.0 |
| 5 | Xiaomi: MiMo V2.5 Pro | xiaomi/mimo-v2.5-pro | 1499.0 |
| 6 | Qwen 3.7 Max | alibaba/qwen3.7-max-preview | 1498.0 |
| 7 | GPT-5.5 | openai/gpt-5.5-high | 1498.0 |
| 8 | Claude Sonnet 4.6 | anthropic/claude-sonnet-4-6 | 1498.0 |
| 9 | GPT-5.4 | openai/gpt-5.4-high | 1496.0 |
| 10 | Gemini 3.5 Flash | google/gemini-3.5-flash | 1491.0 |
| 11 | Ernie 5.1 | baidu/ernie-5.1 | 1489.0 |
| 12 | Gemini 3.1 Pro | google/gemini-3.1-pro-preview | 1488.0 |
| 13 | Claude Sonnet 4.5 | anthropic/claude-sonnet-4-5-20250929-thinking-32k | 1488.0 |
| 14 | Qwen 3.5 Max | alibaba/qwen3.5-max-preview | 1487.0 |
| 15 | kimi k2.6 | moonshot/kimi-k2.6 | 1485.0 |
| 16 | amazon nova chat 26 02 10 | amazon/amazon-nova-experimental-chat-26-02-10 | 1485.0 |
| 17 | kimi k2.5 instant | moonshot/kimi-k2.5-instant | 1485.0 |
| 18 | Gemini 3 Pro | google/gemini-3-pro | 1483.0 |
| 19 | Claude Opus 4.1 | anthropic/claude-opus-4-1-20250805-thinking-16k | 1480.0 |
| 20 | Muse Spark | meta-llama/muse-spark | 1478.0 |
| 21 | mimo v2 pro | xiaomi/mimo-v2-pro | 1477.0 |
| 22 | kimi k2.5 | moonshot/kimi-k2.5-thinking | 1475.0 |
| 23 | longcat flash chat 2602 exp | meituan/longcat-flash-chat-2602-exp | 1474.0 |
| 24 | dola seed 2.0 pro | bytedance/dola-seed-2.0-pro | 1473.0 |
| 25 | Xiaomi: MiMo V2.5 | xiaomi/mimo-v2.5 | 1468.0 |
| 26 | longcat flash chat | meituan/longcat-flash-chat | 1468.0 |
| 27 | qwen3.5 397b a17b | alibaba/qwen3.5-397b-a17b | 1467.0 |
| 28 | DeepSeek V4 Pro | deepseek/deepseek-v4-pro | 1466.0 |
| 29 | Gemini 3 Flash | google/gemini-3-flash | 1463.0 |
| 30 | qwen3.6 max | alibaba/qwen3.6-max-preview | 1463.0 |
307 models tested
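
As a rough aid to reading the score gaps on this page, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the classic Elo logistic curve with a 400-point scale; the page does not state how its scores are computed, so treat both the formula and the hypothetical helper name `expected_win_probability` as illustrative assumptions, not the leaderboard's actual method.

```python
def expected_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected rate at which A is preferred over B.

    Assumption: classic Elo logistic curve with a 400-point scale.
    The leaderboard's real scoring method is not stated on the page.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores for ranks 1 and 2 as listed on this page.
top_score = 1536.0
runner_up_score = 1521.0

p = expected_win_probability(top_score, runner_up_score)
print(f"Expected preference rate for rank 1 over rank 2: {p:.3f}")
```

Under this assumption, a 15-point gap corresponds to only a slight edge over an even split, so models separated by a few points should be read as close to tied.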