PiazBench

Coding

How well can each AI model write, debug, and refactor code? Models are ranked by their coding-specific Arena Elo score.

| Rank | Provider | Model | Model ID | Score |
|---|---|---|---|---|
| 1 | Anthropic | Claude Opus 4.6 | anthropic/claude-opus-4-6-thinking | 1536.0 |
| 2 | Anthropic | Claude Opus 4.7 | anthropic/claude-opus-4-7-thinking | 1521.0 |
| 3 | Anthropic | Claude Opus 4.5 | anthropic/claude-opus-4-5-20251101-thinking-32k | 1503.0 |
| 4 | Z.ai | GLM 5.1 | zai/glm-5.1 | 1499.0 |
| 5 | Xiaomi | Xiaomi: MiMo V2.5 Pro | xiaomi/mimo-v2.5-pro | 1499.0 |
| 6 | Alibaba | Qwen 3.7 Max | alibaba/qwen3.7-max-preview | 1498.0 |
| 7 | OpenAI | GPT-5.5 | openai/gpt-5.5-high | 1498.0 |
| 8 | Anthropic | Claude Sonnet 4.6 | anthropic/claude-sonnet-4-6 | 1498.0 |
| 9 | OpenAI | GPT-5.4 | openai/gpt-5.4-high | 1496.0 |
| 10 | Google | Gemini 3.5 Flash | google/gemini-3.5-flash | 1491.0 |
| 11 | Baidu | Ernie 5.1 | baidu/ernie-5.1 | 1489.0 |
| 12 | Google | Gemini 3.1 Pro | google/gemini-3.1-pro-preview | 1488.0 |
| 13 | Anthropic | Claude Sonnet 4.5 | anthropic/claude-sonnet-4-5-20250929-thinking-32k | 1488.0 |
| 14 | Alibaba | Qwen 3.5 Max | alibaba/qwen3.5-max-preview | 1487.0 |
| 15 | Moonshot | kimi k2.6 | moonshot/kimi-k2.6 | 1485.0 |
| 16 | Amazon | amazon nova chat 26 02 10 | amazon/amazon-nova-experimental-chat-26-02-10 | 1485.0 |
| 17 | Moonshot | kimi k2.5 instant | moonshot/kimi-k2.5-instant | 1485.0 |
| 18 | Google | Gemini 3 Pro | google/gemini-3-pro | 1483.0 |
| 19 | Anthropic | Claude Opus 4.1 | anthropic/claude-opus-4-1-20250805-thinking-16k | 1480.0 |
| 20 | Meta | Muse Spark | meta-llama/muse-spark | 1478.0 |
| 21 | Xiaomi | mimo v2 pro | xiaomi/mimo-v2-pro | 1477.0 |
| 22 | Moonshot | kimi k2.5 | moonshot/kimi-k2.5-thinking | 1475.0 |
| 23 | Meituan | longcat flash chat 2602 exp | meituan/longcat-flash-chat-2602-exp | 1474.0 |
| 24 | ByteDance | dola seed 2.0 pro | bytedance/dola-seed-2.0-pro | 1473.0 |
| 25 | Xiaomi | Xiaomi: MiMo V2.5 | xiaomi/mimo-v2.5 | 1468.0 |
| 26 | Meituan | longcat flash chat | meituan/longcat-flash-chat | 1468.0 |
| 27 | Alibaba | qwen3.5 397b a17b | alibaba/qwen3.5-397b-a17b | 1467.0 |
| 28 | DeepSeek | DeepSeek V4 Pro | deepseek/deepseek-v4-pro | 1466.0 |
| 29 | Google | Gemini 3 Flash | google/gemini-3-flash | 1463.0 |
| 30 | Alibaba | qwen3.6 max | alibaba/qwen3.6-max-preview | 1463.0 |

307 models tested
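
As a rough guide to reading the score gaps, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the classic Elo logistic curve with a 400-point scale; the page does not say which formula or scale the Arena scores use, so the function `expected_win_rate` and its `scale` parameter are illustrative assumptions, not the site's method. The two ratings are the rank 1 and rank 30 scores listed above.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that A beats B under the classic Elo model.

    Assumption: standard logistic Elo curve with a 400-point scale.
    The leaderboard does not document its exact rating formula.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Scores taken from the leaderboard: rank 1 and rank 30.
rank_1_score = 1536.0
rank_30_score = 1463.0

p = expected_win_rate(rank_1_score, rank_30_score)
print(f"Expected win rate of rank 1 over rank 30: {p:.2f}")
```

Under these assumptions, the 73-point gap between rank 1 and rank 30 corresponds to an expected win rate of roughly 0.60, and models with identical scores come out at 0.50, so small differences in the middle of the list should not be over-read.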