GDPval Leaderboard - Model Performance
This leaderboard builds on OpenAI's official GDPval framework and consolidates evaluation data from multiple third-party sources into a single ranking, helping you identify AI models capable of expert-grade results on professional tasks.
Overall Performance Leaderboard
| Rank | Company | Model | Score | ELO | Release Date | Key Tags |
|---|---|---|---|---|---|---|
| 1 | OpenAI | GPT-5.2 (xhigh) | - | 1474 (-46 / +58) | Dec. 2025 | Flagship, Accuracy, Low error, Domain-specific |
| 2 | Anthropic | Claude Opus 4.5 (Reasoning) | - | 1410 (-45 / +45) | Nov. 2025 | Reasoning, Aesthetics |
| 3 | OpenAI | GPT-5 (high) | - | 1303 (-44 / +46) | Aug. 2025 | Accuracy, Low error, Text-only, Domain-specific |
| 4 | Anthropic | Claude 4.5 Sonnet (Reasoning) | - | 1290 (-44 / +43) | Sep. 2025 | Reasoning |
| 5 | OpenAI | GPT-5.1 (high) | - | 1241 (-43 / +45) | Nov. 2025 | High tier |
| 6 | DeepSeek | DeepSeek V3.2 (Reasoning) | - | 1208 (-43 / +47) | Dec. 2025 | Reasoning |
| 7 | Google | Gemini 3 Pro Preview (high) | - | 1206 (-43 / +43) | Nov. 2025 | Multimodal |
Note: Score is temporarily shown as “–” until more multi-source data is aggregated. If a model has multiple variants, only its best-performing variant is listed. ELO values are shown with their confidence intervals.
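The page does not document how the ELO values and their confidence intervals are produced. Below is a minimal, illustrative sketch assuming a standard Elo update over pairwise head-to-head judgments and a percentile bootstrap for the asymmetric intervals; the function names (`fit_elo`, `elo_with_ci`) and the toy `battles` data are hypothetical and not part of any published methodology.

```python
import random
from collections import defaultdict

def fit_elo(battles, rounds=30, k=4.0, base=1000.0):
    """Fit Elo-style ratings from (winner, loser) pairs by sweeping
    the battle list several times with the standard Elo update rule."""
    ratings = defaultdict(lambda: base)
    for _ in range(rounds):
        for winner, loser in battles:
            # Expected score of the winner given the current rating gap.
            expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
            ratings[winner] += k * (1.0 - expected)
            ratings[loser] -= k * (1.0 - expected)
    return dict(ratings)

def elo_with_ci(battles, n_boot=200, alpha=0.05, seed=0):
    """Return per-model (rating, -lower_offset, +upper_offset) tuples,
    where the offsets come from a percentile bootstrap over battles."""
    rng = random.Random(seed)
    point = fit_elo(battles)
    samples = defaultdict(list)
    for _ in range(n_boot):
        resampled = [rng.choice(battles) for _ in battles]
        for model, r in fit_elo(resampled).items():
            samples[model].append(r)
    out = {}
    for model, r in point.items():
        s = sorted(samples[model])
        lo = s[int(alpha / 2 * len(s))]
        hi = s[int((1 - alpha / 2) * len(s)) - 1]
        out[model] = (round(r), round(r - lo), round(hi - r))
    return out

# Toy usage: each tuple is one head-to-head task judgment (winner, loser).
battles = [("Model A", "Model B")] * 12 + [("Model B", "Model A")] * 7
print(elo_with_ci(battles))  # prints per-model (rating, -CI, +CI) tuples
```

Because the bootstrap takes percentiles of the resampled ratings rather than assuming symmetry, the lower and upper offsets can differ, which matches the asymmetric entries in the table such as 1474 (-46 / +58).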
Deep Dive Into the Data Sources
This leaderboard is built on the following independent evaluations. Explore each source to see its methodology and full results.
Zoom In On A Single Model
Want to understand how a specific model performs across tasks and data sources? Explore our in‑depth model profile pages.