
GDPval Leaderboard - Model Performance

This leaderboard builds on OpenAI's official GDPval framework and consolidates results from multiple independent third-party evaluations into a single overall ranking, helping you identify AI models capable of expert-grade results on professional tasks.

Models tracked: 20+
Integrated evaluation systems: 3+
Expert picks (top overall performers): GPT-5.2 and Claude Opus 4.5
Last updated: December 15, 2025

Overall Performance Leaderboard

| Rank | Company | Model | Score | ELO (CI) | Release Date | Key Tags |
|------|-----------|--------------------------------|-------|------------------|--------------|----------------------------------------------|
| 1 | OpenAI | GPT-5.2 (xhigh) | – | 1474 (-46 / +58) | Dec. 2025 | Flagship, Accuracy, Low error, Domain-specific |
| 2 | Anthropic | Claude Opus 4.5 (Reasoning) | – | 1410 (-45 / +45) | Nov. 2025 | Reasoning, Aesthetics |
| 3 | OpenAI | GPT-5 (high) | – | 1303 (-44 / +46) | Aug. 2025 | Accuracy, Low error, Text-only, Domain-specific |
| 4 | Anthropic | Claude 4.5 Sonnet (Reasoning) | – | 1290 (-44 / +43) | Sep. 2025 | Reasoning |
| 5 | OpenAI | GPT-5.1 (high) | – | 1241 (-43 / +45) | Nov. 2025 | High tier |
| 6 | DeepSeek | DeepSeek V3.2 (Reasoning) | – | 1208 (-43 / +47) | Dec. 2025 | Reasoning |
| 7 | Google | Gemini 3 Pro Preview (high) | – | 1206 (-43 / +43) | Nov. 2025 | Multimodal |

Note: Scores are temporarily shown as "–" until more multi-source data has been aggregated. When a model has multiple variants, only the best-performing variant is listed. ELO values are reported with their confidence intervals, shown as -/+ offsets around the rating.
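For intuition, an ELO gap can be read as an expected head-to-head preference rate. Below is a minimal Python sketch, assuming the standard logistic Elo formula with a 400-point scale; the leaderboard does not state which Elo variant it uses, so treat this as an approximation.

```python
# Minimal sketch: interpreting the ELO ratings in the table above.
# Assumes the standard logistic Elo formula with a 400-point scale;
# the leaderboard does not specify its exact Elo variant.

def expected_win_prob(rating_a: float, rating_b: float) -> float:
    """Expected score (head-to-head win probability) of model A vs. model B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Ratings taken from the table above.
gpt_5_2 = 1474    # OpenAI GPT-5.2 (xhigh)
opus_4_5 = 1410   # Anthropic Claude Opus 4.5 (Reasoning)

p = expected_win_prob(gpt_5_2, opus_4_5)
print(f"GPT-5.2 expected win rate vs Claude Opus 4.5: {p:.1%}")  # ~59.1%
```

Under that assumption, the 64-point gap between the top two models corresponds to roughly a 59% expected win rate, and their confidence intervals overlap, so the ordering at the top should be read as suggestive rather than decisive.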

Deep Dive Into the Data Sources

This leaderboard is built on the following independent evaluations. Explore each source to see its methodology and full results.
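The exact consolidation method is not published here. As a purely illustrative sketch, the snippet below averages hypothetical per-source Elo ratings for a single model; the source names and the unweighted-mean rule are assumptions, not the leaderboard's actual method.

```python
# Purely illustrative: one simple way to consolidate per-source ratings
# into an overall score. Source names and the averaging rule are
# hypothetical; the leaderboard does not disclose its actual method.
from statistics import mean

# Hypothetical per-source Elo ratings for one model.
per_source_elo = {
    "source_a": 1480,
    "source_b": 1465,
    "source_c": 1477,
}

# Naive consolidation: unweighted mean over the sources that rated the model.
overall = mean(per_source_elo.values())
print(f"Consolidated Elo: {overall:.0f}")
```

A production ranking would more likely weight sources by sample size or task coverage; the unweighted mean is shown only to make the idea of multi-source consolidation concrete.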

Zoom In On A Single Model

Want to understand how a specific model performs across tasks and data sources? Explore our in‑depth model profile pages.