Dataset
Access the open-sourced gold subset of 220 tasks on Hugging Face, including prompts and reference files
The full GDPval benchmark includes 1,320 specialized tasks across 44 occupations, drawn from the 9 industries that contribute most to U.S. GDP. We're releasing a gold subset of 220 tasks (5 per occupation) for public use, along with an automated grading service to facilitate research.
Each task in the dataset includes a realistic prompt, reference files, and context, all reflecting real work products from experienced professionals. The automated grader is an experimental research service that helps researchers quickly evaluate model outputs.
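To make the shape of the gold subset concrete, here is a minimal sketch of how a researcher might check that a set of tasks is balanced at 5 per occupation across 44 occupations. The `Task` record and its field names (`occupation`, `prompt`, `reference_files`) are hypothetical and chosen for illustration; they are not the published dataset schema, and the data below is synthetic placeholder content rather than real GDPval tasks.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    # Hypothetical record: field names are assumptions, not the real schema.
    occupation: str
    prompt: str
    reference_files: List[str] = field(default_factory=list)


def tasks_per_occupation(tasks):
    """Count how many tasks each occupation contributes."""
    return Counter(task.occupation for task in tasks)


def is_balanced(tasks, per_occupation):
    """True if every occupation present has exactly `per_occupation` tasks."""
    counts = tasks_per_occupation(tasks)
    return bool(counts) and all(n == per_occupation for n in counts.values())


# Synthetic placeholder data mirroring the stated shape of the gold subset:
# 44 occupations with 5 tasks each.
gold_subset = [
    Task(occupation=f"occupation_{i}", prompt=f"placeholder prompt {j}")
    for i in range(44)
    for j in range(5)
]

print(len(gold_subset))                       # total number of tasks
print(len(tasks_per_occupation(gold_subset)))  # number of distinct occupations
print(is_balanced(gold_subset, 5))            # whether each occupation has 5
```

The same check could be run on the real subset after loading it from Hugging Face, once the actual column names are substituted for the hypothetical ones used here.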