Deep Code Bench: A New Benchmark Dataset for Code Retrieval

2025-09-11

Qodo has released Deep Code Bench, a benchmark dataset of real-world questions derived from large, complex code repositories. Unlike existing benchmarks, its questions require retrieval across multiple files, mirroring the scenarios developers actually face. The questions were generated with LLMs from pull request data, and the dataset is meant to give a robust evaluation of code retrieval systems. Qodo reports that its deep research agent outperforms the other systems tested on fact recall, scoring ~76%.
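To make the fact recall figure concrete, here is a minimal sketch of how such a score could be computed. It assumes fact recall means the fraction of ground-truth facts that an answer covers; the summary does not spell out Qodo's exact scoring procedure, so `fact_recall`, `keyword_judge`, and the sample data are hypothetical illustrations, not Qodo's implementation.

```python
from typing import Callable, Iterable


def keyword_judge(answer: str, fact: str) -> bool:
    """Hypothetical stand-in judge: a fact counts as covered if it
    appears verbatim (case-insensitive) in the answer. A real evaluation
    would likely use a stronger judge; this is an assumption."""
    return fact.lower() in answer.lower()


def fact_recall(
    answer: str,
    facts: Iterable[str],
    is_supported: Callable[[str, str], bool] = keyword_judge,
) -> float:
    """Return the fraction of ground-truth facts covered by the answer."""
    fact_list = list(facts)
    if not fact_list:
        raise ValueError("at least one ground-truth fact is required")
    covered = sum(1 for fact in fact_list if is_supported(answer, fact))
    return covered / len(fact_list)


# Made-up example: a question whose answer spans several files.
answer = "The retry logic lives in client.py and is configured in settings.py."
facts = ["client.py", "settings.py", "backoff.py", "retry logic"]
score = fact_recall(answer, facts)
print(f"fact recall: {score:.2f}")
```

Passing the judge in as a function keeps the metric separate from the question of how a fact is matched, so the substring check can be swapped for something stricter without touching the scoring.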
