Deep Code Bench: A New Benchmark Dataset for Code Retrieval
2025-09-11

Qodo has released Deep Code Bench, a benchmark dataset of real-world questions derived from large, complex code repositories. Unlike existing benchmarks, its questions require retrieval across multiple files, mirroring the situations developers actually face. The questions were generated with LLMs from pull request data, and the dataset is intended to evaluate code retrieval systems. Qodo reports that its deep research agent outperforms the other systems it compared on fact recall, scoring about 76%.
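To make the fact recall figure concrete, here is a minimal sketch of how such a score could be computed: the share of reference facts for a question that appear in a system's answer. The summary does not say how Deep Code Bench matches facts, so the function names, the sample facts, and the naive substring check below are all assumptions made for illustration; a real evaluation would more likely use an LLM or human judge in place of `substring_judge`.

```python
from typing import Callable, List


def substring_judge(fact: str, answer: str) -> bool:
    """Hypothetical stand-in judge: case-insensitive substring match."""
    return fact.lower() in answer.lower()


def fact_recall(
    answer: str,
    facts: List[str],
    judge: Callable[[str, str], bool] = substring_judge,
) -> float:
    """Return the fraction of reference facts the judge finds in the answer."""
    if not facts:
        raise ValueError("at least one reference fact is required")
    hits = sum(1 for fact in facts if judge(fact, answer))
    return hits / len(facts)


# Illustrative data only, not taken from the benchmark.
reference_facts = [
    "retry logic lives in client.py",
    "backoff is exponential",
]
answer = "The retry logic lives in client.py and uses a fixed delay."
score = fact_recall(answer, reference_facts)  # one of two facts matched
```

Averaging this per-question score over every question in the dataset would give a single headline number of the kind quoted above.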