Researchers have introduced CoREB, a new benchmark designed to evaluate code search systems beyond simple retrieval. The benchmark addresses limitations of existing datasets, such as data contamination and noisy labels, by using counterfactually rewritten problems across five programming languages. Experiments on CoREB show that while code-specialized embeddings excel at code-to-code retrieval, short keyword queries significantly degrade performance for all models. The study also finds that off-the-shelf rerankers are highly task-specific, and introduces a fine-tuned reranker that improves results consistently across all evaluated tasks.
Summary written by gemini-2.5-flash-lite from 1 source.
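The summary describes the standard two-stage retrieve-then-rerank pattern that benchmarks like CoREB evaluate: a fast embedding-based retrieval pass over the whole corpus, followed by a slower, more precise reranking pass over the shortlist. The sketch below is a minimal, self-contained illustration of that pattern only; the `embed` and reranker scoring functions are toy stand-ins (hashed bag-of-tokens and token overlap), not the paper's models, and all names here are hypothetical.

```python
# Toy retrieve-then-rerank sketch for code search (illustrative only).
# A real system would replace embed() with a learned code embedding model
# and score() with a trained cross-encoder reranker.
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash each whitespace token into a fixed-size vector."""
    vec = np.zeros(dim)
    for tok in text.split():
        vec[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, corpus: list[str], k: int = 5) -> list[int]:
    """Stage 1: rank the whole corpus by cosine similarity to the query."""
    q = embed(query)
    sims = [float(q @ embed(doc)) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: sims[i], reverse=True)[:k]

def rerank(query: str, corpus: list[str], candidates: list[int]) -> list[int]:
    """Stage 2: rescore only the shortlist with a (toy) pairwise score."""
    def score(doc: str) -> float:
        q_toks, d_toks = set(query.split()), set(doc.split())
        return len(q_toks & d_toks) / max(len(q_toks), 1)
    return sorted(candidates, key=lambda i: score(corpus[i]), reverse=True)

corpus = [
    "def binary_search(arr, target): ...",
    "def quicksort(arr): ...",
    "def parse_json(text): ...",
]
query = "binary search over sorted array"
hits = rerank(query, corpus, retrieve(query, corpus, k=2))
print([corpus[i] for i in hits])
```

The split matters for the paper's findings: the embedding stage determines how short keyword queries behave, while the reranking stage is where task-specific fine-tuning pays off.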
IMPACT Introduces a new benchmark and model to improve code search capabilities, potentially impacting developer productivity.
RANK_REASON This is a research paper introducing a new benchmark and model for code search.