Researchers have introduced CoREB, a new benchmark designed to evaluate code search systems beyond simple retrieval. The benchmark addresses limitations of existing datasets, such as data contamination and noisy labels, by using counterfactually rewritten problems across five programming languages. Experiments on CoREB show that while code-specialized embeddings excel at code-to-code retrieval, short keyword queries significantly degrade performance for all models. The study also finds that off-the-shelf rerankers are highly task-specific, and introduces a fine-tuned reranker that improves results consistently across all evaluated tasks.
Summary written by gemini-2.5-flash-lite from 1 source.
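The summary describes the standard two-stage retrieve-then-rerank pattern that benchmarks like CoREB evaluate: a fast embedding-based retrieval pass over the whole corpus, followed by a slower, more precise reranking pass over the shortlist. The sketch below is a minimal, self-contained illustration of that pattern only; the `embed` and reranker scoring functions are toy stand-ins (hashed bag-of-tokens and token overlap), not the paper's models, and all names here are hypothetical.

```python
# Toy retrieve-then-rerank sketch for code search (illustrative only).
# A real system would replace embed() with a learned code embedding model
# and score() with a trained cross-encoder reranker.
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash each whitespace token into a fixed-size vector."""
    vec = np.zeros(dim)
    for tok in text.split():
        vec[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, corpus: list[str], k: int = 5) -> list[int]:
    """Stage 1: rank the whole corpus by cosine similarity to the query."""
    q = embed(query)
    sims = [float(q @ embed(doc)) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: sims[i], reverse=True)[:k]

def rerank(query: str, corpus: list[str], candidates: list[int]) -> list[int]:
    """Stage 2: rescore only the shortlist with a (toy) pairwise score."""
    def score(doc: str) -> float:
        q_toks, d_toks = set(query.split()), set(doc.split())
        return len(q_toks & d_toks) / max(len(q_toks), 1)
    return sorted(candidates, key=lambda i: score(corpus[i]), reverse=True)

corpus = [
    "def binary_search(arr, target): ...",
    "def quicksort(arr): ...",
    "def parse_json(text): ...",
]
query = "binary search over sorted array"
hits = rerank(query, corpus, retrieve(query, corpus, k=2))
print([corpus[i] for i in hits])
```

The split matters for the paper's findings: the embedding stage determines how short keyword queries behave, while the reranking stage is where task-specific fine-tuning pays off.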
IMPACT Introduces a new benchmark and model to improve code search capabilities, potentially impacting developer productivity.
RANK_REASON This is a research paper introducing a new benchmark and model for code search.