From Web to Pixels: Bringing Agentic Search into Visual Perception
Researchers have introduced WebEye, a benchmark for visual perception in open-world scenarios, together with an agentic workflow called Pixel-Searcher. WebEye focuses on tasks where an object cannot be localized in an image until its identity has been established from external information, such as recent events or multi-hop relations. Pixel-Searcher is designed to resolve these hidden target identities through search and then bind them to the corresponding visual instances; the authors report strong performance on the WebEye benchmark.
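To make the two-stage idea concrete, here is a minimal toy sketch of "resolve the identity first, then ground it." It is not Pixel-Searcher's implementation. The dictionary `FAKE_WEB` stands in for web search, the entities ("Event E", "Team A") are hypothetical placeholders, and the helpers `resolve_target` and `ground_target` are illustrative names; grounding here is simple label matching over pre-computed detections rather than a real vision model.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One candidate visual instance: a label plus an (x0, y0, x1, y1) box."""
    label: str
    box: tuple[int, int, int, int]


# Hypothetical stand-in for web search results: (entity, relation) -> entity.
# A real agent would issue search queries here instead of a dict lookup.
FAKE_WEB = {
    ("Event E", "winner"): "Team A",
    ("Team A", "mascot"): "red fox",
}


def resolve_target(start: str, relations: list[str]) -> str:
    """Follow a chain of relations (multi-hop) to uncover the hidden target identity."""
    entity = start
    for rel in relations:
        entity = FAKE_WEB[(entity, rel)]  # one simulated "search" hop
    return entity


def ground_target(target: str, detections: list[Detection]) -> list[Detection]:
    """Bind the resolved identity to visual instances by exact label match."""
    return [d for d in detections if d.label == target]


detections = [
    Detection("red fox", (10, 20, 60, 80)),
    Detection("grey wolf", (70, 15, 120, 90)),
]
target = resolve_target("Event E", ["winner", "mascot"])
matches = ground_target(target, detections)
print(target, [d.box for d in matches])  # prints: red fox [(10, 20, 60, 80)]
```

The point of the split is that the image alone cannot answer "the mascot of the team that won Event E"; only after the search stage turns that description into a concrete identity does localization become an ordinary grounding problem.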
IMPACT: Introduces a new benchmark (WebEye) and an agentic workflow (Pixel-Searcher) for visual perception, potentially advancing research on open-world object identification and grounding.