New benchmark reveals VLMs struggle with high-res Earth observation details

Researchers have introduced UHR-Micro, a new benchmark designed to evaluate Vision-Language Models (VLMs) on their ability to perceive small, critical details within ultra-high-resolution Earth observation imagery. Current VLMs often suffer from a "resolution illusion," where high input resolution does not translate into reliable perception of micro-scale targets. The benchmark, comprising over 11,000 instructions and 1,200 images, reveals significant failures in spatial grounding and evidence parsing by existing models. To address this, the team developed the Micro-evidence Active Perception (MAP) agent, which improves perception by focusing reasoning on localized observations rather than the entire high-resolution image.
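
The MAP agent's internals are not detailed in this summary, but the general idea of reasoning over localized observations can be sketched as a crop-and-query loop over a very large image. The sketch below is illustrative only: the `vlm_answer` stub, the `propose_tiles` helper, the tile size, and the yes/no relevance filter are assumptions for the example, not the authors' method.

```python
from PIL import Image

def vlm_answer(image: Image.Image, prompt: str) -> str:
    """Placeholder for a VLM call (API or local model); hypothetical, not from the paper."""
    raise NotImplementedError

def propose_tiles(image: Image.Image, tile: int = 1024, stride: int = 1024):
    """Enumerate fixed-size tiles covering the ultra-high-resolution image."""
    width, height = image.size
    for top in range(0, height, stride):
        for left in range(0, width, stride):
            yield (left, top, min(left + tile, width), min(top + tile, height))

def localized_query(image_path: str, question: str) -> str:
    """Answer a micro-detail question from localized crops instead of the full scene."""
    image = Image.open(image_path)
    # Step 1: filter tiles by asking whether each crop contains relevant evidence.
    relevant = []
    for box in propose_tiles(image):
        crop = image.crop(box)
        verdict = vlm_answer(crop, f"Does this crop contain evidence relevant to: {question}? Answer yes or no.")
        if verdict.strip().lower().startswith("yes"):
            relevant.append(box)
    # Step 2: answer using only localized evidence (here, naively, the first relevant crop).
    crops = [image.crop(b) for b in relevant] or [image]
    return vlm_answer(crops[0], question)
```

The point of the sketch is the two-stage structure: cheap relevance checks on local crops first, then focused reasoning on the selected evidence, so the model never has to resolve micro-scale targets against the full scene at once.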

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Highlights the limits of current VLMs in perceiving critical micro-scale details within high-resolution imagery, motivating research into more evidence-centered reasoning agents.

RANK_REASON The cluster describes a new academic paper introducing a benchmark for evaluating AI models on micro-detail perception, along with a proposed agent to mitigate the identified failures. [lever_c_demoted from research: ic=1 ai=1.0]

Read on arXiv cs.CV →

COVERAGE [1]

  1. arXiv cs.CV TIER_1 · Bo Du

    UHR-Micro: Diagnosing and Mitigating the Resolution Illusion in Earth Observation VLMs

    Vision-Language Models (VLMs) increasingly operate on ultra-high-resolution (UHR) Earth observation imagery, yet they remain vulnerable to a severe scale mismatch between large-scale scene context and micro-scale targets. We refer to this empirical gap as a "resolution illusion":…