PulseAugur

New methods BAMI and AutoFocus improve GUI grounding for AI agents

Researchers have introduced two training-free methods, BAMI and AutoFocus, that improve the accuracy of GUI grounding for AI agents. BAMI targets precision and ambiguity biases with coarse-to-fine focus and candidate selection, raising the TianXi-Action-7B model's score on the ScreenSpot-Pro benchmark from 51.9% to 57.8%, a gain of 5.9 percentage points. AutoFocus targets the resolution gap in high-resolution interfaces with uncertainty-aware active visual search: it uses token-level perplexity to model spatial uncertainty, and it improves grounding across several VLMs on benchmarks such as ScreenSpot-Pro and ScreenSpot-V2.

Summary written by gemini-2.5-flash-lite from 5 sources.
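
To make two of the terms in the summary concrete, here is a minimal sketch and not either paper's implementation. It assumes the model exposes natural-log probabilities for the tokens of a predicted coordinate, and that a coarse-to-fine step predicts a normalized point inside a cropped region. The function names `token_perplexity` and `crop_to_screen` are hypothetical. The summary does not say how AutoFocus turns perplexity into a search policy, so only the standard perplexity definition and a generic crop-to-screen mapping are shown.

```python
import math
from typing import Sequence, Tuple


def token_perplexity(logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp of the negative mean natural-log probability.

    A value near 1.0 means the model was confident about every token;
    larger values mean more uncertainty about the predicted coordinate.
    """
    if not logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(logprobs) / len(logprobs))


def crop_to_screen(
    x: float, y: float, crop_box: Tuple[int, int, int, int]
) -> Tuple[float, float]:
    """Map a point predicted inside a crop back to full-screenshot pixels.

    x and y are normalized to [0, 1] within the crop (an assumption of this
    sketch); crop_box is (left, top, right, bottom) in screenshot pixels.
    """
    left, top, right, bottom = crop_box
    return left + x * (right - left), top + y * (bottom - top)
```

Under these assumptions, a prediction whose coordinate tokens each had probability 0.5 has perplexity 2, and the center of the crop `(100, 200, 300, 400)` maps back to pixel `(200.0, 300.0)` on the full screenshot.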

IMPACT These methods could enhance the reliability and precision of AI agents interacting with graphical user interfaces, enabling more complex task automation.

RANK_REASON The cluster contains two arXiv papers detailing novel methods for improving AI agent performance in GUI grounding tasks.


COVERAGE [5]

  1. Hugging Face Daily Papers TIER_1 ·

    BAMI: Training-Free Bias Mitigation in GUI Grounding

    GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Predicti…

  2. arXiv cs.CV TIER_1 · Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu ·

    BAMI: Training-Free Bias Mitigation in GUI Grounding

    arXiv:2605.06664v1. Abstract: GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance…

  3. arXiv cs.CV TIER_1 · Jiwen Lu ·

    BAMI: Training-Free Bias Mitigation in GUI Grounding

    GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Predicti…

  4. arXiv cs.CV TIER_1 · Ruilin Yao, Shengwu Xiong, Tianyu Zou, Shili Xiong, Yi Rong ·

    AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

    arXiv:2605.02630v1. Abstract: Vision-Language Models (VLMs) have enabled autonomous GUI agents that translate natural language instructions into executable screen coordinates. However, grounding performance degrades in high-resolution interfaces, where dense lay…

  5. arXiv cs.CV TIER_1 · Yi Rong ·

    AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

    Vision-Language Models (VLMs) have enabled autonomous GUI agents that translate natural language instructions into executable screen coordinates. However, grounding performance degrades in high-resolution interfaces, where dense layouts and small interactive elements expose a res…