PulseAugur
ENTITY vision-language model

PulseAugur coverage of vision-language model — every cluster mentioning vision-language model across labs, papers, and developer communities, ranked by signal.

Total · 30d: 44 (44 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 43 (43 over 90d)
SENTIMENT · 30D: 5 days with sentiment data

RECENT · PAGE 2/4 · 64 TOTAL
  1. RESEARCH · CL_20275 ·

    PhysForge generates physics-grounded 3D assets for virtual worlds and embodied AI

    Researchers have introduced PhysForge, a novel framework designed to generate physics-grounded 3D assets for interactive virtual worlds and embodied AI. This system addresses the limitations of existing methods by focus…

  2. RESEARCH · CL_20307 ·

    New AI models InterMesh and Anny-Fit advance 3D human pose and shape recovery

    Researchers have developed InterMesh, a new framework for multi-person human mesh recovery that explicitly incorporates human-environment interaction information. This approach enhances pose and shape estimation by enri…

  3. RESEARCH · CL_18576 ·

    Researchers unveil new stealthy backdoor attacks on AI models using diffusion and style features

    Researchers have developed new methods for backdoor attacks on advanced AI models, specifically targeting Vision-Language Models (VLMs) and Diffusion Models (DMs). One approach, CBV, uses diffusion models to create natu…

  4. TOOL · CL_18874 ·

    VLM pipeline enables viewpoint-agnostic grasping for robots with partial observations

    Researchers have developed a new end-to-end pipeline for language-guided grasping that enhances the robustness of mobile manipulators in cluttered environments. This system uses vision-language models (VLMs) and partial…

  5. RESEARCH · CL_18299 ·

    New GLANCE framework enhances VLM agents with curiosity-driven visual-linguistic exploration

    Researchers have developed a new framework called GLANCE to enhance the exploration capabilities of vision-language model (VLM) agents. This framework aims to improve how these agents navigate complex and partially ob…

  6. TOOL · CL_15622 ·

    VISTA benchmark launched for advanced VLM spatio-temporal interaction analysis

    Researchers have introduced VISTA, a new benchmark designed to evaluate the spatio-temporal understanding capabilities of Vision-Language Models (VLMs). Unlike existing benchmarks that focus on simple actions and limite…

  7. TOOL · CL_15611 ·

    Chain of Evidence framework enables pixel-level visual attribution for retrieval-augmented generation

    Researchers have developed a new framework called Chain of Evidence (CoE) to improve iterative retrieval-augmented generation (iRAG) systems. CoE utilizes Vision-Language Models to directly analyze screenshots of retrie…

  8. TOOL · CL_15616 ·

    Researchers propose Gromov-Wasserstein distance for VLM vision encoder selection

    Researchers have developed a new method for selecting optimal vision encoders for Vision-Language Models (VLMs). Traditional approaches, like choosing encoders with high accuracy or large size, were found to be ineffect…
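
    The Gromov-Wasserstein distance compares two embedding spaces by their internal distance structure rather than their raw coordinates, which is why it can rank vision encoders without aligning their feature dimensions. The sketch below is not the paper's method: it is a crude plain-Python proxy that evaluates the GW objective under a fixed identity coupling (sample i in one space matched to sample i in the other), whereas the true distance optimizes over all couplings.

```python
import math

def pairwise_dists(X):
    # Euclidean distance matrix within one embedding space.
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

def normalize(D):
    # Scale-normalize so encoders with different feature magnitudes compare fairly.
    m = max(max(row) for row in D) or 1.0
    return [[v / m for v in row] for row in D]

def structural_distortion(X, Y):
    # Proxy for the Gromov-Wasserstein objective under the identity coupling:
    # mean squared gap between the two normalized intra-space distance matrices.
    # The real GW distance minimizes this over all soft matchings of samples.
    D1, D2 = normalize(pairwise_dists(X)), normalize(pairwise_dists(Y))
    n = len(D1)
    return sum((D1[i][j] - D2[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

# Two encoders embedding the same 3 images: Y is X uniformly scaled, so the
# relational structure is identical and the distortion is ~0.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
Y = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0)]
print(structural_distortion(X, Y))
```

    An encoder that scrambles the relational geometry of the samples would score a strictly positive distortion, which is the signal the selection method exploits.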

  9. TOOL · CL_15782 ·

    New benchmark reveals video models forget long-term context

    Researchers have introduced SceneBench, a new benchmark designed to evaluate video understanding models' ability to retain context over long videos, particularly across different scenes. Their findings indicate that cur…

  10. RESEARCH · CL_16299 ·

    Coral and CoRAL systems optimize LLM serving and robotic control

    Researchers have developed two distinct systems named Coral and CoRAL. Coral is an adaptive system designed for cost-efficient serving of multiple large language models across heterogeneous cloud GPUs, aiming to optimiz…

  11. RESEARCH · CL_16304 ·

    Robots gain semantic understanding with VLM and adaptive memory

    Researchers have developed a "Semantic Autonomy Stack" to enable indoor mobile robots to understand natural language instructions, overcoming the latency and memory limitations of current Vision-Language Models (VLMs). …

  12. RESEARCH · CL_14362 ·

    GeoThinker framework actively integrates geometry for advanced spatial reasoning

    Researchers have developed GeoThinker, a novel framework that enhances spatial reasoning in multimodal large language models (MLLMs) by actively integrating geometric information. Unlike previous passive fusion methods,…

  13. RESEARCH · CL_13548 ·

    AI advancements span XQuery conversion, OCR pipelines, and China's benchmark challenges

    A new open-source pipeline called SGOCR 2026 has been released, designed to generate spatially-grounded OCR datasets for training vision-language models. This pipeline aims to separate text localization from semantic re…

  14. RESEARCH · CL_11793 ·

    OmniDrive-R1 enhances autonomous driving VLMs with reinforcement-driven visual grounding

    Researchers have introduced OmniDrive-R1, a novel framework for autonomous driving that integrates perception and reasoning using an interleaved Multi-modal Chain-of-Thought (iMCoT) mechanism. This approach addresses ob…

  15. RESEARCH · CL_11851 ·

    New framework uses VLM distillation for stable continual model adaptation

    Researchers have introduced Test-Time Distillation (TTD), a novel approach to address performance degradation in deep neural networks due to distribution shifts during deployment. Existing methods often suffer from pred…
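
    Distillation-based test-time adaptation of the kind described generally treats a frozen VLM's prediction on each incoming unlabeled sample as a soft target for the deployed model. A minimal sketch of the standard temperature-softened distillation loss (an assumption about the general technique, not TTD's actual objective):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher temperature yields softer targets.
    z = [l / temperature for l in logits]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions -- the classic
    # distillation objective. In a test-time setting, "teacher" would be a
    # frozen VLM's zero-shot prediction for the incoming sample, and this
    # loss would drive a small update to the deployed student model.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 0.5, -1.0]   # hypothetical VLM logits for one sample
student = [0.0, 0.0, 0.0]    # uninformative student prediction
print(distillation_loss(student, teacher))
```

    The loss is zero when the student already matches the teacher, so a drifting student is pulled back toward the frozen VLM's predictions, which is one way such methods avoid the unstable self-reinforcing updates of pseudo-label-only adaptation.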

  16. RESEARCH · CL_11825 ·

    Vision-language models mistake head orientation for gaze direction

    Researchers have discovered that Vision-Language Models (VLMs) struggle to accurately infer human gaze direction, often mistaking head orientation for eye movement. In a study involving 1,360 real-world images, VLMs sho…

  17. RESEARCH · CL_11758 ·

    OpAgent achieves 71.6% success rate in web navigation tasks

    Researchers have developed OpAgent, a novel web navigation agent that utilizes online reinforcement learning to overcome the limitations of static datasets. The agent employs a hierarchical multi-task fine-tuning approa…

  18. RESEARCH · CL_22533 ·

    AI drafts boost audio description quality, but a quality threshold is key

    Researchers have developed methods to improve the quality and scalability of audio description (AD) generation and evaluation. One study introduces GenAD and RefineAD, a pipeline and interface that uses AI-generated dra…

  19. RESEARCH · CL_10151 ·

    ChartVerse framework synthesizes complex charts and reasoning data for VLMs

    Researchers have introduced ChartVerse, a new framework designed to generate complex charts and reliable question-answering data for Vision Language Models (VLMs). This system addresses limitations in existing datasets …

  20. RESEARCH · CL_10251 ·

    MARVIS system uses VLM reasoning over visualizations for predictive tasks

    Researchers have developed MARVIS, a novel system that enhances the reasoning capabilities of large language and vision-language models (VLMs) by converting their latent embeddings into visual representations. This appr…
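
    The general idea of reasoning over visualizations rather than raw embeddings can be illustrated with a toy rasterizer: project embeddings to 2D (e.g. via PCA, omitted here) and render the points as an image-like grid a VLM could inspect. `render_scatter` is a hypothetical sketch of that rendering step, not MARVIS code:

```python
def render_scatter(points, width=24, height=12):
    # Rasterize 2D points (e.g. PCA-projected embeddings) into an ASCII
    # grid -- a stand-in for the plot images a VLM would actually be shown.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    sx = (max(xs) - x0) or 1.0
    sy = (max(ys) - y0) or 1.0
    grid = [['.'] * width for _ in range(height)]
    for x, y in points:
        c = int((x - x0) / sx * (width - 1))
        r = int((y - y0) / sy * (height - 1))
        grid[height - 1 - r][c] = '*'  # flip rows so larger y sits higher
    return '\n'.join(''.join(row) for row in grid)

# Two well-separated clusters become visually obvious in the rendering,
# which a VLM can then describe or classify.
cluster = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (2.0, 2.1), (2.1, 2.0)]
print(render_scatter(cluster))
```

    Rendering embeddings as pictures trades numeric precision for the spatial pattern-recognition a vision encoder is already good at, which is the design bet behind this family of systems.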