ENTITY vision-language model

PulseAugur coverage of vision-language model — every cluster mentioning vision-language model across labs, papers, and developer communities, ranked by signal.

Total · 30d

176

176 over 90d

Releases · 30d

0 over 90d

Papers · 30d

171

171 over 90d

TIER MIX · 90D

significant 1
research 76
tool 96
commentary 3

TOPICS

paper 171
model release 53
other 50
product 48
safety 38
infra 6

RELATIONSHIPS

instance of Vision Language Models 90%
instance of VSI-Bench 90%
instance of MLLMs 90%
used by autonomous driving 80%
instance of foundation model 70%
instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
used by VSI-Bench 70%
used by foundation model 60%
affiliated with autonomous driving 50%

TIMELINE

2026-05-19 research_milestone A new method is proposed to improve out-of-distribution visual document understanding in VLMs.

SENTIMENT · 30D

27 day(s) with sentiment data

RECENT · PAGE 3/9 · 176 TOTAL

TOOL · CL_62988 · Jun 1 · 04:00

New VLM-driven defense framework PRISM targets backdoor attacks

Researchers have introduced PRISM, a novel framework for defending against backdoor attacks on deep neural networks. This approach shifts from internal model diagnosis to external semantic auditing, utilizing Universal …

RESEARCH · CL_62923 · Jun 1 · 04:00

New research explores advanced compression techniques for AI models

Researchers are exploring novel methods for compressing large models and datasets to improve efficiency. Papers discuss unifying dataset pruning and distillation, bootstrapped tokenization for image generation, and acti…

TOOL · CL_62883 · Jun 1 · 04:00

New benchmark reveals VLM struggles with counterfactual video reasoning

Researchers have introduced CounterVQA, a new benchmark designed to evaluate the counterfactual reasoning capabilities of Vision Language Models (VLMs). Current state-of-the-art models show a significant performance gap…

TOOL · CL_62882 · Jun 1 · 04:00

Synthetic data boosts VLM performance, researchers find

Researchers have developed a novel approach to fine-tuning Vision Language Models (VLMs) by utilizing a fully controlled synthetic data generation pipeline. This method aims to overcome biases and imbalances inherent in…

RESEARCH · CL_62832 · Jun 1 · 04:00

VLMs improved for world modeling via inverse dynamics prediction

Researchers are exploring methods to improve the predictive capabilities of vision-language models (VLMs) for world modeling. A key challenge is that VLMs struggle with forward dynamics prediction (generating future sta…

TOOL · CL_62765 · Jun 1 · 04:00

New method improves vision-language models' cross-modal similarity understanding

Researchers have developed a new method called the Variational Adapter for Cross-modal Similarity Representation (VACSR) to improve how vision-language models understand the relationship between images and text. Current…

TOOL · CL_62753 · Jun 1 · 04:00

Robotic framework GSAM enhances articulated object manipulation

Researchers have developed GSAM, a new robotic framework designed to improve the manipulation of articulated objects. This system uses a vision-based perceiver and a fine-tuned VLM with chain-of-thought reasoning to ref…

RESEARCH · CL_65074 · Jun 1 · 00:00

Vision-language models reconstruct 3D scenes as editable Blender programs

Researchers have developed a new framework called Staged Executable Inverse Graphics (SEIG) that uses vision-language models to reconstruct 3D scenes from single images. This method generates editable Blender programs, …

RESEARCH · CL_65072 · May 31 · 00:00

New benchmark tests AI's ability to code 3D models

Researchers have introduced 3DCodeBench, a new benchmark designed to evaluate vision-language models (VLMs) in their ability to generate procedural 3D models through code. The benchmark includes a dataset of multimodal …

RESEARCH · CL_64770 · May 31 · 00:00

New benchmark tests VLM understanding of Japanese charts and tables

Researchers have developed HakushoBench, a new benchmark for evaluating vision-language models (VLMs) on their ability to understand Japanese charts and tables. The dataset is derived from 33 Japanese governmental white…

RESEARCH · CL_64772 · May 30 · 00:00

New benchmark tests VLM robustness to physical visual stress

Researchers have introduced RoboStressBench, a new benchmark designed to evaluate the robustness of vision-language models (VLMs) in embodied AI systems. This benchmark decomposes visual stress into four key physical di…

RESEARCH · CL_62226 · May 29 · 17:20

VLMs show hidden gender bias, suppressing female representations

A new research paper reveals that vision-language models (VLMs) exhibit a hidden bias against female representations, even when aligned to avoid demographic stereotypes. When presented with ambiguous visual inputs, thes…

RESEARCH · CL_62277 · May 29 · 14:28

New benchmark finds VLMs unreliable for visually impaired assistance

Researchers have developed VIABLE, a new benchmark designed to evaluate the reliability of Visual Language Models (VLMs) when used as judges for Visually Impaired Assistance (VIA) tasks. Their study, which tested seven …

RESEARCH · CL_62240 · May 29 · 14:27

New FBHM benchmark reveals VLM weaknesses in hateful meme detection

Researchers have developed a new benchmark called FBHM to better evaluate the capabilities of vision-language models (VLMs) in detecting hateful memes. Existing benchmarks often confuse rhetorical strategies with target…

RESEARCH · CL_62291 · May 29 · 09:38

New VISTA framework enhances long-video event prediction

Researchers have developed VISTA, a new framework designed to improve event prediction in long videos. Unlike previous models that struggle with complex narratives and detailed analysis, VISTA extracts specific visual d…

RESEARCH · CL_62767 · May 29 · 09:18

New research probes VLM susceptibility to visual persuasion and influence

Researchers are developing new frameworks to evaluate the susceptibility of Vision-Language Models (VLMs) to multimodal persuasion and visual influences. One study introduces MMPersuade to test agent-to-agent persuasion…

RESEARCH · CL_62298 · May 29 · 08:21

AI models tackle template collapse and improve CT scan report generation

Researchers have developed two new AI models aimed at improving the accuracy and efficiency of generating reports from 3D CT scans. One model, CLarGen, addresses the issue of "Template Collapse" where AI models produce …

RESEARCH · CL_65075 · May 29 · 00:00

StressDream steers video world models toward high-impact outcomes

Researchers have developed StressDream, a novel method to improve the evaluation and enhancement of policies within video world models. This technique steers the imaginations of these models towards high-impact, plausib…

COMMENTARY · CL_56601 · May 28 · 07:34

Hugging Face exec details open-source multimodal AI at PyCon Italia

At PyCon Italia 2026, Merve Noyan of Hugging Face delivered the opening keynote discussing the advancements in open-source multimodal AI. The presentation covered a range of topics including vision-language models, AI a…

RESEARCH · CL_58644 · May 28 · 03:11

New frameworks advance multimodal retrieval for documents and images

Researchers have introduced several new frameworks and benchmarks for multimodal retrieval tasks. Dynamic Adapter Routing (DAR) addresses continual multimodal retrieval by using prototype-based routing for adapter selec…
