MLLMs

PulseAugur coverage of MLLMs (multimodal large language models): every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 96 over 90d

Releases · 0 over 90d

Papers · 96 over 90d

TOPICS

paper 96
other 35
model release 31
safety 18
product 12
infra 4

RELATIONSHIPS

instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 90%
used by Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
used by train of thought 70%
used by Standard Chinese 70%
used by Chain Of Thought 70%

TIMELINE

2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
2026-05-22 research_milestone Researchers identify a loss of temporal grounding in multimodal large language models and propose a method to recover it.
2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.

SENTIMENT · 30D

18 days with sentiment data

RECENT · PAGE 2/5 · 96 TOTAL

TOOL · CL_51560 · May 26 · 04:00

New EgoProx benchmark tests MLLMs on 3D spatial reasoning

Researchers have introduced EgoProx, a new benchmark designed to evaluate how well multimodal large language models (MLLMs) can understand and reason about 3D proximity from an egocentric perspective. The benchmark orga…
TOOL · CL_51213 · May 26 · 04:00

New benchmark tests AI agents' active spatial reasoning

Researchers have introduced ESI-BENCH, a new benchmark designed to evaluate embodied spatial intelligence in AI agents. This benchmark focuses on the perception-action loop, where agents actively explore their environme…
TOOL · CL_51102 · May 26 · 04:00

New GVG framework uses AI to generate images from EEG data

Researchers have developed a new framework called Generative Visual Grounding (GVG) to improve the understanding of electroencephalogram (EEG) data using multimodal large language models (MLLMs). GVG addresses the scarc…
TOOL · CL_49280 · May 22 · 07:29

New framework AKT-Rec improves e-commerce recommendations using LLM-generated IDs

Researchers have developed a new framework called AKT-Rec to address challenges in long-tail recommendation systems, particularly those in e-commerce platforms with significant data imbalance. This framework utilizes mu…
TOOL · CL_45094 · May 22 · 04:00

SkeletonLLM enables LLMs to process human skeleton data

Researchers have developed SkeletonLLM, a novel approach to enable multimodal large language models (MLLMs) to understand structured, non-visual data like human skeletons. The system uses DrAction, a differentiable rend…
TOOL · CL_45081 · May 22 · 04:00

New benchmark reveals perception, spatiotemporal modeling as MLLM weaknesses

Researchers have introduced BEAR, a new benchmark designed to evaluate and diagnose the skill-level capabilities of embodied multimodal large language models (MLLMs). This benchmark decomposes embodied tasks into 14 dis…
TOOL · CL_45070 · May 22 · 04:00

New ST-SimDiff framework boosts MLLM video processing efficiency

Researchers have developed ST-SimDiff, a novel framework designed to make multimodal large language models (MLLMs) more efficient at processing long videos. The method addresses the computational burden by focusing on b…
RESEARCH · CL_45045 · May 22 · 04:00

New methods and benchmarks boost MLLM visual grounding

Researchers have developed new methods to improve visual grounding in multimodal large language models (MLLMs). One approach, PGT, uses procedurally generated tasks with geometric primitives to provide denser supervisio…
TOOL · CL_45035 · May 22 · 04:00

MLLMs struggle with video timing; new method recovers temporal grounding

Researchers have identified a temporal grounding issue in multimodal large language models (MLLMs) where the models understand event timing during an initial phase but lose this signal during answer generation. They dis…
TOOL · CL_44979 · May 22 · 04:00

New MapTab benchmark tests multimodal LLMs on complex route planning

Researchers have introduced MapTab, a new benchmark designed to evaluate the multi-criteria reasoning abilities of multimodal large language models (MLLMs). This benchmark utilizes route planning tasks that combine visu…
TOOL · CL_44952 · May 22 · 04:00

New pipeline enhances LLMs for safety-critical driving analysis

Researchers have developed a new pipeline to improve the ability of multimodal large language models (MLLMs) to analyze safety-critical driving events. This pipeline fuses downsampled video frames with telematics data a…
RESEARCH · CL_43971 · May 21 · 15:57

AI-generated Chinese poetry detected using image-semantic method

Researchers have developed a novel method for detecting AI-generated modern Chinese poetry by integrating image semantics with text analysis. This approach uses images related to the poem's content to provide complement…
TOOL · CL_43934 · May 21 · 15:51

New benchmark evaluates human and LLM text-to-image prompting skills

Researchers have introduced AtelierEval, a novel benchmark designed to evaluate the proficiency of both humans and multimodal large language models (MLLMs) in generating effective text-to-image prompts. This benchmark, …
RESEARCH · CL_45069 · May 21 · 00:00

MLLMs show prejudice gap in personality assessments, new benchmark reveals

Researchers have introduced a new benchmark and dataset called MM-OCEAN to evaluate how well multimodal large language models (MLLMs) can reason about personality. The study found that a significant portion of MLLMs, ov…
RESEARCH · CL_44007 · May 21 · 00:00

LatentOmni framework unifies audio-visual reasoning for omnimodal understanding

Researchers have introduced LatentOmni, a novel framework designed to enhance omnimodal understanding by unifying audio-visual reasoning within a latent space. This approach aims to overcome limitations in current multi…
TOOL · CL_41890 · May 20 · 12:22

TextSculptor framework advances scene text editing with new dataset and benchmark

Researchers have introduced TextSculptor, a new framework designed to improve scene text editing in images. This framework includes an automated data construction pipeline that generates a large dataset of 3.2 million s…
RESEARCH · CL_41749 · May 20 · 06:14

New methods tackle AI hallucinations in research and medical Q&A

Two new research papers address the critical issue of AI hallucinations in different domains. One paper introduces ACL-Verbatim, an extractive question-answering system designed to provide hallucination-free answers fro…
RESEARCH · CL_44092 · May 20 · 00:00

New methods boost video diffusion model efficiency and quality

Researchers are developing new methods to improve the efficiency and quality of video diffusion models. Several papers introduce techniques to optimize attention mechanisms, such as sparse attention (LVSA, Veda) and lin…
TOOL · CL_46843 · May 19 · 09:02

New benchmark EgoCoT-Bench tests MLLM reasoning in egocentric video

Researchers have introduced EgoCoT-Bench, a new benchmark designed to evaluate the reasoning capabilities of Multimodal Large Language Models (MLLMs) when processing egocentric video data. This benchmark specifically fo…
RESEARCH · CL_38223 · May 18 · 17:59

New ESI-Bench benchmark tests AI agents' active spatial reasoning

Researchers have introduced ESI-Bench, a new benchmark designed to evaluate embodied spatial intelligence in AI agents. This benchmark focuses on the perception-action loop, where agents actively explore their environme…