MLLMs

PulseAugur coverage of MLLMs (multimodal large language models): every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30d: 96 (96 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 96 (96 over 90d)
TIMELINE
  1. 2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
  2. 2026-05-22 research_milestone Researchers identify a loss of temporal grounding in multimodal large language models and propose a method to recover it.
  3. 2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
  4. 2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.
SENTIMENT · 30D

18 days with sentiment data

RECENT · PAGE 2/5 · 96 TOTAL
  1. TOOL · CL_51560 ·

    New EgoProx benchmark tests MLLMs on 3D spatial reasoning

    Researchers have introduced EgoProx, a new benchmark designed to evaluate how well multimodal large language models (MLLMs) can understand and reason about 3D proximity from an egocentric perspective. The benchmark orga…

  2. TOOL · CL_51213 ·

    New benchmark tests AI agents' active spatial reasoning

    Researchers have introduced ESI-BENCH, a new benchmark designed to evaluate embodied spatial intelligence in AI agents. This benchmark focuses on the perception-action loop, where agents actively explore their environme…

  3. TOOL · CL_51102 ·

    New GVG framework uses AI to generate images from EEG data

    Researchers have developed a new framework called Generative Visual Grounding (GVG) to improve the understanding of electroencephalogram (EEG) data using multimodal large language models (MLLMs). GVG addresses the scarc…

  4. TOOL · CL_49280 ·

    New framework AKT-Rec improves e-commerce recommendations using LLM-generated IDs

    Researchers have developed a new framework called AKT-Rec to address challenges in long-tail recommendation systems, particularly those in e-commerce platforms with significant data imbalance. This framework utilizes mu…

  5. TOOL · CL_45094 ·

    SkeletonLLM enables LLMs to process human skeleton data

    Researchers have developed SkeletonLLM, a novel approach to enable multimodal large language models (MLLMs) to understand structured, non-visual data like human skeletons. The system uses DrAction, a differentiable rend…

  6. TOOL · CL_45081 ·

    New benchmark reveals perception, spatiotemporal modeling as MLLM weaknesses

    Researchers have introduced BEAR, a new benchmark designed to evaluate and diagnose the skill-level capabilities of embodied multimodal large language models (MLLMs). This benchmark decomposes embodied tasks into 14 dis…

  7. TOOL · CL_45070 ·

    New ST-SimDiff framework boosts MLLM video processing efficiency

    Researchers have developed ST-SimDiff, a novel framework designed to make multimodal large language models (MLLMs) more efficient at processing long videos. The method addresses the computational burden by focusing on b…

  8. RESEARCH · CL_45045 ·

    New methods and benchmarks boost MLLM visual grounding

    Researchers have developed new methods to improve visual grounding in multimodal large language models (MLLMs). One approach, PGT, uses procedurally generated tasks with geometric primitives to provide denser supervisio…

  9. TOOL · CL_45035 ·

    MLLMs struggle with video timing; new method recovers temporal grounding

    Researchers have identified a temporal grounding issue in multimodal large language models (MLLMs) where the models understand event timing during an initial phase but lose this signal during answer generation. They dis…

  10. TOOL · CL_44979 ·

    New MapTab benchmark tests multimodal LLMs on complex route planning

    Researchers have introduced MapTab, a new benchmark designed to evaluate the multi-criteria reasoning abilities of multimodal large language models (MLLMs). This benchmark utilizes route planning tasks that combine visu…

  11. TOOL · CL_44952 ·

    New pipeline enhances MLLMs for safety-critical driving analysis

    Researchers have developed a new pipeline to improve the ability of multimodal large language models (MLLMs) to analyze safety-critical driving events. This pipeline fuses downsampled video frames with telematics data a…

  12. RESEARCH · CL_43971 ·

    AI-generated Chinese poetry detected using image-semantic method

    Researchers have developed a novel method for detecting AI-generated modern Chinese poetry by integrating image semantics with text analysis. This approach uses images related to the poem's content to provide complement…

  13. TOOL · CL_43934 ·

    New benchmark evaluates human and LLM text-to-image prompting skills

    Researchers have introduced AtelierEval, a novel benchmark designed to evaluate the proficiency of both humans and multimodal large language models (MLLMs) in generating effective text-to-image prompts. This benchmark, …

  14. RESEARCH · CL_45069 ·

    MLLMs show prejudice gap in personality assessments, new benchmark reveals

    Researchers have introduced a new benchmark and dataset called MM-OCEAN to evaluate how well multimodal large language models (MLLMs) can reason about personality. The study found that a significant portion of MLLMs, ov…

  15. RESEARCH · CL_44007 ·

    LatentOmni framework unifies audio-visual reasoning for omnimodal understanding

    Researchers have introduced LatentOmni, a novel framework designed to enhance omnimodal understanding by unifying audio-visual reasoning within a latent space. This approach aims to overcome limitations in current multi…

  16. TOOL · CL_41890 ·

    TextSculptor framework advances scene text editing with new dataset and benchmark

    Researchers have introduced TextSculptor, a new framework designed to improve scene text editing in images. This framework includes an automated data construction pipeline that generates a large dataset of 3.2 million s…

  17. RESEARCH · CL_41749 ·

    New methods tackle AI hallucinations in research and medical Q&A

    Two new research papers address the critical issue of AI hallucinations in different domains. One paper introduces ACL-Verbatim, an extractive question-answering system designed to provide hallucination-free answers fro…

  18. RESEARCH · CL_44092 ·

    New methods boost video diffusion model efficiency and quality

    Researchers are developing new methods to improve the efficiency and quality of video diffusion models. Several papers introduce techniques to optimize attention mechanisms, such as sparse attention (LVSA, Veda) and lin…

  19. TOOL · CL_46843 ·

    New benchmark EgoCoT-Bench tests MLLM reasoning in egocentric video

    Researchers have introduced EgoCoT-Bench, a new benchmark designed to evaluate the reasoning capabilities of Multimodal Large Language Models (MLLMs) when processing egocentric video data. This benchmark specifically fo…

  20. RESEARCH · CL_38223 ·

    New ESI-Bench benchmark tests AI agents' active spatial reasoning

    Researchers have introduced ESI-Bench, a new benchmark designed to evaluate embodied spatial intelligence in AI agents. This benchmark focuses on the perception-action loop, where agents actively explore their environme…