PulseAugur

MLLMs

PulseAugur coverage of MLLMs — every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30d: 56 (56 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 56 (56 over 90d)
SENTIMENT · 30D

2 days with sentiment data

RECENT · PAGE 1/3 · 52 TOTAL
  1. TOOL · CL_27571 ·

    New benchmark EgoMemReason tests AI memory in week-long videos

    Researchers have introduced EgoMemReason, a new benchmark designed to test the memory capabilities of multimodal large language models (MLLMs) and agentic frameworks in understanding long-horizon egocentric videos. The …

  2. TOOL · CL_22498 ·

    New metric evaluates MLLMs for logical consistency without annotations

    Researchers have introduced a new metric, VL-LCM, to evaluate the logical consistency of multimodal large language models (MLLMs) without requiring ground-truth annotations. This metric assesses the cause-effect reasoni…
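    The snippet doesn't spell out VL-LCM's formulation; as a rough, invented illustration of how cause-effect consistency can be scored without ground-truth annotations, one can query a model about the same pair in both directions and measure agreement (all names here are hypothetical):

```python
def consistency_score(model, pairs):
    """Annotation-free consistency check: ask about each cause-effect
    pair in both directions; a logically consistent model should give
    agreeing answers. Scores agreement, not correctness."""
    agree = 0
    for cause, effect in pairs:
        forward = model(f"Does {cause} cause {effect}?")
        backward = model(f"Is {effect} an effect of {cause}?")
        agree += int(forward == backward)
    return agree / len(pairs)

# A toy "model" that answers True to everything is trivially self-consistent.
score = consistency_score(lambda q: True, [("rain", "wet ground"), ("fire", "smoke")])
```

    The key property is that no labeled answers are needed: only the model's own outputs are compared against each other.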

  3. TOOL · CL_22405 ·

    MLLMs enable training-free dense hand contact estimation, outperforming supervised methods

    Researchers have developed ContactPrompt, a novel training-free method for dense hand contact estimation that utilizes multimodal large language models (MLLMs). This approach addresses challenges in encoding 3D hand ge…

  4. TOOL · CL_22465 ·

    New research reveals MLLM jailbreaks exploit reconstruction-concealment tradeoff

    Researchers have identified a critical tradeoff in multimodal large language models (MLLMs) related to how harmful queries are concealed and reconstructed. They found that existing methods for transforming harmful input…

  5. TOOL · CL_22437 ·

    Visual Para-Thinker introduces parallel reasoning to multimodal LLMs

    Researchers have introduced Visual Para-Thinker, a novel framework for parallel reasoning in multimodal large language models (MLLMs). This approach shifts from vertical scaling of reasoning depth to a parallel strategy…
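    The summary is cut off before describing the mechanism; a minimal sketch of the general parallel-reasoning idea (sample several independent reasoning paths and aggregate, rather than deepening a single chain — the sampler and vote here are invented stand-ins, not the paper's method):

```python
from collections import Counter

def parallel_reason(sample_path, k=5):
    """Aggregate k independently sampled reasoning paths (breadth)
    instead of one ever-longer chain (depth): majority vote on the
    final answers. `sample_path(i)` returns (answer, rationale)."""
    answers = [sample_path(i)[0] for i in range(k)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / k

# Toy sampler: three paths conclude "cat", two conclude "dog".
answer, agreement = parallel_reason(lambda i: ("cat" if i < 3 else "dog", "rationale"), k=5)
```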

  6. TOOL · CL_22420 ·

    New SOW method uses MLLMs to improve image generation coherence

    Researchers have introduced Selective One-Way Diffusion (SOW), a novel approach to image generation that reframes diffusion models for improved contextual coherence. SOW utilizes Multimodal Large Language Models (MLLMs)…

  7. TOOL · CL_22492 ·

    New benchmark evaluates MLLMs for cross-cultural knowledge insertion challenges

    Researchers have introduced CrossCult-KIBench, a new benchmark designed to evaluate how well Multimodal Large Language Models (MLLMs) can adapt to different cultural contexts without negatively impacting their performan…

  8. RESEARCH · CL_21787 ·

    New MedHorizon benchmark tests AI's ability to understand long medical videos

    Researchers have introduced MedHorizon, a new benchmark designed to test multimodal large language models (MLLMs) on understanding long-form medical videos. This benchmark includes 759 hours of clinical procedures and 1…

  9. TOOL · CL_20778 ·

    Vision-EKIPL framework boosts MLLM visual reasoning with external knowledge infusion

    Researchers have introduced Vision-EKIPL, a novel reinforcement learning framework designed to enhance visual reasoning in Multimodal Large Language Models (MLLMs). This approach incorporates high-quality actions genera…

  10. TOOL · CL_18628 ·

    New MSEarth benchmark uses MLLMs for Earth science discovery

    Researchers have developed MSEarth, a new multimodal benchmark designed to evaluate the capabilities of multimodal large language models (MLLMs) in Earth science reasoning. This dataset comprises over 289,000 figures wi…

  11. RESEARCH · CL_18678 ·

    New VQA methods enhance explainability and knowledge integration for multimodal LLMs

    Researchers have developed CoExVQA, a new framework for Document Visual Question Answering (DocVQA) that enhances explainability by breaking down the reasoning process. This method first identifies relevant evidence, th…

  12. RESEARCH · CL_18700 ·

    MLLMs show promise in analyzing seizure movements, outperforming traditional models

    A pilot study explored the use of multimodal large language models (MLLMs) for analyzing pathological movements in seizure videos. The research found that MLLMs, without specific training, outperformed traditional compu…

  13. TOOL · CL_15615 ·

    VideoThinker framework improves lightweight MLLMs' video reasoning via causal debiasing

    Researchers have developed VideoThinker, a novel framework designed to enhance the reasoning capabilities of lightweight multimodal large language models (MLLMs) in video analysis. This approach addresses the issue of percept…

  14. RESEARCH · CL_21948 ·

    New AI unlearning methods balance data removal with model utility

    Researchers have developed new methods for machine unlearning, a process that removes specific data from AI models without full retraining. One approach, SHRED, uses self-distillation and logit demotion to identify and …
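    SHRED's actual procedure isn't given here; as a hedged sketch of the "logit demotion" ingredient alone, one can subtract a fixed penalty from the logits of forgotten targets before the softmax, suppressing them without retraining (penalty value and function names are illustrative):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def demote_logits(logits, forget_ids, penalty=5.0):
    """Logit demotion: penalize classes the model should forget,
    shifting probability mass to the remaining classes."""
    return [z - penalty if i in forget_ids else z for i, z in enumerate(logits)]

before = softmax([2.0, 5.0, 1.0])                                   # class 1 dominates
after = softmax(demote_logits([2.0, 5.0, 1.0], forget_ids={1}))     # class 1 suppressed
```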

  15. TOOL · CL_15945 ·

    New In-Prompt Process Supervision framework enhances MLLMs for video moderation

    Researchers have developed a new framework called IPS (In-Prompt Process Supervision) to enhance the accuracy of multimodal large language models (MLLMs) in content moderation for short videos. This method incorporates …

  16. RESEARCH · CL_15728 ·

    MLLMs show foundational visual gaps despite progress in multimodal reasoning

    A new paper introduces a method to improve latent reasoning in multimodal large language models (MLLMs) by optimizing visual latents at inference time, addressing a pathology where their contribution is suppressed. Sepa…

  17. TOOL · CL_15707 ·

    Researchers use RL to improve MLLM regression on imbalanced data

    Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…
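    The framework's reward design is not specified in this excerpt; one generic way to make an RL reward robust to imbalanced regression targets (an assumption here, not the paper's formula) is to weight the error by inverse target frequency:

```python
from collections import Counter

def balanced_reward(pred, target, target_counts):
    """Negative absolute error, up-weighted by inverse target frequency
    so rare values contribute as strongly to the policy gradient as
    common ones."""
    return -abs(pred - target) / target_counts[target]

counts = Counter([1, 1, 1, 1, 2])            # value 1 is 4x more common than 2
r_common = balanced_reward(1.5, 1, counts)   # 0.5 error on a common target
r_rare = balanced_reward(1.5, 2, counts)     # same error, rare target: 4x the penalty
```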

  18. RESEARCH · CL_15670 ·

    New HERMES and DSCache methods improve streaming video understanding with KV cache

    Researchers have developed new methods to improve the efficiency of multimodal large language models (MLLMs) for understanding streaming video. One approach, HERMES, conceptualizes the KV cache as a hierarchical memory …
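    HERMES's actual memory hierarchy isn't described in this snippet; as a toy illustration of the general idea (recent KV entries kept verbatim, older ones demoted to a lossy tier — the class, tiers, and stride policy are all invented), a two-tier cache might look like:

```python
from collections import deque

class TieredKVCache:
    """Toy hierarchical cache: the most recent `hot` entries are kept
    verbatim; entries evicted from the hot tier are demoted to a 'cold'
    tier that retains only every `stride`-th one (a crude stand-in for
    compression), bounding memory for streaming input."""
    def __init__(self, hot=4, stride=2):
        self.hot = deque(maxlen=hot)
        self.cold = []
        self.stride = stride
        self._evictions = 0

    def append(self, kv):
        if len(self.hot) == self.hot.maxlen:
            evicted = self.hot[0]  # deque(maxlen=...) will drop this on append
            if self._evictions % self.stride == 0:
                self.cold.append(evicted)
            self._evictions += 1
        self.hot.append(kv)

    def size(self):
        return len(self.hot) + len(self.cold)

cache = TieredKVCache(hot=4, stride=2)
for token in range(10):   # stream 10 tokens through the cache
    cache.append(token)
```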

  19. RESEARCH · CL_15514 ·

    New benchmark and models advance generalized moment retrieval in videos

    Researchers have introduced Generalized Moment Retrieval (GMR), a new framework for video analysis that moves beyond the assumption of a single matching moment per query. This approach aims to retrieve all relevant temp…

  20. RESEARCH · CL_14362 ·

    GeoThinker framework actively integrates geometry for advanced spatial reasoning

    Researchers have developed GeoThinker, a novel framework that enhances spatial reasoning in multimodal large language models (MLLMs) by actively integrating geometric information. Unlike previous passive fusion methods,…