PulseAugur

MLLMs

PulseAugur coverage of MLLMs — every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30d: 56 (56 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 56 (56 over 90d)
SENTIMENT · 30D

2 days with sentiment data

RECENT · PAGE 1/3 · 52 TOTAL
  1. TOOL · CL_27571 ·

    New benchmark EgoMemReason tests AI memory in week-long videos

    Researchers have introduced EgoMemReason, a new benchmark designed to test the memory capabilities of multimodal large language models (MLLMs) and agentic frameworks in understanding long-horizon egocentric videos. The …

  2. TOOL · CL_22498 ·

    New metric evaluates MLLMs for logical consistency without annotations

    Researchers have introduced a new metric, VL-LCM, to evaluate the logical consistency of multimodal large language models (MLLMs) without requiring ground-truth annotations. This metric assesses the cause-effect reasoni…
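    The snippet doesn't spell out VL-LCM's formulation; as a rough, invented illustration of how cause-effect consistency can be scored without ground-truth annotations, one can query a model about the same pair in both directions and measure agreement (all names here are hypothetical):

```python
def consistency_score(model, pairs):
    """Annotation-free consistency check: ask about each cause-effect
    pair in both directions; a logically consistent model should give
    agreeing answers. Scores agreement, not correctness."""
    agree = 0
    for cause, effect in pairs:
        forward = model(f"Does {cause} cause {effect}?")
        backward = model(f"Is {effect} an effect of {cause}?")
        agree += int(forward == backward)
    return agree / len(pairs)

# A toy "model" that answers True to everything is trivially self-consistent.
score = consistency_score(lambda q: True, [("rain", "wet ground"), ("fire", "smoke")])
```

    The key property is that no labeled answers are needed: only the model's own outputs are compared against each other.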

  3. TOOL · CL_22405 ·

    MLLMs enable training-free dense hand contact estimation, outperforming supervised methods

    Researchers have developed ContactPrompt, a novel training-free method for dense hand contact estimation that utilizes multimodal large language models (MLLMs). This approach addresses challenges in encoding 3D hand ge…

  4. TOOL · CL_22465 ·

    New research reveals MLLM jailbreaks exploit reconstruction-concealment tradeoff

    Researchers have identified a critical tradeoff in multimodal large language models (MLLMs) related to how harmful queries are concealed and reconstructed. They found that existing methods for transforming harmful input…

  5. TOOL · CL_22437 ·

    Visual Para-Thinker introduces parallel reasoning to multimodal LLMs

    Researchers have introduced Visual Para-Thinker, a novel framework for parallel reasoning in multimodal large language models (MLLMs). This approach shifts from vertical scaling of reasoning depth to a parallel strategy…
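    The summary is cut off before describing the mechanism; a minimal sketch of the general parallel-reasoning idea (sample several independent reasoning paths and aggregate, rather than deepening a single chain — the sampler and vote here are invented stand-ins, not the paper's method):

```python
from collections import Counter

def parallel_reason(sample_path, k=5):
    """Aggregate k independently sampled reasoning paths (breadth)
    instead of one ever-longer chain (depth): majority vote on the
    final answers. `sample_path(i)` returns (answer, rationale)."""
    answers = [sample_path(i)[0] for i in range(k)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / k

# Toy sampler: three paths conclude "cat", two conclude "dog".
answer, agreement = parallel_reason(lambda i: ("cat" if i < 3 else "dog", "rationale"), k=5)
```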

  6. TOOL · CL_22420 ·

    New SOW method uses MLLMs to improve image generation coherence

    Researchers have introduced Selective One-Way Diffusion (SOW), a novel approach to image generation that reframes diffusion models for improved contextual coherence. SOW utilizes Multimodal Large Language Models (MLLMs)…

  7. TOOL · CL_22492 ·

    New benchmark evaluates MLLMs for cross-cultural knowledge insertion challenges

    Researchers have introduced CrossCult-KIBench, a new benchmark designed to evaluate how well Multimodal Large Language Models (MLLMs) can adapt to different cultural contexts without negatively impacting their performan…

  8. RESEARCH · CL_21787 ·

    New MedHorizon benchmark tests AI's ability to understand long medical videos

    Researchers have introduced MedHorizon, a new benchmark designed to test multimodal large language models (MLLMs) on understanding long-form medical videos. This benchmark includes 759 hours of clinical procedures and 1…

  9. TOOL · CL_20778 ·

    Vision-EKIPL framework boosts MLLM visual reasoning with external knowledge infusion

    Researchers have introduced Vision-EKIPL, a novel reinforcement learning framework designed to enhance visual reasoning in Multimodal Large Language Models (MLLMs). This approach incorporates high-quality actions genera…

  10. TOOL · CL_18628 ·

    New MSEarth benchmark uses MLLMs for Earth science discovery

    Researchers have developed MSEarth, a new multimodal benchmark designed to evaluate the capabilities of multimodal large language models (MLLMs) in Earth science reasoning. This dataset comprises over 289,000 figures wi…

  11. RESEARCH · CL_18678 ·

    New VQA methods enhance explainability and knowledge integration for multimodal LLMs

    Researchers have developed CoExVQA, a new framework for Document Visual Question Answering (DocVQA) that enhances explainability by breaking down the reasoning process. This method first identifies relevant evidence, th…

  12. RESEARCH · CL_18700 ·

    MLLMs show promise in analyzing seizure movements, outperforming traditional models

    A pilot study explored the use of multimodal large language models (MLLMs) for analyzing pathological movements in seizure videos. The research found that MLLMs, without specific training, outperformed traditional compu…

  13. TOOL · CL_15615 ·

    VideoThinker framework improves lightweight MLLMs' video reasoning via causal debiasing

    Researchers have developed VideoThinker, a novel framework designed to enhance the reasoning capabilities of lightweight multimodal large language models (MLLMs) in video analysis. This approach addresses the issue of percept…

  14. RESEARCH · CL_21948 ·

    New AI unlearning methods balance data removal with model utility

    Researchers have developed new methods for machine unlearning, a process that removes specific data from AI models without full retraining. One approach, SHRED, uses self-distillation and logit demotion to identify and …
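    SHRED's actual procedure isn't given here; as a hedged sketch of the "logit demotion" ingredient alone, one can subtract a fixed penalty from the logits of forgotten targets before the softmax, suppressing them without retraining (penalty value and function names are illustrative):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def demote_logits(logits, forget_ids, penalty=5.0):
    """Logit demotion: penalize classes the model should forget,
    shifting probability mass to the remaining classes."""
    return [z - penalty if i in forget_ids else z for i, z in enumerate(logits)]

before = softmax([2.0, 5.0, 1.0])                                   # class 1 dominates
after = softmax(demote_logits([2.0, 5.0, 1.0], forget_ids={1}))     # class 1 suppressed
```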

  15. TOOL · CL_15945 ·

    New In-Prompt Process Supervision framework enhances MLLMs for video moderation

    Researchers have developed a new framework called IPS (In-Prompt Process Supervision) to enhance the accuracy of multimodal large language models (MLLMs) in content moderation for short videos. This method incorporates …

  16. RESEARCH · CL_15728 ·

    MLLMs show foundational visual gaps despite progress in multimodal reasoning

    A new paper introduces a method to improve latent reasoning in multimodal large language models (MLLMs) by optimizing visual latents at inference time, addressing a pathology where their contribution is suppressed. Sepa…

  17. TOOL · CL_15707 ·

    Researchers use RL to improve MLLM regression on imbalanced data

    Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…
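    The framework's reward design is not specified in this excerpt; one generic way to make an RL reward robust to imbalanced regression targets (an assumption here, not the paper's formula) is to weight the error by inverse target frequency:

```python
from collections import Counter

def balanced_reward(pred, target, target_counts):
    """Negative absolute error, up-weighted by inverse target frequency
    so rare values contribute as strongly to the policy gradient as
    common ones."""
    return -abs(pred - target) / target_counts[target]

counts = Counter([1, 1, 1, 1, 2])            # value 1 is 4x more common than 2
r_common = balanced_reward(1.5, 1, counts)   # 0.5 error on a common target
r_rare = balanced_reward(1.5, 2, counts)     # same error, rare target: 4x the penalty
```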

  18. RESEARCH · CL_15670 ·

    New HERMES and DSCache methods improve streaming video understanding with KV cache

    Researchers have developed new methods to improve the efficiency of multimodal large language models (MLLMs) for understanding streaming video. One approach, HERMES, conceptualizes the KV cache as a hierarchical memory …
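    HERMES's actual memory hierarchy isn't described in this snippet; as a toy illustration of the general idea (recent KV entries kept verbatim, older ones demoted to a lossy tier — the class, tiers, and stride policy are all invented), a two-tier cache might look like:

```python
from collections import deque

class TieredKVCache:
    """Toy hierarchical cache: the most recent `hot` entries are kept
    verbatim; entries evicted from the hot tier are demoted to a 'cold'
    tier that retains only every `stride`-th one (a crude stand-in for
    compression), bounding memory for streaming input."""
    def __init__(self, hot=4, stride=2):
        self.hot = deque(maxlen=hot)
        self.cold = []
        self.stride = stride
        self._evictions = 0

    def append(self, kv):
        if len(self.hot) == self.hot.maxlen:
            evicted = self.hot[0]  # deque(maxlen=...) will drop this on append
            if self._evictions % self.stride == 0:
                self.cold.append(evicted)
            self._evictions += 1
        self.hot.append(kv)

    def size(self):
        return len(self.hot) + len(self.cold)

cache = TieredKVCache(hot=4, stride=2)
for token in range(10):   # stream 10 tokens through the cache
    cache.append(token)
```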

  19. RESEARCH · CL_15514 ·

    New benchmark and models advance generalized moment retrieval in videos

    Researchers have introduced Generalized Moment Retrieval (GMR), a new framework for video analysis that moves beyond the assumption of a single matching moment per query. This approach aims to retrieve all relevant temp…

  20. RESEARCH · CL_14362 ·

    GeoThinker framework actively integrates geometry for advanced spatial reasoning

    Researchers have developed GeoThinker, a novel framework that enhances spatial reasoning in multimodal large language models (MLLMs) by actively integrating geometric information. Unlike previous passive fusion methods,…