PulseAugur

multimodal large language model

PulseAugur coverage of multimodal large language models: every cluster mentioning the term across labs, papers, and developer communities, ranked by signal.

Total (30d): 37 (37 over 90d)
Releases (30d): 0 (0 over 90d)
Papers (30d): 37 (37 over 90d)
Sentiment (30d): 13 days with sentiment data

Recent clusters (page 1 of 2, 37 total)
  1. TOOL · CL_80232

    HDRAgent uses LLMs for adaptive HDR imaging

    Researchers have introduced HDRAgent, a novel framework for High Dynamic Range (HDR) imaging that utilizes an agent-driven approach to adaptively select reconstruction strategies. This method aims to mitigate ghosting a…

  2. TOOL · CL_79965

    New SMART framework enhances video moment retrieval with audio and shot-aware compression

    Researchers have developed SMART, a new framework for video moment retrieval that enhances multimodal understanding by integrating audio cues with visual information. This approach utilizes a Multimodal Large Language M…

  3. RESEARCH · CL_79121

    New benchmark CoVEBench tests complex video editing AI

    Researchers have introduced CoVEBench, a new benchmark designed to evaluate the capabilities of text-guided video editing models. This benchmark addresses the limitations of existing models that struggle with complex, m…

  4. RESEARCH · CL_76917

    New benchmark tackles privacy blind spots in AI image editing

    Researchers have introduced SPPE, a new benchmark for evaluating privacy-preserving image editing in Multimodal Large Language Models (MLLMs). This benchmark addresses the issue where standard privacy methods often resu…

  5. RESEARCH · CL_68200

    New benchmark WebRISE tests MLLM-generated web artifacts

    Researchers have developed WebRISE, a new benchmark for evaluating Multimodal Large Language Models (MLLMs) that generate web artifacts. Unlike previous methods, WebRISE focuses on requirement-induced states and transi…

  6. TOOL · CL_65434

    New benchmark dataset and detection framework tackle AI-generated video forgery

    Researchers have introduced CoCoVideo-26K, a new benchmark dataset designed to improve the detection of AI-generated videos, particularly those created by high-fidelity commercial models. The dataset features semantical…

  7. RESEARCH · CL_66260

    ToolFG framework uses MLLMs and tools for image classification

    Researchers have introduced ToolFG, a novel framework designed for fine-grained image classification that integrates multimodal large language models (MLLMs) with external tools. This approach allows MLLMs to autonomous…

  8. RESEARCH · CL_65852

    New benchmarks test robot manipulation models for trustworthiness

    Researchers have developed new benchmarks to evaluate the trustworthiness of video world models used in robotic manipulation. These benchmarks assess models across normal, constraint-sensitive, counterfactual, and adver…

  9. RESEARCH · CL_63070

    Language models enhance deepfake detector generalization and interpretability

    Researchers have developed a novel method for training deepfake detectors by leveraging multimodal large language models (MLLMs). This approach uses language as a regularization mechanism to improve both the generalizab…

  10. RESEARCH · CL_62968

    New agentic framework uses MLLM to improve object detection

    Researchers have introduced DetAS, an agentic framework for object detection that treats the task as a dynamic decision process. This framework utilizes a Multimodal Large Language Model (MLLM) to adaptively compose det…

  11. RESEARCH · CL_41904

    FruitEnsemble uses MLLM to boost fruit classification accuracy

    Researchers have developed FruitEnsemble, a novel framework for fine-grained fruit classification that addresses challenges like limited datasets and visual similarity between fruit types. The system utilizes a two-stag…

  12. RESEARCH · CL_41910

    OSGNet and MLLM win Ego4D Episodic Memory Challenge

    Researchers have developed a novel approach for the Ego4D Episodic Memory Challenge, achieving first place in both the Natural Language Queries and GoalStep tracks. Their method combines the OSGNet localization model wi…

  13. RESEARCH · CL_43941

    New architectures enable real-time video understanding

    Researchers are developing new methods for real-time video understanding, moving beyond traditional offline analysis. Several papers propose architectures that decouple visual perception from language generation to impr…

  14. TOOL · CL_36043

    EndoGSim uses MLLMs for physics-aware surgical simulation

    Researchers have developed EndoGSim, a new framework for simulating dynamic endoscopic scenes in robot-assisted surgery. This system uses Multimodal Large Language Models (MLLMs) to guide Gaussian Splatting, enabling p…

  15. TOOL · CL_30750

    New MLLM framework unifies surgical scene understanding

    Researchers have developed SurgMLLM, a novel framework that unifies surgical scene understanding by integrating high-level reasoning with low-level visual grounding. This multimodal large language model (MLLM) is fine-t…

  16. TOOL · CL_29245

    AlphaGRPO framework boosts multimodal AI generation with self-reflection

    Researchers have introduced AlphaGRPO, a new framework designed to improve multimodal generation in Unified Multimodal Models (UMMs). This approach uses Group Relative Policy Optimization (GRPO) to enable models to perf…

  17. TOOL · CL_27987

    New MPerS method uses MLLMs for remote sensing scene segmentation

    Researchers have developed MPerS, a novel approach for remote sensing scene segmentation that leverages multimodal large language models (MLLMs). This method generates high-quality captions for remote sensing images usi…

  18. TOOL · CL_25593

    New MLLM WeatherSyn generates weather reports, outperforms existing models

    Researchers have introduced WeatherSyn, a novel instruction-tuned multimodal large language model (MLLM) designed for generating weather forecast reports. This model is trained on a newly introduced dataset, which includes data f…

  19. TOOL · CL_22442

    Motion-MLLM enhances 3D scene understanding with egomotion data

    Researchers have developed Motion-MLLM, a new framework that integrates egomotion data from Inertial Measurement Units (IMUs) with video to enhance Multimodal Large Language Models (MLLMs) for 3D scene understanding. Th…

  20. RESEARCH · CL_22410

    New benchmarks and models advance video understanding reward modeling

    Researchers have developed new methods for training reward models for video understanding tasks, addressing a gap in current AI capabilities. One approach introduces a benchmark called VURB and a dataset VUP-35K, leadin…