ENTITY multimodal large language model

multimodal large language model

PulseAugur coverage of multimodal large language models: every cluster mentioning multimodal large language models across labs, papers, and developer communities, ranked by signal.


Total · 30d

37 over 90d

Releases · 30d

0 over 90d

Papers · 30d

37 over 90d


TOPICS

paper 37
model release 17
product 10
safety 6
other 6

RELATIONSHIPS

instance of: Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond (70%)

SENTIMENT · 30D

13 day(s) with sentiment data

RECENT · PAGE 1/2 · 37 TOTAL

TOOL · CL_80232 · Jun 9 · 04:00

HDRAgent uses LLMs for adaptive HDR imaging

Researchers have introduced HDRAgent, a novel framework for High Dynamic Range (HDR) imaging that utilizes an agent-driven approach to adaptively select reconstruction strategies. This method aims to mitigate ghosting a…
TOOL · CL_79965 · Jun 9 · 04:00

New SMART framework enhances video moment retrieval with audio and shot-aware compression

Researchers have developed SMART, a new framework for video moment retrieval that enhances multimodal understanding by integrating audio cues with visual information. This approach utilizes a Multimodal Large Language M…
RESEARCH · CL_79121 · Jun 7 · 00:00

New benchmark CoVEBench tests complex video editing AI

Researchers have introduced CoVEBench, a new benchmark designed to evaluate the capabilities of text-guided video editing models. This benchmark addresses the limitations of existing models that struggle with complex, m…
RESEARCH · CL_76917 · Jun 5 · 11:40

New benchmark tackles privacy blind spots in AI image editing

Researchers have introduced SPPE, a new benchmark for evaluating privacy-preserving image editing in Multimodal Large Language Models (MLLMs). This benchmark addresses the issue where standard privacy methods often resu…
RESEARCH · CL_68200 · Jun 2 · 06:29

New benchmark WebRISE tests MLLM-generated web artifacts

Researchers have developed WebRISE, a new benchmark for evaluating Multi-modal Large Language Models (MLLMs) that generate web artifacts. Unlike previous methods, WebRISE focuses on requirement-induced states and transi…
TOOL · CL_65434 · Jun 2 · 04:00

New benchmark dataset and detection framework tackle AI-generated video forgery

Researchers have introduced CoCoVideo-26K, a new benchmark dataset designed to improve the detection of AI-generated videos, particularly those created by high-fidelity commercial models. The dataset features semantical…
RESEARCH · CL_66260 · Jun 1 · 17:27

ToolFG framework uses MLLMs and tools for image classification

Researchers have introduced ToolFG, a novel framework designed for fine-grained image classification that integrates multimodal large language models (MLLMs) with external tools. This approach allows MLLMs to autonomous…
RESEARCH · CL_65852 · May 31 · 00:00

New benchmarks test robot manipulation models for trustworthiness

Researchers have developed new benchmarks to evaluate the trustworthiness of video world models used in robotic manipulation. These benchmarks assess models across normal, constraint-sensitive, counterfactual, and adver…
RESEARCH · CL_63070 · May 29 · 12:01

Language models enhance deepfake detector generalization and interpretability

Researchers have developed a novel method for training deepfake detectors by leveraging multimodal large language models (MLLMs). This approach uses language as a regularization mechanism to improve both the generalizab…
RESEARCH · CL_62968 · May 29 · 11:41

New agentic framework uses MLLM to improve object detection

Researchers have introduced DetAS, an agentic framework for object detection that treats the task as a dynamic decision process. This framework utilizes a Multimodal Large Language Model (MLLM) to adaptively compose det…
RESEARCH · CL_41904 · May 20 · 08:31

FruitEnsemble uses MLLM to boost fruit classification accuracy

Researchers have developed FruitEnsemble, a novel framework for fine-grained fruit classification that addresses challenges like limited datasets and visual similarity between fruit types. The system utilizes a two-stag…
RESEARCH · CL_41910 · May 20 · 07:14

OSGNet and MLLM win Ego4D Episodic Memory Challenge

Researchers have developed a novel approach for the Ego4D Episodic Memory Challenge, achieving first place in both the Natural Language Queries and GoalStep tracks. Their method combines the OSGNet localization model wi…
RESEARCH · CL_43941 · May 16 · 16:15

New architectures enable real-time video understanding

Researchers are developing new methods for real-time video understanding, moving beyond traditional offline analysis. Several papers propose architectures that decouple visual perception from language generation to impr…
TOOL · CL_36043 · May 15 · 14:56

EndoGSim uses MLLMs for physics-aware surgical simulation

Researchers have developed EndoGSim, a new framework for simulating dynamic endoscopic scenes in robot-assisted surgery. This system uses Multi-modal Large Language Models (MLLMs) to guide Gaussian Splatting, enabling p…
TOOL · CL_30750 · May 13 · 13:42

New MLLM framework unifies surgical scene understanding

Researchers have developed SurgMLLM, a novel framework that unifies surgical scene understanding by integrating high-level reasoning with low-level visual grounding. This multimodal large language model (MLLM) is fine-t…
TOOL · CL_29245 · May 12 · 17:59

AlphaGRPO framework boosts multimodal AI generation with self-reflection

Researchers have introduced AlphaGRPO, a new framework designed to improve multimodal generation in Unified Multimodal Models (UMMs). This approach uses Group Relative Policy Optimization (GRPO) to enable models to perf…
TOOL · CL_27987 · May 11 · 16:00

New MPerS method uses MLLMs for remote sensing scene segmentation

Researchers have developed MPerS, a novel approach for remote sensing scene segmentation that leverages multimodal large language models (MLLMs). This method generates high-quality captions for remote sensing images usi…
TOOL · CL_25593 · May 8 · 09:53

New MLLM WeatherSyn generates weather reports, outperforms existing models

Researchers have introduced WeatherSyn, a novel instruction-tuned multimodal large language model (MLLM) designed for generating weather forecast reports. This model is trained on a new dataset, which includes data f…
TOOL · CL_22442 · May 8 · 04:00

Motion-MLLM enhances 3D scene understanding with egomotion data

Researchers have developed Motion-MLLM, a new framework that integrates egomotion data from Inertial Measurement Units (IMUs) with video to enhance Multimodal Large Language Models (MLLMs) for 3D scene understanding. Th…
RESEARCH · CL_22410 · May 8 · 04:00

New benchmarks and models advance video understanding reward modeling

Researchers have developed new methods for training reward models for video understanding tasks, addressing a gap in current AI capabilities. One approach introduces a benchmark called VURB and a dataset VUP-35K, leadin…
