PulseAugur

Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond

PulseAugur coverage of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond — every cluster mentioning the topic across labs, papers, and developer communities, ranked by signal.

Total · 30d: 0 (0 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 0 (0 over 90d)
TIER MIX · 90D

No coverage in the last 90 days.

SENTIMENT · 30D

5 days with sentiment data

RECENT · PAGE 1/1 · 19 TOTAL
  1. TOOL · CL_30586 ·

    New GSEC framework uses LLMs for improved image clustering

    Researchers have developed a new image clustering framework called GSEC, which utilizes generative semantic guidance and a bi-layer ensemble strategy. This approach employs Multimodal Large Language Models to create sem…

  2. TOOL · CL_30596 ·

    New benchmark CiteVQA exposes "Attribution Hallucination" in LLMs

    Researchers have introduced CiteVQA, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to accurately attribute answers to specific source regions within documents. Unlike pre…

  3. TOOL · CL_30605 ·

    New benchmark reveals AI models lag human experts in judging image beauty

    Researchers have developed the Visual Aesthetic Benchmark (VAB) to evaluate how well multimodal large language models (MLLMs) can judge beauty in images. Their study found that current frontier MLLMs perform significant…

  4. TOOL · CL_29251 ·

    New benchmark reveals MLLMs struggle with spatial reasoning

    Researchers have introduced PCSR-Bench, a new diagnostic benchmark designed to evaluate the spatial reasoning capabilities of multimodal large language models (MLLMs) when processing omnidirectional images. The benchmar…

  5. TOOL · CL_29402 ·

    New benchmark tests multimodal LLMs on complex optimization tasks

    Researchers have introduced MM-OptBench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on optimization modeling tasks. This benchmark incorporates both text and visual information, a depa…

  6. TOOL · CL_29435 ·

    New multimodal benchmark uses 900K Japanese student responses

    Researchers have developed a new multimodal benchmark using data from Japan's National Assessment of Academic Ability, which includes approximately 900,000 aggregated student responses. This dataset features real exam m…

  7. TOOL · CL_27553 ·

    New V-ABS framework enhances multimodal visual reasoning

    Researchers have developed V-ABS, a novel beam search framework designed to improve multi-step visual reasoning in multimodal large language models. This approach addresses the imagination-action-observer bias by iterat…

  8. TOOL · CL_25753 ·

    SphereVAD uses LLM features for training-free video anomaly detection

    Researchers have developed SphereVAD, a novel framework for video anomaly detection that operates without requiring any task-specific training. This method leverages the rich semantic information already present in the …

  9. RESEARCH · CL_22410 ·

    New benchmarks and models advance video understanding reward modeling

    Researchers have developed new methods for training reward models for video understanding tasks, addressing a gap in current AI capabilities. One approach introduces a benchmark called VURB and a dataset VUP-35K, leadin…

  10. TOOL · CL_26980 ·

    Pro²Assist uses AR and LLMs for proactive procedural task guidance

    Researchers have developed Pro²Assist, a new multimodal large language model system designed to offer continuous, step-aware proactive assistance for complex, long-horizon procedural tasks. Unlike previous assistants…

  11. TOOL · CL_15620 ·

    VoxAfford improves 3D affordance detection with multi-scale voxel-token fusion

    Researchers have developed VoxAfford, a novel method for open-vocabulary 3D affordance detection. This approach enhances multimodal large language models by integrating multi-scale geometric features from a 3D VQVAE enc…

  12. RESEARCH · CL_15594 ·

    New research tackles conflicting data in multimodal emotion recognition

    Researchers have developed new methods to improve multimodal emotion recognition, which combines text, audio, and vision data. One approach, Dual-Path Conflict Resolution (DCR), learns to either fuse conflicting modalit…

  13. RESEARCH · CL_11400 ·

    COHERENCE benchmark evaluates MLLMs' fine-grained image-text alignment in interleaved contexts

    Researchers have introduced COHERENCE, a new benchmark designed to assess the fine-grained image-text alignment capabilities of Multimodal Large Language Models (MLLMs). Existing benchmarks often overlook the complexiti…

  14. RESEARCH · CL_06542 ·

    Researchers develop new methods for knowledge graph retrieval and completion

    Researchers have developed new frameworks to enhance knowledge graph completion and visual question answering by integrating multimodal knowledge graphs with retrieval-augmented generation techniques. One approach, RADD…

  15. RESEARCH · CL_06531 ·

    OmniVTG dataset and CoT paradigm enhance open-world video temporal grounding

    Researchers have introduced OmniVTG, a large-scale dataset and training paradigm designed to improve open-world Video Temporal Grounding (VTG) for Multimodal Large Language Models (MLLMs). The dataset was created using …

  16. RESEARCH · CL_06275 ·

    OS-SPEAR toolkit evaluates AI agents for safety, performance, efficiency, and robustness

    Researchers have introduced OS-SPEAR, a new toolkit designed to rigorously evaluate operating system agents. This toolkit assesses agents across four key dimensions: safety, performance, efficiency, and robustness. OS-S…

  17. RESEARCH · CL_04921 ·

    MLLMs predict mouse social dominance in novel MTT-Bench benchmark

    Researchers have developed MTT-Bench, a new benchmark for analyzing mouse social dominance using Multimodal Large Language Models (MLLMs). This framework fine-tunes existing MLLM architectures to predict dominance hiera…

  18. RESEARCH · CL_05414 ·

    SAKE framework enhances multimodal NER with self-aware knowledge exploitation

    Researchers have developed SAKE, a new framework designed to improve Grounded Multimodal Named Entity Recognition (GMNER). SAKE addresses challenges in open-world environments, such as identifying long-tailed and evolvi…

  19. RESEARCH · CL_05425 ·

    Air-Know network tackles composed image retrieval with novel expert-proxy-diversion paradigm

    Researchers have introduced Air-Know, a novel network designed to tackle the Composed Image Retrieval (CIR) challenge, specifically addressing the Noisy Triplet Correspondence (NTC) problem. Existing methods struggle wi…
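The generative-semantic clustering idea in item 1 ultimately reduces to grouping embedding vectors. A minimal sketch over hypothetical pre-computed semantic embeddings — this is plain k-means for illustration, not the GSEC framework or its bi-layer ensemble:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means over embedding vectors (illustrative, not GSEC):
    assign each point to its nearest centroid, recompute centroids, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # nearest centroid by squared Euclidean distance
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # recompute centroids as per-dimension means; keep the old
        # centroid if a cluster emptied out
        centroids = [[sum(dim) / len(c) for dim in zip(*c)] if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return clusters

# two well-separated groups of hypothetical 2-D "embeddings"
clusters = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
```

A real pipeline would cluster high-dimensional model embeddings and then ensemble multiple such partitions; the loop above only shows the assignment/update core.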
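The beam-search framing in item 7 can be illustrated generically. A minimal sketch with hypothetical `expand` and `score` callables standing in for the model's step proposals and the scorer — not the V-ABS implementation:

```python
import heapq

def beam_search(initial_state, expand, score, beam_width=3, steps=4):
    """Generic beam search: at every step, expand each surviving partial
    chain and keep only the `beam_width` highest-scoring candidates."""
    beam = [(score(initial_state), initial_state)]
    for _ in range(steps):
        candidates = [(score(nxt), nxt)
                      for _, state in beam
                      for nxt in expand(state)]
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=lambda p: p[0])
    # return the highest-scoring state found
    return max(beam, key=lambda p: p[0])[1]

# toy run: grow tuples of digits 0-2, scoring by sum, so the beam
# should converge on the all-2s chain
best = beam_search((), lambda s: [s + (d,) for d in range(3)], sum,
                   beam_width=2, steps=3)
# best is (2, 2, 2)
```

In a multi-step visual-reasoning setting, `expand` would sample candidate next reasoning actions from the model and `score` would rank the resulting chains; the pruning logic is the same.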
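Training-free anomaly scoring of the kind item 8 describes often reduces to measuring how far a frame's embedding falls from a prototype of normal activity. A hedged sketch over hypothetical embedding vectors — not the SphereVAD method itself:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def anomaly_scores(frame_embeddings, normal_prototype):
    """Score each frame by cosine distance from a 'normal' prototype
    embedding; higher score = more anomalous. No training involved."""
    return [1.0 - cosine(e, normal_prototype) for e in frame_embeddings]

# hypothetical 2-D embeddings: the first frame matches the prototype,
# the second is orthogonal to it and scores as anomalous
scores = anomaly_scores([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In practice the embeddings would come from a pretrained multimodal encoder and the prototype from a handful of reference frames; the scoring rule stays this simple.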