Qwen2.5-VL

PulseAugur coverage of Qwen2.5-VL — every cluster mentioning Qwen2.5-VL across labs, papers, and developer communities, ranked by signal.

Total · 30d: 14 (14 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 11 (11 over 90d)

TIER MIX · 90D

frontier release 1
research 6
tool 7

SENTIMENT · 30D: 7 days with sentiment data

RECENT · 14 TOTAL

TOOL · CL_79831 · Jun 9 · 04:00

New benchmark reveals multilingual safety gaps in vision-language models

Researchers have developed MLingualFC, a new multilingual benchmark to test the safety vulnerabilities of vision-language models (VLMs). This benchmark uses flowchart images encoded with harmful instructions in five lan…

TOOL · CL_66123 · Jun 2 · 04:00

New CoCoA method boosts multimodal embedding quality

Researchers have introduced CoCoA, a novel pre-training paradigm designed to enhance multimodal embedding models. This method focuses on content reconstruction through collaborative attention, aiming to create more comp…

RESEARCH · CL_66037 · Jun 2 · 04:00

New methods boost video QA by compressing content and improving temporal reasoning

Researchers have developed new methods to improve video question answering (VQA) for long videos. One approach, MemoryCard, compresses video content into topic-aware "Memory Cards" to better capture event-level semantic…

RESEARCH · CL_47640 · May 24 · 02:56

llama.cpp releases add Vulkan, optimize matrix math, and improve server logging

The llama.cpp project has released several updates, including version b9580, which adds Vulkan support for matrix-matrix multiplication and Flash Attention, along with optimizations for FP16 dot2 extensions. Other recent…

TOOL · CL_44756 · May 22 · 04:00

New framework boosts VLM anomaly detection for self-driving cars

Researchers have developed SAVANT, a new framework designed to improve the detection of semantic anomalies in autonomous driving systems using Vision-Language Models (VLMs). SAVANT reformulates anomaly detection as a la…

RESEARCH · CL_41802 · May 20 · 02:17

UF Gators win AmericasNLP 2026 task with novel captioning system

Researchers from the University of Florida Gators have won the AmericasNLP 2026 shared task for cultural image captioning of Indigenous languages. Their two-stage system uses Qwen2.5-VL for an intermediate Spanish capti…

FRONTIER RELEASE · CL_42261 · May 15 · 08:05

ByteDance releases Lance, a unified multimodal AI model

ByteDance has released Lance, an open-source multimodal AI model capable of understanding, generating, and editing both images and videos within a single framework. This lightweight model, with only 3 billion active par…

TOOL · CL_32566 · May 14 · 12:14

Video2GUI generates 12M GUI trajectories from unlabeled videos

Researchers have developed Video2GUI, an automated framework designed to generate large-scale interaction trajectories for training GUI agents. This system extracts data from unlabeled internet videos, converting them i…

TOOL · CL_22434 · May 8 · 04:00

New DICModel enhances ICT image captioning with multi-modal LLMs

Researchers have developed a novel Domain-specific Image Captioning Model (DICModel) designed for the ICT industry, utilizing a multi-stage progressive training strategy. This approach combines synthesized image-text pa…

TOOL · CL_22400 · May 8 · 04:00

Medical VLMs struggle with negated answers, new benchmark reveals

Researchers have developed CXR-ContraBench, a new benchmark designed to evaluate the performance of medical vision-language models (VLMs) in correctly interpreting negated statements within chest X-ray analyses. The ben…

RESEARCH · CL_09753 · Apr 29 · 11:51

DenseStep2M pipeline automates video annotation for improved understanding

Researchers have developed DenseStep2M, a novel pipeline that automatically extracts detailed procedural annotations from instructional videos without requiring training data. This system segments videos, filters irrele…

RESEARCH · CL_08185 · Apr 28 · 14:46

OcularChat MLLM accurately diagnoses age-related macular degeneration with interactive explanations

Researchers have developed OcularChat, a multimodal large language model (MLLM) fine-tuned from Qwen2.5-VL, designed to diagnose age-related macular degeneration (AMD) using color fundus photographs. The model was train…

TOOL · CL_47693 · May 5 · 00:00

Arcee AI moves to Together Endpoints for cost-efficient SLMs

Arcee AI has migrated its specialized small language models (SLMs) from AWS to Together Dedicated Endpoints, seeking improved cost, performance, and operational agility. The company focuses on training efficient models …

RESEARCH · CL_04681 · Nov 5 · 00:00

New research tackles LLM hallucinations with novel methods and benchmarks

Multiple research papers released on arXiv address the challenge of hallucinations in large language and vision-language models. One paper introduces In-Context Visual Contrastive Optimization (IC-VCO) to mitigate multi…