Qwen2-VL
PulseAugur coverage of Qwen2-VL: every cluster mentioning Qwen2-VL across labs, papers, and developer communities, ranked by signal.
-
GPT-4o and other multimodal models evaluated on computer vision tasks
A new paper evaluates how well multimodal foundation models, including GPT-4o and Gemini 1.5 Pro, perform on standard computer vision tasks. Researchers developed a prompt-chaining method to translate vision tasks into …
-
FAIR_XAI framework reveals bias in multimodal models for wellbeing assessment
Researchers have developed FAIR_XAI, a framework to improve the fairness of multimodal foundation models used in wellbeing assessment. The study evaluated Phi3.5-Vision and Qwen2-VL on datasets like E-DAIC and AFAR-BSFT…
-
VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought
Researchers have introduced VG-CoT, a new dataset designed to improve the trustworthiness of Large Vision-Language Models (LVLMs). This dataset automatically links reasoning steps to specific visual evidence within imag…