PulseAugur
research · [3 sources]

New benchmarks SciMDR and ShredBench evaluate multimodal LLMs on scientific document reasoning and reconstruction

Researchers have introduced ShredBench, a new benchmark designed to evaluate the semantic reasoning abilities of multimodal large language models (MLLMs) in reconstructing documents from shredded fragments. This benchmark utilizes an automated pipeline to generate fragmented documents, ensuring that evaluations are not contaminated by training data. Initial tests on current MLLMs reveal a significant drop in performance as document fragmentation increases, indicating a gap in their ability to bridge visual discontinuities and perform fine-grained cross-modal reasoning.
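
The sources summarized here do not describe the fragmentation pipeline in detail. As a rough illustration only, an automated "shred and score" setup could look like the toy sketch below; the function names are hypothetical, and text is used as a stand-in for the document images ShredBench actually operates on.

```python
import random

def shred(text: str, n_fragments: int) -> list[str]:
    """Cut text into roughly n_fragments contiguous pieces and shuffle them.
    A toy stand-in for an automated document-fragmentation pipeline."""
    size = max(1, len(text) // n_fragments)
    fragments = [text[i:i + size] for i in range(0, len(text), size)]
    random.shuffle(fragments)
    return fragments

def placement_accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of fragments a model placed at the correct position
    relative to the ground-truth fragment order."""
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)
```

Increasing `n_fragments` in a setup like this makes the reordering task combinatorially harder, which mirrors the reported performance drop as fragmentation increases.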

Summary written by gemini-2.5-flash-lite from 3 sources.

IMPACT Highlights limitations in current MLLMs for document reconstruction from fragmented sources, suggesting areas for future research.

RANK_REASON Introduction of a new benchmark for evaluating MLLMs on a specific task.


COVERAGE [3]

  1. arXiv cs.CL TIER_1 · Ziyu Chen, Yilun Zhao, Chengye Wang, Rilyn Han, Manasi Patwardhan, Arman Cohan ·

    SciMDR: Advancing Scientific Multimodal Document Reasoning

    arXiv:2603.12249v2 Announce Type: replace Abstract: Constructing scientific multimodal document reasoning datasets for foundation model training involves an inherent trade-off among scale, faithfulness, and realism. To address this challenge, we introduce the synthesize-and-regro…

  2. arXiv cs.CL TIER_1 · Wenping Ma ·

    ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

    Multimodal Large Language Models (MLLMs) have achieved remarkable performance in Visually Rich Document Understanding (VRDU) tasks, but their capabilities are mainly evaluated on pristine, well-structured document images. We consider content restoration from shredded fragments, a…

  3. arXiv cs.CV TIER_1 · Zichun Guo, Yuling Shi, Wenhao Zeng, Chao Hu, Haotian Lin, Terry Yue Zhuo, Jiawei Chen, Xiaodong Gu, Wenping Ma ·

    ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

    arXiv:2604.23813v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable performance in Visually Rich Document Understanding (VRDU) tasks, but their capabilities are mainly evaluated on pristine, well-structured document images. We conside…