New RAG research tackles tabular data, cost, and cross-lingual knowledge
By PulseAugur Editorial
Summary by gemini-2.5-flash-lite from 18 sources
Several recent research papers explore advancements in Retrieval-Augmented Generation (RAG) systems. One paper introduces Orthogonal Subspace Decomposition (OSD) to separate task-specific behavior from document knowledge in parametric RAG, improving adapter composition. Another paper, CroSearch-R1, proposes a framework to better leverage cross-lingual knowledge for RAG by integrating multilingual information into a reinforcement learning process. Additionally, research investigates the impact of coreference resolution on RAG, demonstrating its ability to reduce ambiguity and improve performance, particularly for smaller models. Other studies focus on enhancing RAG for specific domains like financial reports through reranking analysis and for knowledge graph question answering using semantic caching.
arXiv:2605.00318v1 Announce Type: new Abstract: Tabular documents such as CSV and Excel files are widely used in enterprise data pipelines, yet existing chunking strategies for retrieval-augmented generation (RAG) are primarily designed for unstructured text and do not account for tabular structure. We propose a structure-awar…
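The abstract is cut off before the method is described. As a loose illustration of the general idea of structure-aware chunking for tabular files (not the paper's algorithm; the function name and chunking rule are invented), one might split a CSV into row groups while repeating the header in every chunk, so each chunk remains self-describing when retrieved on its own:

```python
import csv
import io

def chunk_csv(text, rows_per_chunk=2):
    """Split CSV text into chunks of whole rows, repeating the header
    row in each chunk so every chunk is independently readable.
    Illustrative sketch only; not the paper's method."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    chunks = []
    for i in range(0, len(body), rows_per_chunk):
        part = [header] + body[i:i + rows_per_chunk]
        chunks.append("\n".join(",".join(r) for r in part))
    return chunks

data = "id,name,region\n1,Acme,EU\n2,Bolt,US\n3,Cori,APAC"
chunks = chunk_csv(data)
# Every chunk starts with the "id,name,region" header row.
```

Plain text chunkers would happily split mid-row or separate rows from their header; keeping rows intact and headers attached is the kind of structural constraint the abstract hints at.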
arXiv cs.LG
Shawqi Al-Maliki, Ammar Gharaibeh, Mohamed Rahouti, Mohammad Ruhul Amin, Mohamed Abdallah, Junaid Qadir, Ala Al-Fuqaha
arXiv:2604.26981v1 Announce Type: cross Abstract: Large Language Models (LLMs) have revolutionized the field of natural language processing. However, they exhibit some limitations, including a lack of reliability and transparency: they may hallucinate and fail to provide sources …
arXiv:2603.06198v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) is a framework in which a Generator, such as a Large Language Model (LLM), produces answers by retrieving documents from an external collection using a Retriever. In practice, Generators must…
arXiv:2604.26176v1 Announce Type: cross Abstract: The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA systems act as stateless planners, generating retrieval plans in isolation without exp…
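The editorial summary above associates this line of KGQA work with semantic caching. As a rough sketch of what a semantic cache might look like (the bag-of-words "embedding" and the similarity threshold are toy stand-ins invented for illustration, not the paper's components), a system can reuse an earlier answer when a new query is close enough to one already served:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a
    learned sentence encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Return a cached answer when a new query is similar enough to a
    previously answered one, skipping retrieval and planning."""
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, answer)

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, answer):
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.put("who founded the acme corporation", "Jane Doe")
hit = cache.get("who founded acme corporation")   # near-duplicate query
miss = cache.get("capital of france")             # unrelated query
```

The cache hit avoids re-running the whole retrieval pipeline, which is where the latency and cost savings would come from.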
arXiv cs.CL
Weihang Su, Hanwen Zhang, Qingyao Ai, Yiqun Liu
arXiv:2604.26768v1 Announce Type: new Abstract: Parametric Retrieval-Augmented Generation (PRAG) encodes external documents into lightweight parameter modules that can be retrieved and merged at inference time, offering a promising alternative to in-context retrieval augmentation. Despite its potential, many PRAG implementatio…
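The abstract describes parameter modules that are retrieved and merged at inference time. One common way to merge such modules (not necessarily the paper's rule; the function and data layout are invented) is a weighted average of the per-document parameter deltas, sketched here with plain Python lists standing in for tensors:

```python
def merge_modules(modules, weights):
    """Merge per-document parameter modules by weighted averaging.
    Each module maps a parameter name to a flat list of delta values;
    a real system would use tensors (e.g. LoRA-style adapters).
    The merging rule is illustrative, not taken from the paper."""
    total = sum(weights)
    merged = {}
    for name in modules[0]:
        merged[name] = [
            sum(w * m[name][i] for m, w in zip(modules, weights)) / total
            for i in range(len(modules[0][name]))
        ]
    return merged

doc_a = {"layer0.delta": [1.0, 0.0]}   # module encoding document A
doc_b = {"layer0.delta": [0.0, 1.0]}   # module encoding document B
merged = merge_modules([doc_a, doc_b], weights=[0.5, 0.5])
# merged["layer0.delta"] == [0.5, 0.5]
```

Weighted averaging is the simplest composition rule; the OSD work mentioned in the summary above is precisely about making such compositions interfere less.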
arXiv:2603.16877v2 Announce Type: replace Abstract: Financial analysts face significant challenges extracting information from lengthy 10-K reports, which often exceed 100 pages. This paper presents a Retrieval-Augmented Generation (RAG) system designed to answer questions about …
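The snippet mentions reranking in a RAG pipeline over 10-K filings. As a minimal illustration of the reranking step (the token-overlap scorer below is a toy stand-in for a learned reranker such as a cross-encoder; names and data are invented):

```python
def rerank(query, chunks, top_k=2):
    """Re-order retrieved chunks by a relevance score and keep the
    best top_k. The overlap score is a toy stand-in for a learned
    reranking model."""
    q_tokens = set(query.lower().split())

    def score(chunk):
        c_tokens = set(chunk.lower().split())
        return len(q_tokens & c_tokens) / (len(c_tokens) or 1)

    return sorted(chunks, key=score, reverse=True)[:top_k]

chunks = [
    "Item 7 discusses liquidity and capital resources.",
    "The auditor's report covers internal controls.",
    "Net revenue grew due to higher capital expenditure.",
]
top = rerank("capital resources and liquidity", chunks)
```

In long-document settings like 10-K reports, first-stage retrieval casts a wide net and the reranker decides which few chunks actually reach the generator's context window.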
arXiv:2604.25182v1 Announce Type: new Abstract: A multilingual collection may contain useful knowledge in other languages to supplement and correct the facts in the original language for Retrieval-Augmented Generation (RAG). However, the vanilla approach that simply concatenates multiple pieces of knowledge from different lang…
arXiv cs.CL
Youngjoon Jang, Seongtae Hong, Junyoung Son, Sungjin Park, Chanjun Park, Heuiseok Lim
arXiv:2507.07847v3 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) has emerged as a crucial framework in natural language processing (NLP), improving factual consistency and reducing hallucinations by integrating external document retrieval with large langua…
arXiv cs.LG
Zhuoling Li, Ha Linh Hong Tran Nguyen, Valeria Bladinieres, Maxim Romanovsky
arXiv:2604.24623v1 Announce Type: cross Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) extends traditional RAG by using knowledge graphs (KGs) to give large language models (LLMs) a structured, semantically coherent context, yielding more grounded answers. However, the GraphRAG reasoning process remains a black-box,…
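GraphRAG hands KG structure to the LLM as context. One simple way to do that (purely illustrative; the paper's actual serialization is not given in the snippet) is to linearize retrieved triples into plain sentences for the prompt:

```python
def linearize_triples(triples):
    """Render KG (subject, predicate, object) triples as sentences so
    they can be placed in an LLM prompt as structured context.
    Illustrative only; real GraphRAG systems vary in serialization."""
    return " ".join(f"{s} {p} {o}." for s, p, o in triples)

triples = [
    ("Marie Curie", "was born in", "Warsaw"),
    ("Marie Curie", "won", "the Nobel Prize in Physics"),
]
context = linearize_triples(triples)
# "Marie Curie was born in Warsaw. Marie Curie won the Nobel Prize in Physics."
```

Because each sentence maps back to a specific triple, this kind of serialization also gives an audit trail, which is relevant to the black-box concern the abstract raises.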
arXiv cs.AI
Miao Xie, Xiao Zhang, Yi Li, Chunli Lv
arXiv:2604.22843v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) has been proposed to mitigate hallucinations in large language models (LLMs), where generated outputs may be factually incorrect. However, existing RAG approaches predominantly rely on vector s…
arXiv:2604.22757v1 Announce Type: cross Abstract: We introduce StratRAG, an open-source retrieval evaluation dataset for benchmarking Retrieval-Augmented Generation (RAG) systems on multi-hop reasoning tasks under realistic, noisy document-pool conditions. Derived from HotpotQA (…
arXiv:2510.11541v2 Announce Type: replace Abstract: Retrieval-augmented generation (RAG) has demonstrated its ability to enhance Large Language Models (LLMs) by integrating external knowledge sources. However, multi-hop questions, which require the identification of multiple know…
arXiv cs.CL
Lichang Song, Ting Long, Yi Chang
arXiv:2602.18734v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) has demonstrated strong effectiveness in knowledge-intensive tasks by grounding language generation in external evidence. Despite its success, many existing RAG systems are built based on a r…