Researchers have developed DoGMaTiQ, a pipeline that automatically generates question-and-answer (QA) nuggets for evaluating long-form reports, particularly those produced by retrieval-augmented generation (RAG) systems. It addresses the significant cost of manually curating evaluation nuggets, which is especially difficult in cross-lingual settings. DoGMaTiQ operates in three stages: generating document-grounded nuggets, clustering paraphrases, and subselecting nuggets against quality criteria. Experiments on TREC shared tasks showed that DoGMaTiQ produces QA nuggets that correlate well with human judgments, with effectiveness largely dependent on the quality of the large language model used for nugget generation.
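The three stages can be sketched as a minimal pipeline. This is an illustrative assumption, not the paper's actual implementation: the function names are hypothetical, Stage 1 is stubbed where the real system would prompt an LLM, Stage 2 clusters on a crude normalized-text key rather than semantic similarity, and Stage 3 uses a stand-in length threshold for the quality criteria.

```python
# Hypothetical sketch of a DoGMaTiQ-style three-stage pipeline.
# All names and criteria are illustrative, not the paper's API.

def generate_nuggets(documents):
    """Stage 1: derive document-grounded QA nuggets (stubbed here;
    the real system would prompt an LLM per source document)."""
    return [{"question": f"What does doc {i} report?", "answer": doc}
            for i, doc in enumerate(documents)]

def cluster_paraphrases(nuggets):
    """Stage 2: merge paraphrased nuggets, keeping one representative
    per cluster. Here the cluster key is whitespace/case-normalized
    answer text; a real system would use semantic similarity."""
    clusters = {}
    for n in nuggets:
        key = " ".join(n["answer"].lower().split())
        clusters.setdefault(key, []).append(n)
    return [group[0] for group in clusters.values()]

def subselect(nuggets, min_words=3):
    """Stage 3: keep nuggets meeting quality criteria (stand-in:
    a minimum answer length in words)."""
    return [n for n in nuggets if len(n["answer"].split()) >= min_words]

docs = ["The system improves recall.", "the  system improves recall.", "Yes."]
qa = subselect(cluster_paraphrases(generate_nuggets(docs)))
print(len(qa))  # the two paraphrased docs collapse to one; "Yes." is filtered out
```

The design point the stages share: each stage only narrows the nugget set, so the final QA nuggets remain grounded in the source documents produced in Stage 1.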
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Automates the creation of evaluation datasets for RAG systems, potentially accelerating research and development in report generation.
RANK_REASON This is a research paper detailing a new method for generating evaluation artifacts for AI systems.