A new research paper introduces DeGenTWeb, a system for systematically identifying websites dominated by content generated by large language models (LLMs) with minimal human oversight. The study found that LLM-dominant websites are prevalent across the web, appearing frequently in both Common Crawl data and Bing search results, and that their proportion is increasing. The research also highlights how difficult it is to detect LLM-generated content accurately: current detection methods perform worse than advertised when tuned to minimize false attributions.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Highlights the growing prevalence of LLM-generated content online and the difficulty of detecting it, with implications for content moderation and search.
RANK_REASON Academic paper introducing a new methodology and findings about LLM-generated content on the web.