PulseAugur
research
LLMs hallucinate in academic writing; Grok, Copilot better at references, Gemini, ChatGPT at tone

A new paper investigates the hallucination tendencies of four large language models (ChatGPT, Grok, Gemini, and Copilot) when used for academic writing. The researchers designed 80 prompts across four categories and introduced a Hallucination Index (HI) to measure factual accuracy and reference validity. The study found that Grok and Copilot excelled at reference generation but struggled with abstract tasks, while Gemini and ChatGPT showed better tone control but higher hallucination risk in factual writing.
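The summary does not reproduce the paper's exact Hallucination Index formula, so the following is only a minimal sketch under an assumed definition: HI as the fraction of checked items (factual claims plus cited references) that fail verification. The function name and inputs are illustrative, not the authors' API.

```python
def hallucination_index(claims_checked: int,
                        claims_false: int,
                        refs_checked: int,
                        refs_invalid: int) -> float:
    """Return an assumed HI in [0, 1]: 0 = fully grounded, 1 = fully hallucinated.

    Hypothetical definition for illustration only; the paper's actual
    formula may weight claim errors and reference errors differently.
    """
    total = claims_checked + refs_checked
    if total == 0:
        # No verifiable items: treat the output as having no measured hallucination.
        return 0.0
    return (claims_false + refs_invalid) / total

# Example: 10 verified claims (2 false) and 5 references (1 invalid)
# gives (2 + 1) / (10 + 5) = 0.2.
print(hallucination_index(10, 2, 5, 1))
```

Under this toy definition, a model that fabricates one reference in five while keeping most claims accurate still accrues a nonzero HI, which matches the paper's framing that fluency alone does not imply factuality.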

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Highlights the persistent challenge of LLM factual accuracy in specialized domains like academic writing, suggesting prompt engineering and task-specific tuning are crucial.

RANK_REASON The cluster contains an academic paper detailing research findings on LLM hallucinations.

Read on arXiv cs.CL →

COVERAGE [2]

  1. arXiv cs.CL TIER_1 · Humam Khan, Md Tabrez Nafis, Shahab Saquib Sohail, Aqeel Khalique, Rehan Hasan Khan ·

    Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

    arXiv:2605.04171v1 · Abstract: Large Language models (LLMs) show extraordinary abilities, but they are still prone to hallucinations, especially when we use them for generating Academic content. We have investigated four popular LLMs, ChatGPT, Grok, Gemini, and C…

  2. arXiv cs.CL TIER_1 · Rehan Hasan Khan ·

    Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

    Large Language models (LLMs) show extraordinary abilities, but they are still prone to hallucinations, especially when we use them for generating Academic content. We have investigated four popular LLMs, ChatGPT, Grok, Gemini, and Copilot for hallucinations specifically for acade…