TriviaQA
PulseAugur coverage of TriviaQA — every cluster mentioning TriviaQA across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
PersonalAI 2.0 framework boosts LLM knowledge graph retrieval
Researchers have developed PersonalAI 2.0 (PAI-2), a new framework that enhances LLM systems by integrating external knowledge graphs. PAI-2 employs a dynamic, multi-stage query processing pipeline for adaptive, iterati…
-
New method quantifies LLM uncertainty using semantic entropy and conformal calibration
Researchers have developed a new method called Adaptive Conformal Semantic Entropy (ACSE) to better estimate the uncertainty of Large Language Models (LLMs). This approach focuses on the semantic dispersion of different…
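The summary is truncated, but the core idea of semantic entropy is well established: sample several answers, group them into meaning clusters, and compute entropy over the cluster distribution. A minimal sketch, using normalized string match as a stand-in for the entailment-based clustering real implementations use (the ACSE paper's conformal calibration step is not shown):

```python
import math
from collections import Counter

def semantic_entropy(samples):
    """Entropy over meaning clusters of sampled answers.

    Sketch only: real semantic entropy clusters answers via
    bidirectional entailment with an NLI model; normalized string
    match here is an illustrative assumption.
    """
    clusters = Counter(s.strip().lower() for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())

print(semantic_entropy(["Paris", "paris", "Paris"]))      # low: samples agree
print(semantic_entropy(["Paris", "Lyon", "Marseille"]))   # high: samples disperse
```

High semantic dispersion across samples signals that the model is uncertain about the answer, even when each individual response sounds confident.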
-
New methods like SMF and SAM reduce catastrophic forgetting in LLMs
Two new research papers explore methods to mitigate catastrophic forgetting in language models during fine-tuning. One paper introduces Sparse Memory Finetuning (SMF), which adds memory layers and updates only heavily a…
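The common thread in both methods is restricting which parameters move during fine-tuning so earlier knowledge survives. A minimal sketch of that idea as a masked gradient step (the mask here is hand-picked for illustration; SMF's actual memory-slot selection rule is not reproduced):

```python
import numpy as np

def sparse_update(params, grads, mask, lr=0.1):
    """Apply a gradient step only where mask == 1; masked-out
    parameters stay frozen, limiting interference with old knowledge.
    The selection mask is an assumed illustration, not SMF's rule."""
    return params - lr * grads * mask

params = np.array([1.0, 2.0, 3.0, 4.0])
grads  = np.array([0.5, 0.5, 0.5, 0.5])
mask   = np.array([0.0, 1.0, 0.0, 1.0])  # only designated "memory" slots trainable
print(sparse_update(params, grads, mask))  # entries 0 and 2 remain unchanged
```

Because frozen coordinates never receive updates, the fine-tuned model cannot drift on the capabilities those weights encode.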
-
Researchers release Faithfulness-QA dataset to train context-faithful RAG models
Researchers have developed Faithfulness-QA, a new dataset containing nearly 100,000 samples designed to train Retrieval-Augmented Generation (RAG) models to prioritize retrieved context over their internal knowledge. Th…
-
S2G-RAG improves multi-hop QA by judging evidence sufficiency and gaps
Researchers have introduced S2G-RAG, a novel iterative framework designed to improve retrieval-augmented generation (RAG) for multi-hop question answering. The system features a controller, S2G-Judge, which determines i…
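The summary describes an iterative retrieve-then-judge controller; a minimal sketch of that control flow, with every function contract assumed for illustration (this is not the S2G-RAG API):

```python
def s2g_loop(question, retrieve, judge, answer, max_rounds=3):
    """Iterative RAG sketch: retrieve evidence, ask a judge whether it
    suffices, and if not, retrieve again for the reported gap.
    All callables are hypothetical stand-ins for the paper's modules."""
    evidence = retrieve(question)
    for _ in range(max_rounds):
        verdict = judge(question, evidence)  # ("sufficient", None) or ("gap", subquery)
        if verdict[0] == "sufficient":
            break
        evidence += retrieve(verdict[1])
    return answer(question, evidence)

# Toy stubs: the judge deems evidence sufficient once two passages are gathered.
retrieve = lambda q: [f"passage about {q}"]
judge = lambda q, ev: ("sufficient", None) if len(ev) >= 2 else ("gap", f"missing hop for {q}")
answer = lambda q, ev: f"answer from {len(ev)} passages"
print(s2g_loop("Who directed the film adapted from the novel?", retrieve, judge, answer))
```

The round cap keeps the loop bounded even when the judge never reports sufficiency, which matters for multi-hop queries where retrieval can keep surfacing tangential passages.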
-
Gemma 3 4B LLM confidence training shows mixed results, improves accuracy post-hoc
A study on the Gemma 3 4B model investigated methods to improve its verbal confidence in responses. Initial attempts using a filtered dataset for confidence-conditioned supervised fine-tuning (CSFT) yielded negative res…
-
LLMs use internal confidence signals to detect and correct errors
Researchers have investigated how large language models can identify and correct their own mistakes without external input, drawing parallels to second-order confidence models in decision neuroscience. Their findings su…
-
Study finds 3-9B LLMs fail verbal confidence tests, impacting uncertainty estimates
A new study examined the verbal confidence of seven instruction-tuned, open-weight large language models (LLMs) with 3-9 billion parameters. Researchers found that these models failed to meet minimal validity criteria f…