Humanity's Last Exam
PulseAugur coverage of Humanity's Last Exam — every cluster mentioning Humanity's Last Exam across labs, papers, and developer communities, ranked by signal.
-
LLMs learn to actively seek external info for better task adaptation
Researchers have developed a new method for adapting large language models (LLMs) by enabling them to actively seek information from external sources like Wikipedia and web browsers. This approach, termed "active inform…
-
OpenSearch-VL offers open recipe for advanced multimodal search agents
Researchers have developed OpenSearch-VL, a novel, fully open-source recipe for training advanced multimodal deep search agents. This approach utilizes a curated pipeline for high-quality training data, a diverse tool e…
-
New RSE strategy recycles LLM search experience for efficient test-time scaling
Researchers have introduced Recycling Search Experience (RSE), a novel method to improve the efficiency of test-time scaling for large language models. RSE transforms test-time search from isolated trials into a cumulat…
-
Xiaomi's MiMo-v2.5-Pro open-source model rivals top AI coding assistants
Xiaomi has released MiMo-v2.5-Pro, an open-source coding-focused language model that demonstrates impressive capabilities in complex tasks. The model successfully completed a university-level compiler project in hours, …
-
MTRouter cuts LLM costs by 58% on ScienceWorld, 43% on HLE
Researchers have developed MTRouter, a novel system designed to optimize the cost of multi-turn interactions with large language models. By jointly embedding interaction history and candidate models, MTRouter learns to …
-
Google Gemini API adds Deep Research updates with MCP and chart generation
Google has released two significant updates to its Gemini API, enhancing its Deep Research capabilities. These updates introduce improved quality, support for MCP, and native generation of charts and infographics. The G…
-
New Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5
Google DeepMind has released Gemini 3 Deep Think V2, a new reasoning mode available to Google AI Ultra subscribers and, in early access, via the API. This model achieves new state-of-the-art results on benchmarks like ARC-AGI-2…
-
Kimi K2 model boasts 1T parameters and SOTA HLE, while Soumith Chintala departs PyTorch
Kimi K2, a new model from Moonshot AI, has 1 trillion parameters and achieves state-of-the-art results on the HLE benchmark. It also demonstrates capabilities in BrowseComp and TauBench. Separately, Soumith Chintala has dep…
-
Google DeepMind launches Deep Think for Gemini Ultra subscribers
Google DeepMind has released a new AI capability called Deep Think, now available to Google AI Ultra subscribers via the Gemini app. This feature utilizes parallel thinking techniques, allowing the model to explore mult…