agents
PulseAugur coverage of agents — every cluster mentioning agents across labs, papers, and developer communities, ranked by signal.
-
Codeflow project agents self-correct after 14 emergences, FCoP protocol absorbs learnings
The codeflow project experienced fourteen agent emergences within a single day, with three critical incidents including global pollution of user home directories and self-collision errors. Despite these issues, the FCoP…
-
AI agents vulnerable to 'tool poisoning' via malicious descriptions
A recent article in VentureBeat highlighted a critical security vulnerability in AI agents, termed "tool poisoning," where malicious instructions are embedded within a tool's description rather than user input. This all…
-
Microsoft researchers find AI models struggle with long-running tasks
Microsoft researchers have identified a significant limitation in current AI models and agents: their inability to effectively manage long-running tasks. These systems struggle with tasks that require sustained operatio…
-
New AssayBench benchmark tests LLMs for predicting cellular phenotypes
Researchers have introduced AssayBench, a new benchmark designed to evaluate the capabilities of large language models (LLMs) and agents in predicting cellular phenotypes. This benchmark is built upon 1,920 CRISPR scree…