Researchers from Penn State University and Duke University, together with collaborators from institutions including Google DeepMind, have introduced a new research problem for LLM multi-agent systems called "Automated Failure Attribution." They built the first benchmark dataset for it, "Who&When," and developed several methods that automatically identify which agent caused a task failure and at which step. The work aims to streamline debugging of complex multi-agent systems, which today is a time-consuming manual effort, and to improve their overall reliability. The paper has been accepted as a Spotlight presentation at ICML 2025, and the code and dataset are now open source.