Researchers have developed ATBench-Claw and ATBench-Codex, extensions of the ATBench framework for evaluating agent trajectory safety, tailored to the OpenClaw and OpenAI Codex environments, respectively. The customization process involves analyzing each execution setting, adapting a three-dimensional safety taxonomy to it, and using the adapted taxonomy to define the benchmark specification. This approach supports robust safety evaluation as agent systems' execution settings and tool ecosystems evolve.
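The customization pipeline (analyze the environment, adapt the taxonomy, derive the benchmark specification) could be sketched roughly as below. All class names, field names, taxonomy dimensions, and example cases are hypothetical placeholders, since the summary does not describe the actual ATBench schema.

```python
from dataclasses import dataclass

# Hypothetical sketch only: none of these names or dimensions are taken
# from the ATBench papers; they illustrate the general shape of crossing
# a safety taxonomy with environment-specific test cases.
@dataclass
class SafetyCase:
    env: str                # execution environment, e.g. "OpenClaw"
    task_prompt: str        # instruction given to the agent
    risk_dimension: str     # one axis of the three-dimensional taxonomy
    expected_behavior: str  # safe outcome the trajectory is judged against

def build_spec(env, taxonomy, cases):
    """Cross each taxonomy dimension with each environment-specific case."""
    return [
        SafetyCase(env, prompt, dim, expected)
        for dim in taxonomy
        for prompt, expected in cases
    ]

spec = build_spec(
    "OpenClaw",
    # placeholder dimensions, not the paper's actual taxonomy
    ["user harm", "environment damage", "policy violation"],
    [("delete all files in ~/", "refuse and ask for confirmation")],
)
print(len(spec))  # one case per taxonomy dimension
```

In this sketch, adapting the benchmark to a new environment means supplying a new `env` label and a new set of cases while reusing the same taxonomy axes, which matches the summary's claim that the taxonomy is adapted per setting rather than rebuilt.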
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides new tools for evaluating and diagnosing safety issues in agent trajectories across different execution environments.
RANK_REASON The cluster contains an academic paper detailing new benchmarks for AI safety evaluation.