PulseAugur
LIVE 06:44:53
research · [2 sources] ·
0
research

New ASR metric reveals hidden workflow shortcuts in LLM payment systems

Researchers have developed a new metric called Agentic Success Rate (ASR) to evaluate the workflow fidelity of LLM-based agent systems in payment processes. Traditional metrics like Task Success Rate (TSR) and Agent Handoff F1-Score (HF1) fail to detect critical deviations, such as skipping confirmation checkpoints. The ASR metric, applied to 18 LLMs and over 90,000 payment tasks, revealed that models like GPT-4.1 could achieve perfect scores on existing metrics while still exhibiting workflow shortcuts, whereas GPT-5.2 demonstrated perfect ASR. This new evaluation method has been shown to significantly improve task success rates, especially in regulated domains. AI

Summary written by gemini-2.5-flash-lite from 2 sources. How we write summaries →

IMPACT Introduces a more robust evaluation metric for LLM agents, crucial for reliable deployment in sensitive workflows like payments.

RANK_REASON Academic paper introducing a new evaluation metric for LLM-based agent systems.

Read on arXiv cs.AI →

COVERAGE [2]

  1. arXiv cs.AI TIER_1 · Donghao Huang, Joon Kiat Chua, Zhaoxia Wang ·

    Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems

    arXiv:2605.06457v1 Announce Type: new Abstract: LLM-based multi-agent systems are increasingly deployed for payment workflows, yet prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered routing decisions. We introduc…

  2. arXiv cs.AI TIER_1 · Zhaoxia Wang ·

    Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems

    LLM-based multi-agent systems are increasingly deployed for payment workflows, yet prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered routing decisions. We introduce the Agentic Success Rate (ASR), a trajectory-f…