Researchers have introduced CI-Work, a benchmark that assesses the contextual integrity of enterprise LLM agents, focusing on how well they handle sensitive information. Evaluations of current leading models reveal substantial privacy failures, with violation rates ranging from 15.8% to 50.9%. The research also identifies a trade-off in which higher task utility often comes with greater privacy risk, suggesting that current scaling approaches are insufficient for secure enterprise deployment.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT: Highlights critical privacy risks in enterprise LLM agents, indicating a need for new context-aware architectures for secure deployment.
RANK_REASON: Academic paper introducing a new benchmark for LLM agents.