Self-Critique Loops for Agents: Where the 3rd Iteration Stops Helping
A recent paper from Anthropic explores how large language models, specifically Claude Sonnet 4.5, develop internal representations of emotion concepts. These representations allow the models to generalize and track the emotions operative in a conversation, potentially explaining why LLMs sometimes appear to exhibit emotional reactions. The research suggests these behaviors stem from training that encourages human-like characteristics and the development of abstract concept representations.
AI IMPACT: Explains the emergence of 'emotional' responses in LLMs, with potential implications for alignment research and user interaction.