A recent ARC Prize evaluation tested Anthropic's Claude Opus 4.7 and OpenAI's GPT 5.5 on the ARC-AGI-3 benchmark. The results were surprising, though not in the most obvious ways; the source does not detail the specific nature of these surprises.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Benchmark results for Claude Opus 4.7 and GPT 5.5 on ARC-AGI-3 reveal unexpected performance characteristics.
RANK_REASON The cluster reports benchmark test results for AI models on the ARC-AGI-3 benchmark.