A new chart visualizes the performance history of major AI models, tracking their capabilities over time rather than just their latest release. This tool aims to expose hidden trends, such as performance degradation or "nerfs," that can occur after a model's initial launch. The data is sourced daily from the LMSYS Arena Leaderboard, which uses crowdsourced human evaluations to provide a robust measure of model performance.
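The core idea — comparing successive daily leaderboard snapshots to surface post-launch score drops — can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function name `find_nerfs`, the drop threshold, and the model history data are all hypothetical, and real scores would come from daily LMSYS Arena Leaderboard snapshots.

```python
def find_nerfs(history, threshold=15):
    """Flag snapshots where a model's Arena score dropped by more than
    `threshold` points versus the previous snapshot -- a possible 'nerf'.

    `history` is a list of (date, score) tuples sorted by date.
    Returns a list of (date, drop) tuples for each flagged snapshot.
    """
    nerfs = []
    for (_, prev_score), (date, score) in zip(history, history[1:]):
        drop = prev_score - score
        if drop > threshold:
            nerfs.append((date, drop))
    return nerfs

# Illustrative synthetic data, not real leaderboard scores.
model_history = [
    ("2024-01-01", 1250),
    ("2024-02-01", 1255),
    ("2024-03-01", 1230),  # 25-point drop: flagged as a possible nerf
    ("2024-04-01", 1232),
]
print(find_nerfs(model_history))  # → [('2024-03-01', 25)]
```

Tracking deltas between snapshots, rather than only the latest score, is what lets a chart like this expose regressions that a static leaderboard would hide.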
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a tool for operators to track model degradation and understand performance nuances beyond initial release benchmarks.
RANK_REASON The cluster describes a new visualization tool for tracking AI model performance over time, based on existing benchmark data.