Researchers have developed AesRM, a family of reward models designed to improve the aesthetic quality of generated videos. The framework breaks video aesthetics down into three dimensions (Visual Aesthetics, Visual Fidelity, and Visual Plausibility) and 15 specific criteria. AesRM is trained on expert feedback over a dataset of 2,500 video pairs, so the models can both predict preferences and generate interpretable reasoning. Training follows a three-stage process that includes atomic aesthetic capability learning and reinforcement learning, and the resulting models have shown improved performance and robustness compared to existing methods. AesRM has also been used to enhance the video generation model Wan2.2, resulting in noticeable aesthetic improvements.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Introduces a new framework and models for evaluating and improving video generation aesthetics, potentially impacting content creation tools.
RANK_REASON This is a research paper detailing a new model and benchmark for video aesthetics.