Researchers have developed Video2GUI, an automated framework for generating large-scale interaction trajectories to train GUI agents. The system takes unlabeled internet videos, filters them, and converts them into structured agent trajectories. The resulting dataset, WildGUI, contains 12 million trajectories spanning more than 1,500 applications and is reported to significantly improve the pre-training of models such as Qwen2.5-VL and MiMo-VL.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Enables creation of large-scale datasets for GUI agents, potentially improving their generalization and performance across diverse applications.
RANK_REASON Academic paper introducing a new method and dataset for GUI agent pretraining.