Researchers have introduced COCOTree, a dataset and benchmark for open tree-structured visual decomposition, the task of segmenting an image into a hierarchical tree of visual components at flexible granularity. The dataset was generated with a novel pipeline that combines Large Vision-Language Models with SAM 3 for semantic reasoning and geometric grounding. It contains over 2.1K images and 1.8M structural nodes, with an open vocabulary of 3.5K labels. The authors also propose a new evaluation metric, Open Tree Quality (OTQ), which assesses mask precision, label accuracy, and structural consistency.
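To make the idea of a tree of visual components concrete, here is a minimal sketch of how one decomposition might be represented. Everything in it is an illustrative assumption rather than COCOTree's actual format: the `Node` class, the `rect` helper, and the choice to store masks as sets of pixel coordinates are hypothetical. The `is_nested` check is only one plausible reading of "structural consistency" (each part lies inside its parent); it is not the OTQ definition, which the summary does not specify.

```python
from dataclasses import dataclass, field

Pixel = tuple[int, int]  # (x, y) coordinate; hypothetical mask encoding


@dataclass
class Node:
    """One visual component: a free-text label, a mask, and its sub-parts."""
    label: str
    mask: frozenset[Pixel]
    children: list["Node"] = field(default_factory=list)


def rect(x0: int, y0: int, x1: int, y1: int) -> frozenset[Pixel]:
    """Build a rectangular mask covering x in [x0, x1) and y in [y0, y1)."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def count_nodes(node: Node) -> int:
    """Total number of nodes in the subtree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


def depth(node: Node) -> int:
    """Number of levels in the subtree; a leaf has depth 1."""
    return 1 + max((depth(child) for child in node.children), default=0)


def is_nested(node: Node) -> bool:
    """True if every child's mask lies inside its parent's mask, recursively.

    Assumed consistency criterion for illustration only, not the OTQ metric.
    """
    return all(
        child.mask <= node.mask and is_nested(child)
        for child in node.children
    )


# Toy example: a "car" region with two parts at a finer granularity.
wheel = Node("wheel", rect(1, 3, 3, 5))
window = Node("window", rect(2, 1, 5, 2))
car = Node("car", rect(0, 0, 6, 5), [wheel, window])

print(count_nodes(car), depth(car), is_nested(car))
```

Flexible granularity corresponds here to how deep a branch goes: `wheel` could itself gain children such as a tire or rim without changing the structure.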
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Enables new research in hierarchical image segmentation and visual decomposition tasks.
RANK_REASON The cluster describes a new dataset and benchmark for a novel computer vision task, including a proposed evaluation metric and details on its generation methodology.