Researchers have introduced SpecPL, a prompt-learning approach for Vision-Language Models (VLMs) that addresses modality asymmetry by focusing on spectral granularity. The method decomposes visual signals into low-frequency semantic bands and high-frequency detail bands, using a frozen VAE and a Visual Semantic Bank to anchor text representations. Through counterfactual granule training, SpecPL compels models to distinguish visual granularity from semantic invariance, improving fine-grained discrimination. Experiments on 11 benchmarks show SpecPL reaching a new performance ceiling of 81.51% harmonic-mean accuracy while also revitalizing existing text-oriented baselines.
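The core idea of splitting a visual signal into low-frequency (semantic) and high-frequency (detail) bands can be sketched with a standard FFT radial mask; this is an illustrative decomposition only, not SpecPL's actual pipeline (the paper uses a frozen VAE, whose latent-space split may differ):

```python
import numpy as np

def spectral_split(image, cutoff=0.1):
    """Split a 2-D image into low- and high-frequency bands.

    A radial mask in the shifted FFT domain keeps frequencies within
    `cutoff * min(h, w)` of the center as the low band; the rest is
    the high band. Illustrative sketch, not SpecPL's decomposition.
    """
    h, w = image.shape
    freq = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.mgrid[:h, :w]
    radius = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = radius <= cutoff * min(h, w)  # low-frequency region
    low = np.fft.ifft2(np.fft.ifftshift(freq * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(freq * ~mask)).real
    return low, high

img = np.random.rand(64, 64)
low, high = spectral_split(img)
# The two bands sum back to the original image (up to float error).
assert np.allclose(low + high, img)
```

Because the mask and its complement partition the spectrum, the two bands reconstruct the input exactly, so no information is lost by the split.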
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Introduces a new technique for improving VLM performance by addressing spectral granularity in visual data, potentially enhancing fine-grained discrimination.
RANK_REASON This is a research paper detailing a new method for prompt learning in VLMs.