A Dive into Vision-Language Models
Hugging Face has released a suite of resources and models focused on advancing vision-language models (VLMs). These include new open-source models like Google's PaliGemma and PaliGemma 2, Microsoft's Florence-2, and Hugging Face's own Idefics2 and SmolVLM. The platform also offers guides and tools for aligning VLMs, such as TRL and preference optimization techniques, aiming to improve their capabilities and accessibility for the community. AI
IMPACT Expands the ecosystem of open-source vision-language models and provides tools for their alignment and fine-tuning.