Microsoft has released VibeVoice, an open-source speech-to-text model with built-in speaker diarization. The MIT-licensed model can be deployed locally, so audio data does not need to be sent to an external API. One user who tested the model on a MacBook Pro transcribed an hour of audio in under nine minutes, though the run required significant RAM.
Summary written by gemini-2.5-flash-lite from 6 sources.
IMPACT Provides a self-hostable, open-source alternative for speech-to-text transcription, potentially reducing operational costs for developers.
RANK_REASON Open-source model release from a major company, but not a frontier model release from a top-tier AI lab.