PulseAugur

Whisper

PulseAugur coverage of Whisper — every cluster mentioning Whisper across labs, papers, and developer communities, ranked by signal.

Total · 30d: 32 (32 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 16 (16 over 90d)
TIMELINE
  1. 2026-05-12 · research_milestone · A new semi-supervised framework for speech confidence detection was proposed, achieving a Macro-F1 score of 0.751.
SENTIMENT · 30D

5 days with sentiment data

RECENT · PAGE 1/2 · 21 TOTAL
  1. TOOL · CL_30789

    New benchmark and training method boost Indic language speech recognition

    Researchers have introduced Vividh-ASR, a new benchmark designed to evaluate and improve automatic speech recognition (ASR) for Indic languages like Hindi and Malayalam. This benchmark addresses the 'studio-bias' phenom…

  2. TOOL · CL_29601

    CognitiveBotics builds personalized AI content engine for autistic children

    CognitiveBotics has developed a personalized content engine for children with autism, addressing the challenge of high individual variability in learning preferences. Their Modalities Engine renders learning objectives …

  3. TOOL · CL_29444

    New framework improves speech confidence detection using Whisper

    Researchers have developed a new semi-supervised framework for detecting speaker confidence in speech, addressing the challenge of limited labeled data. This approach combines deep semantic embeddings from OpenAI's Whis…

  4. TOOL · CL_26552

    Developer releases llmclean library to clean LLM output

    A developer has released version 0.2.0 of llmclean, a Python library designed to clean and normalize output from large language models. The library addresses common issues such as removing markdown fences, repairing mal…

  5. TOOL · CL_26361

    MCP ecosystem adds database tooling, sees major platform integrations dominate

    The MCP ecosystem is expanding with new database tooling integrations, including Local-YDB for managing local Yandex Distributed SQL database instances within AI workflows. Major platforms like GitHub, OpenAI, and Figma…

  6. RESEARCH · CL_25987

    AI interpretability advances with Sparse Autoencoders for ASR and functional operators

    Researchers are exploring advanced techniques for interpreting the internal workings of complex AI models. One paper details the application of Sparse Autoencoders (SAEs) to Automatic Speech Recognition (ASR) systems li…

  7. TOOL · CL_27585

    LLMs show mixed reliability for mental health screening

    A new research paper investigates the reliability of large language models (LLMs) for mental health screening, specifically their ability to estimate anxiety and depression scores from speech. The study evaluated three …

  8. TOOL · CL_22903

    Hermes AI adds free, local voice control for Telegram and Discord

    A guide details how to implement voice control for the Hermes AI assistant, enabling users to interact with it via spoken commands on platforms like Telegram and Discord. The system utilizes local, free models for speec…

  9. TOOL · CL_22854

    Speech-to-Markdown tool turns spoken thoughts into structured documents

    A developer has created a Speech-to-Markdown tool called stmd, integrated into the TaskSquad application, to address the challenge of structuring thoughts spoken aloud. The tool uses local Whisper models for transcripti…

  10. TOOL · CL_21319

    Whisper fine-tuning pipeline built for Indian languages

    This article details the process of building a dataset pipeline for fine-tuning OpenAI's Whisper model to better understand Indian languages. It focuses on the technical steps involved in preparing and processing audio …

  11. TOOL · CL_19104

    Hugging Face adds private datasets to ASR leaderboard to prevent benchmaxxing

    Hugging Face has enhanced its Open ASR Leaderboard by incorporating new, high-quality English Automatic Speech Recognition datasets from Appen Inc. and DataoceanAI. To prevent "benchmaxxing" or test-set contamination, t…

  12. RESEARCH · CL_17939

    Mistral AI and X-Voice advance multilingual voice cloning with new architectures

    Researchers have introduced X-Voice, a compact 0.4B parameter model capable of zero-shot cross-lingual voice cloning in 30 languages. The model utilizes a two-stage training process with a unified International Phonetic…

  13. TOOL · CL_15989

    BaldWhisper model achieves 48% size reduction and 2.15x speedup

    Researchers have developed BaldWhisper, a method to significantly compress and accelerate the Whisper speech-to-text model. By employing low-rank decomposition for embeddings and merging transformer layers, BaldWhisper …

  14. RESEARCH · CL_14473

    Audio-language models struggle with dysarthric speech context, but fine-tuning shows promise

    Researchers have developed a benchmark to test if current audio-language models can effectively use additional clinical context to improve automatic speech recognition for dysarthric speech. Initial findings indicate th…

  15. RESEARCH · CL_08610

    Researchers enhance elderly ASR with LLM paraphrasing and speech synthesis

    Researchers have developed a novel data augmentation technique to improve automatic speech recognition (ASR) for elderly individuals. This method utilizes large language models to paraphrase existing transcripts, genera…

  16. RESEARCH · CL_08266

    WhisperPipe architecture slashes ASR latency and memory use for real-time applications

    Researchers have developed WhisperPipe, a new streaming architecture designed to improve real-time automatic speech recognition (ASR) performance. This architecture addresses the trade-off between accuracy and computati…

  17. RESEARCH · CL_06729

    New FADE method improves ASR model quantization for edge devices

    Researchers have developed FADE, a novel framework for improving post-training quantization of encoder-decoder Automatic Speech Recognition (ASR) models. This method addresses the issue of error accumulation across laye…

  18. RESEARCH · CL_13934

    Talkie-1930: New 13B AI model trained on pre-1931 text explores historical knowledge

    A new project called Talkie has released a 13-billion parameter language model trained exclusively on English text from before 1931. This "vintage" model aims to explore AI's ability to predict the future and generate n…

  19. TOOL · CL_00804

    Speak leverages OpenAI's AI for personalized language learning and global expansion

    Speak, a language learning application, is leveraging OpenAI's advanced AI capabilities to create a personalized and highly interactive tutoring experience. The company, which began in 2016, has evolved significantly wi…

  20. TOOL · CL_02402

    Morgan Stanley leverages OpenAI's GPT-4 to enhance financial advisor services

    Morgan Stanley has partnered with OpenAI to integrate GPT-4 into its financial advisory services, enhancing advisor efficiency and client engagement. The firm developed an internal chatbot, AI @ Morgan Stanley Assistant…