Mistral Voxtral is the company’s first open-source AI audio model

French AI company Mistral has released Voxtral, its first family of open-source AI models for audio processing. The company positions Voxtral as a solution for developers who previously had to choose between less reliable open-source systems and expensive, closed proprietary models. Mistral claims Voxtral offers high performance at “less than half the price” of comparable …


Apple’s new speech technology beats OpenAI’s Whisper in transcription speed tests

Apple has introduced new speech recognition technology that significantly outperforms existing transcription tools in processing speed. The company unveiled SpeechAnalyzer and SpeechTranscriber as part of its developer beta releases at WWDC. John Voorhees from MacStories tested the new Apple framework against popular transcription apps built on OpenAI’s Whisper model. His tests used a 34-minute, 7GB …


ElevenLabs’ upgraded voice AI mimics natural conversation flow

ElevenLabs has released Conversational AI 2.0, an enhanced platform for building enterprise voice assistants that better simulate human dialogue patterns. The update addresses common issues like awkward pauses and interruptions in automated conversations, according to reporting by Carl Franzen. The new system analyzes conversational cues such as hesitations and filler words to determine when to …


Hume releases EVI 3 voice AI model with custom voice creation

New York startup Hume has launched EVI 3, an advanced conversational AI model that lets users create custom synthetic voices through voice-to-voice interaction. The technology targets applications from customer support to virtual companionship, according to reporting by Carl Franzen for VentureBeat. Users can specify personality traits, vocal qualities, and emotional tones to generate voices ranging …


Nvidia releases free Parakeet-TDT-0.6B-v2 speech recognition model

Nvidia has launched a new open-source automatic speech recognition (ASR) model called Parakeet-TDT-0.6B-v2. According to VentureBeat reporter Carl Franzen, the model can transcribe 60 minutes of audio in just one second when running on Nvidia’s GPU hardware. The new model currently tops the Hugging Face Open ASR Leaderboard with a word error rate of only …


Dia debuts as open-source text-to-speech model with natural dialogue capabilities

A startup called Nari Labs has released Dia, a new open-source text-to-speech model designed to produce naturalistic dialogue. According to VentureBeat reporter Carl Franzen, the 1.6-billion-parameter model rivals offerings from ElevenLabs, OpenAI, and Google’s NotebookLM. Co-creator Toby Kim developed Dia “with zero funding” but with Google’s support through access to TPU chips. The model …


Groq and PlayAI launch Dialog, a faster and more natural text-to-speech system

Groq and PlayAI have partnered to create Dialog, a text-to-speech system that delivers more natural-sounding voice AI. According to Michael Nuñez of VentureBeat, the system combines PlayAI’s voice technology with Groq’s high-speed inference platform. Dialog features an “adaptive speech contextualizer” that maintains awareness of conversation flow, allowing responses with appropriate tone and emotion. The system …


OpenAI launches improved AI models for voice and transcription

OpenAI has introduced three new AI models designed to enhance speech-to-text and text-to-speech capabilities. The models gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts offer improved accuracy and customization options for developers building voice applications. According to OpenAI, the new transcription models significantly outperform their predecessor, Whisper, particularly in noisy environments and with various accents. The company’s internal benchmarks …


AI voice cloning tools lack effective safeguards against misuse

Most AI voice cloning services have inadequate protections against nonconsensual voice impersonation, according to a Consumer Reports investigation. The study examined six leading publicly available tools and found that five had safeguards that were easily bypassed. As reported by NBC News, four services (ElevenLabs, Speechify, PlayHT, and Lovo) merely require checking a box confirming authorization, while Resemble …


ElevenLabs launches Scribe with record 96.7% accuracy for English speech-to-text

ElevenLabs has released Scribe v1, a new speech-to-text model achieving record accuracy rates across 99 languages. According to Carl Franzen of VentureBeat, the model outperforms competitors from Google, OpenAI, and Deepgram with a 96.7% accuracy rate for English. Scribe can distinguish up to 32 different speakers in a single audio file and detect non-verbal elements …
