Google expands NotebookLM with interactive AI features and enterprise version

Google has announced significant updates to its AI-powered note-taking application NotebookLM, including a new interactive feature for its Audio Overviews function and an enterprise-focused version called NotebookLM Plus. The application, which has gained popularity for its ability to generate podcast-like conversations between AI hosts based on source materials, now allows users to directly interact with …

ElevenLabs introduces AI podcast creation and editing system

ElevenLabs has launched a new AI-powered tool that enables users to create and edit podcasts from text documents and other source materials. As reported by Ashley Carman for Bloomberg, the system can generate conversational podcasts in 32 languages using AI-voiced hosts selected from thousands of voice samples. The New York-based startup, valued at $1.1 billion …

Hume AI releases voice customization tool for developers

Hume AI has launched Voice Control, a new feature that enables developers to create custom AI voices by adjusting vocal characteristics through an interface with sliding controls. As reported by Carl Franzen for VentureBeat, the tool allows users to modify voices along ten different dimensions including assertiveness, confidence, and enthusiasm without requiring coding skills. The …

Nvidia unveils AI audio generation model Fugatto

Nvidia has introduced a new AI model called Fugatto that can generate and modify audio, including music, voice, and sound effects. As reported by Stephen Nellis for Reuters, the technology allows users to transform existing sounds, change voice accents, and create novel audio effects through text prompts. The model, whose name stands for Foundational Generative …

Voice cloning startup PlayAI raises $21M amid safety concerns

PlayAI, a company developing AI-powered voice cloning and text-to-speech technology, has secured $21 million in seed funding led by 500 Startups and Kindred Ventures. As reported by Kyle Wiggers for TechCrunch, the Y Combinator-backed startup offers tools for creating synthetic voices, including a voice cloning feature and automated customer service agents. While the technology allows …

New AI model combines speech recognition with privacy protection

Israeli startup aiOla has released Whisper-NER, an open-source AI model that transcribes audio while automatically masking sensitive information. As reported by Carl Franzen for VentureBeat, the model builds upon OpenAI’s Whisper framework and combines automatic speech recognition with named entity recognition to protect private data during transcription. The tool can identify and obscure sensitive details …

YouTube tests AI feature to restyle songs for Shorts

YouTube is launching a limited test of an AI-powered feature that allows creators to modify licensed songs for their Shorts videos, The Verge reports. The new capability, an extension of YouTube’s Dream Track feature, enables selected creators to generate 30-second soundtracks by altering elements like mood and genre of existing songs through text prompts. The …

OpenAI expands Realtime API with new voices and reduces costs for developers

OpenAI has added five new expressive voices for speech-to-speech applications to its Realtime API, currently in beta, and has cut costs for developers by introducing prompt caching. According to OpenAI’s API documentation cited in an article by VentureBeat, the native speech-to-speech feature enables low latency and nuanced output. The company showcased three of the new voices …

Amphion: open-source toolkit for audio, music and speech generation

Amphion is an open-source toolkit designed to support research and development in audio, music and speech generation. According to the project’s GitHub site, it offers unique visualizations of classic models and architectures to help junior researchers and engineers better understand them. The toolkit supports various individual generation tasks such as text-to-speech (TTS), singing voice synthesis …

Speech to text: Moonshine is fast and as accurate as OpenAI’s Whisper

Useful, an AI company focused on improving human-machine communication, has open-sourced Moonshine, a new speech-to-text model that aims to significantly reduce the latency of voice interfaces. According to Useful founder Pete Warden, Moonshine returns results 1.7 times faster than OpenAI’s Whisper model while matching or exceeding its accuracy. The model’s variable-length input window allows it …