EzAudio creates high-quality sound effects

Researchers at Johns Hopkins University and Tencent AI Lab have developed a new text-to-audio model called EzAudio. As Michael Nuñez reports for VentureBeat, EzAudio can generate high-quality sound effects from text descriptions. The model uses an innovative method for processing audio data and a new architecture called EzAudio-DiT. In tests, EzAudio outperformed existing open-source models …

EVI 2 features significantly improved voice interface

Hume AI has unveiled EVI 2, an improved version of its voice AI, as Carl Franzen reports for VentureBeat. The new version offers more natural conversations, faster response times, and more voice customization options. According to Hume co-founder Alan Cowen, EVI 2 can now be integrated directly into apps to process user requests. …

Google’s Audio Overview explains complex concepts

Google is adding an audio feature to its AI-powered note-taking app NotebookLM. The new “Audio Overview” allows users to get verbal explanations of complex topics from uploaded documents, as Aisha Malik reports for TechCrunch. AI-generated virtual hosts summarize the content and explain difficult concepts using metaphors. The feature is aimed at people who absorb information …

Music AI Suno available on iPhones

The music AI Suno is now available as an iPhone app in the US. It offers a variety of styles and genres and can create full songs with lyrics and vocals, as well as instrumentals. The app will soon be available in other countries and for Android devices.

ElevenLabs AI Voice Isolator introduced

ElevenLabs has introduced a new free service called AI Voice Isolator, which removes unwanted background noise from movies, podcasts or YouTube videos. Unlike other programs that can only remove constant noise, the Voice Isolator also handles irregular noises such as a door opening or someone clapping.

ElevenLabs Reader reads any text aloud for you

ElevenLabs has released a new app called Reader, which lets users have any text read aloud by AI voices. A new addition is the “Iconic Voices” collection, which recreates the voices of deceased stars such as Judy Garland, James Dean and Laurence Olivier. The company acquired the rights to the voices from CMG Worldwide and stresses that the …

Resemble Detect-2B helps to recognize audio deepfakes

Resemble AI has introduced Detect-2B, a new audio deepfake detection model that the company claims is 94% accurate. The model looks for subtle artifacts to determine whether speech is real or artificially generated.

DeepMind V2A automatically generates audio for videos

Google’s AI research lab DeepMind has developed a new technology called V2A that can automatically generate appropriate soundtracks, sound effects, and even dialogue for videos. While V2A seems promising, DeepMind admits that the quality of the generated audio is not yet perfect. For now, the technology is not generally available.

Meta releases several new AI models

Meta is releasing a series of new AI models for audio, text and watermarking. The company is also making two sizes of its Chameleon multimodal text model available for research. These models can be used for tasks that require both visual and textual understanding, such as image captioning.