EzAudio creates high-quality sound effects

Researchers at Johns Hopkins University and Tencent AI Lab have developed a new text-to-audio model called EzAudio. As Michael Nuñez reports for VentureBeat, EzAudio can generate high-quality sound effects from text descriptions. The model uses an innovative method for processing audio data and a new architecture called EzAudio-DiT. In tests, EzAudio outperformed existing open-source models …

Read more

Google’s DataGemma specializes in statistics

Google is introducing two new AI models called DataGemma, which are designed to answer statistical questions more accurately. The models, based on the Gemma family, use data from Google’s Data Commons platform. As Shubham Sharma reports in an article for VentureBeat, the models use two different approaches: Retrieval Interleaved Generation (RIG) and Retrieval Augmented Generation …

Read more
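The RIG approach can be pictured as a post-processing step: the model drafts text containing inline statistical queries, and each query is replaced with a value from a trusted source. The sketch below is illustrative only; the `[DC: …]` marker syntax and the plain dict standing in for Google's Data Commons are assumptions, not DataGemma's actual interface.

```python
import re

def rig_postprocess(draft, lookup):
    """Toy sketch of Retrieval Interleaved Generation (RIG).

    The model drafts text with inline stat queries like [DC: query],
    and each query is resolved against a trusted table (here a dict
    standing in for Data Commons). Unknown queries are left as-is.
    """
    def substitute(match):
        query = match.group(1)
        value = lookup.get(query)
        return str(value) if value is not None else match.group(0)

    return re.sub(r"\[DC: ([^\]]+)\]", substitute, draft)
```

In contrast, RAG would retrieve relevant statistics *before* generation and place them in the prompt; RIG interleaves the lookups with the model's own draft.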

Transfusion enables combined text and image models

A new method called Transfusion enables the training of models that can process and generate both text and images. As researchers from Meta and other institutions report, Transfusion combines next-token prediction for text with diffusion for images in a single transformer model. Experiments have shown that this approach scales better than quantizing …

Read more
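The core idea of combining the two objectives can be sketched as a single training loss: cross-entropy for next-token prediction on text positions plus a diffusion-style noise-prediction MSE on image latents. This is a minimal NumPy sketch; the function names and the weighting factor `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def transfusion_loss(text_logits, text_targets, pred_noise, true_noise, lam=5.0):
    """Toy combined objective in the spirit of Transfusion:
    next-token cross-entropy on text positions plus a diffusion
    (noise-prediction) MSE on image positions, weighted by lam."""
    # Cross-entropy over text tokens (text_logits: [T, V], text_targets: [T])
    shifted = text_logits - text_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    ce = -log_probs[np.arange(len(text_targets)), text_targets].mean()
    # Diffusion loss: MSE between predicted and true noise on image latents
    mse = ((pred_noise - true_noise) ** 2).mean()
    return ce + lam * mse
```

Both terms are backpropagated through the same transformer, which is what lets one model serve both modalities.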

Benchmarks for AI agents flawed, study reveals

A new research report from Princeton University reveals weaknesses in current benchmarks and evaluation practices for AI agents. The researchers argue that cost control is often neglected in evaluation, even though the resource costs of AI agents can be significantly higher than those of individual model queries. This leads to biased results, as expensive agents …

Read more

DeepMind JEST speeds up AI training

Google’s DeepMind researchers have developed a new method called JEST that significantly speeds up AI training while reducing energy requirements. By optimizing the selection of training data, JEST can reduce the number of iterations by a factor of 13 and the computational complexity by a factor of 10.
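JEST's gains come from choosing which data to train on. At a sketch level, candidate examples can be scored by "learnability": how much harder they are for the learner than for a pretrained reference model. The per-example version below is a deliberate simplification (the actual method scores joint sub-batches), and the function name is an assumption.

```python
import numpy as np

def jest_select(learner_loss, reference_loss, k):
    """Toy sketch of JEST-style data selection: score each candidate
    example by learnability (high loss for the learner, low loss for a
    reference model) and keep the top-k for the next training step."""
    score = np.asarray(learner_loss) - np.asarray(reference_loss)
    # Indices of the k highest-scoring (most learnable) examples
    return np.argsort(score)[::-1][:k]
```

Training only on the most learnable examples is what allows the reported reduction in iterations and compute.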

Microsoft MInference increases the speed of LLMs

Microsoft’s new “MInference” technology promises to significantly increase the processing speed of large language models by reducing the preprocessing time of long texts by up to 90%. An interactive demo on Hugging Face allows developers to test the technology and explore its capabilities.
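MInference attacks the quadratic cost of pre-filling long prompts by computing attention only over a sparse subset of positions. The sketch below shows the principle under that assumption; the real method's dynamic pattern selection (which positions to keep, per attention head) is omitted, and the function name is illustrative.

```python
import numpy as np

def sparse_prefill_attention(q, k, v, keep_idx):
    """Toy sketch of sparse long-context prefill: each query attends
    only to a chosen subset of key positions (keep_idx) instead of all
    of them, cutting the quadratic pre-fill cost."""
    k_s, v_s = k[keep_idx], v[keep_idx]
    scores = q @ k_s.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over kept keys
    return weights @ v_s
```

With a good choice of `keep_idx`, the result stays close to full attention while touching far fewer keys.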

DeepMind V2A automatically generates audio for videos

Google’s AI research lab DeepMind has developed a new technology called V2A that can automatically generate appropriate soundtracks, sound effects, and even dialogue for videos. While V2A seems promising, DeepMind admits that the quality of the audio generated is not yet perfect. For now, it is not generally available.

Researchers claim dramatic improvements to energy efficiency

Researchers have found a way to dramatically improve the energy efficiency of large language models without sacrificing performance. Using their system, a language model with billions of parameters can be run on as little as 13 watts. The researchers have also developed custom hardware that further maximizes energy savings.

New sources of better AI training data

Large language models (LLMs) are no longer trained solely on data from the Internet. The Internet's vast data pool, long the foundation of LLM training, has reached its limits. To advance LLMs, companies like OpenAI are turning to new types of data: targeted annotation and filtering improve the quality …

Read more

Researchers work on better local AI

Researchers are making great strides in developing 1-bit LLMs that can achieve performance similar to their larger counterparts while using significantly less memory and power. Because these models require less processing power and energy, they could open the door to more complex AI applications on everyday devices such as smartphones.
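The "1-bit" label usually refers to extreme weight quantization, as in the 1.58-bit BitNet line of work, where each weight is stored as one of {-1, 0, +1} plus a shared scale. The absmean-style sketch below illustrates the idea; it is a simplification, not a faithful reproduction of any specific training recipe.

```python
import numpy as np

def quantize_ternary(w, eps=1e-8):
    """Toy absmean ternary quantization in the spirit of 1-bit
    (1.58-bit) LLMs: scale weights by their mean absolute value,
    then round each one to {-1, 0, +1}."""
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    # Dequantized weights are approximately q * scale
    return q, scale
```

Storing only a ternary value per weight (plus one scale per tensor) is what yields the large memory savings, and ternary matrix multiplies reduce to additions and subtractions, saving energy.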