Google’s DataGemma specializes in statistics

Google is introducing two new AI models called DataGemma, which are designed to answer statistical questions more accurately. The models, based on the Gemma family, use data from Google’s Data Commons platform. As Shubham Sharma reports in an article for VentureBeat, the models use two different approaches: Retrieval Interleaved Generation (RIG) and Retrieval Augmented Generation (RAG) … Read more
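
Roughly, RAG retrieves relevant statistics first and generates with them in context, while RIG lets the model interleave natural-language queries into its draft, which are then resolved against Data Commons. The toy Python sketch below illustrates that contrast; `query_data_commons`, `generate`, and the `[DC: ...]` query syntax are all illustrative stand-ins, not DataGemma’s real interfaces.

```python
import re

def query_data_commons(query: str) -> str:
    """Stand-in for a Data Commons lookup; returns a canned statistic."""
    return "3.0 percent (illustrative value)"

def generate(prompt: str) -> str:
    """Stand-in for the language model."""
    if "Facts:" in prompt:
        # RAG-style prompt: the retrieved statistic is already in context.
        return "According to Data Commons, the rate is 3.0 percent (illustrative value)."
    # RIG-style output: the model emits an inline query instead of a raw number.
    return "The rate is [DC: unemployment rate Germany 2023]."

def answer_with_rag(question: str) -> str:
    # RAG: retrieve first, then generate with the facts in context.
    facts = query_data_commons(question)
    return generate(f"Facts: {facts}\nQuestion: {question}\nAnswer:")

def answer_with_rig(question: str) -> str:
    # RIG: generate first, then resolve the interleaved [DC: ...] queries.
    draft = generate(question)
    return re.sub(r"\[DC: (.*?)\]", lambda m: query_data_commons(m.group(1)), draft)

print(answer_with_rag("What is Germany's unemployment rate?"))
print(answer_with_rig("What is Germany's unemployment rate?"))
```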

Transfusion enables combined text and image models

A new method called Transfusion enables the training of models that can process and generate both text and images. As researchers from Meta and other institutions report, Transfusion combines prediction of the next token for text with diffusion for images in a single transformer model. Experiments have shown that this approach scales better than quantizing … Read more
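
The training recipe can be sketched compactly: a single shared transformer processes a mixed sequence, with next-token cross-entropy applied at text positions and a noise-prediction (diffusion) loss at image-patch positions. The PyTorch sketch below is a minimal illustration under assumed shapes and module names (`transformer`, `lm_head`, `patch_head`), not the paper’s implementation.

```python
import torch
import torch.nn.functional as F

def transfusion_loss(transformer, lm_head, patch_head,
                     text_emb, text_targets, image_patches, t, alphas_cumprod):
    # Diffusion side: noise the image patches at the sampled timestep t.
    noise = torch.randn_like(image_patches)
    a = alphas_cumprod[t].view(-1, 1, 1)               # per-sample signal level
    noisy = a.sqrt() * image_patches + (1 - a).sqrt() * noise

    # One forward pass of the shared transformer over text + image positions.
    hidden = transformer(torch.cat([text_emb, noisy], dim=1))
    text_h = hidden[:, :text_emb.size(1)]
    img_h = hidden[:, text_emb.size(1):]

    # Text positions: standard next-token cross-entropy.
    lm_loss = F.cross_entropy(lm_head(text_h).transpose(1, 2), text_targets)

    # Image positions: predict the added noise (simplified diffusion objective).
    diffusion_loss = F.mse_loss(patch_head(img_h), noise)
    return lm_loss + diffusion_loss
```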

DeepMind V2A automatically generates audio for videos

Google’s AI research lab DeepMind has developed a new technology called V2A that can automatically generate appropriate soundtracks, sound effects, and even dialogue for videos. While V2A seems promising, DeepMind admits that the quality of the audio generated is not yet perfect. For now, it is not generally available.

Google DeepMind Gecko evaluates image generators

Google DeepMind has developed “Gecko”, a new benchmark for evaluating the capabilities of AI image generators. It is designed to help researchers better understand the strengths and weaknesses of image-generation models and guide their development.

Megalodon is a new architecture for AI models

Researchers at Meta and the University of Southern California have developed Megalodon, a new architecture for AI models. It allows language models to process significantly longer texts while keeping memory usage low.
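
The memory savings come from not attending over the full sequence at once: the input is processed in fixed-size chunks, with a small state carried across chunk boundaries. The sketch below shows only that general chunk-wise idea; `block` and the carried `state` are hypothetical stand-ins for Megalodon’s actual layers (gated attention combined with a complex exponential moving average).

```python
import torch

def process_in_chunks(block, tokens, chunk_size=4096):
    """Attend only within fixed-size chunks, carrying a small recurrent
    state across chunk boundaries, so peak memory stays constant no matter
    how long the input is. `block` is a hypothetical stand-in layer."""
    state = None
    outputs = []
    for start in range(0, tokens.size(1), chunk_size):
        out, state = block(tokens[:, start:start + chunk_size], state)
        outputs.append(out)
    return torch.cat(outputs, dim=1)
```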

Microsoft’s VASA-1 generates video from a photo and audio

Microsoft’s VASA-1 can make human portraits sing and talk. It needs only a still image and a speech audio clip to generate lip movements, matching facial expressions, and head movements. Microsoft emphasizes that this is a research demonstration only, with no plans to bring it to market.

Google researchers give AI “infinite” attention

With “Infini-attention”, Google researchers have developed a technique that allows language models to process texts of theoretically unlimited length while keeping memory and compute requirements bounded. Source: VentureBeat
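
According to the paper, each segment is processed with ordinary softmax attention, while a fixed-size compressive memory accumulates key-value associations from earlier segments and is read back via linear attention, so total memory does not grow with input length. A simplified PyTorch sketch follows; shapes are illustrative, and the fixed 0.5 blend stands in for the learned gate described in the paper.

```python
import torch
import torch.nn.functional as F

def elu1(x):
    # Non-negative feature map used for the linear-attention memory reads.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, memory, norm):
    # q, k, v: (batch, seg_len, d); memory: (batch, d, d); norm: (batch, d, 1).
    # 1) Retrieve context from the compressive memory (linear attention).
    sq = elu1(q)
    mem_out = (sq @ memory) / (sq @ norm + 1e-6)

    # 2) Ordinary softmax attention within the current segment.
    local = F.scaled_dot_product_attention(q, k, v)

    # 3) Fold this segment's keys/values into the fixed-size memory.
    sk = elu1(k)
    memory = memory + sk.transpose(1, 2) @ v
    norm = norm + sk.sum(dim=1, keepdim=True).transpose(1, 2)

    # Blend local and memory outputs (a learned gate in the paper).
    return 0.5 * local + 0.5 * mem_out, memory, norm
```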

Quiet-STaR helps language models to think

Researchers at Stanford University and Notbad AI want to teach language models to think before responding to prompts. Using their method, called “Quiet-STaR,” they were able to improve the reasoning skills of the language models they tested.
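
At inference time, the idea amounts to letting the model produce an internal rationale between special thought tokens before committing to an answer. The toy sketch below assumes a placeholder `model.generate` interface and token names; the actual training procedure (sampling thoughts and reinforcing those that improve the model’s predictions) is not shown.

```python
# Toy sketch of the Quiet-STaR idea at inference time. The token names and
# `model.generate` are placeholders, not the paper's actual tokens or API.
START_THOUGHT, END_THOUGHT = "<|startofthought|>", "<|endofthought|>"

def answer_with_thought(model, prompt: str) -> str:
    # Ask the model to reason silently before committing to an answer.
    completion = model.generate(prompt + START_THOUGHT)
    _thought, _, answer = completion.partition(END_THOUGHT)
    return answer.strip()  # only text after the end-of-thought marker is returned
```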

Google VLOGGER animates people from a single photo

Google researchers show VLOGGER, which can create lifelike videos of people speaking, gesturing and moving from a single photo. This opens up a range of potential applications, but also raises concerns about forgery and misinformation. Source: VentureBeat

EMO makes Mona Lisa sing

The research project EMO from China makes a photo (or a graphic, or a painting like the Mona Lisa) talk and sing. The facial expressions are quite impressive; the lip movements are not always convincing. Unfortunately, there is currently no way to try EMO for yourself.