Google DeepMind’s Gecko evaluates image generators

Google DeepMind has developed “Gecko”, a new benchmark for evaluating the capabilities of AI image generators. It is designed to give a clearer picture of the strengths and weaknesses of AI models and to drive their development.

Megalodon is a new architecture for AI models

Researchers at Meta and the University of Southern California have developed Megalodon, a new architecture for AI models. It allows language models to process significantly longer texts without a large increase in memory use.
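The core idea behind architectures like this is to avoid full attention over the whole sequence. A minimal sketch of that idea, not Megalodon’s actual design: split the input into fixed-size chunks, run attention only within each chunk, and carry a simple exponential moving average across chunk boundaries so information still flows forward. The function name and the plain EMA are illustrative simplifications.

```python
import numpy as np

def chunked_ema_attention_sketch(x, chunk=64, alpha=0.9):
    """Toy illustration of chunked processing: attention runs only
    inside each fixed-size chunk, and an exponential moving average
    carries information across chunk boundaries. Memory use scales
    with the chunk size, not the sequence length."""
    d = x.shape[-1]
    state = np.zeros(d)           # carried across chunks
    out = []
    for start in range(0, len(x), chunk):
        block = x[start:start + chunk]
        # EMA smoothing, seeded with the state from earlier chunks
        smoothed = np.empty_like(block)
        for i, row in enumerate(block):
            state = alpha * state + (1 - alpha) * row
            smoothed[i] = state
        # full softmax attention only inside the (small) chunk
        scores = smoothed @ smoothed.T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out.append(weights @ block)
    return np.vstack(out)
```

Because each attention matrix is at most `chunk × chunk`, doubling the input length doubles compute but leaves peak memory unchanged.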

VideoGigaGAN improves video upscaling

VideoGigaGAN outperforms previous methods of video upscaling, creating videos with a high level of detail and consistency. The approach adapts the GigaGAN image upscaler to video and addresses its temporal consistency problems, resulting in sharper and smoother videos. Source: Hacker News

Microsoft’s VASA-1 generates video from a photo and audio

Microsoft’s VASA-1 can make human portraits sing and talk. It only needs a still image and an audio file with speech to generate moving lips, matching facial expressions and head movements. Microsoft emphasizes that this is a research demonstration only, with no plans to bring it to market.

Google researchers give AI “infinite” attention

With “Infini-attention”, Google researchers have developed a technique that allows language models to process texts of theoretically infinite length while keeping memory and compute requirements bounded. Source: VentureBeat
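The published idea combines ordinary attention within a text segment with a fixed-size “compressive memory” that accumulates keys and values from earlier segments. A heavily simplified sketch, assuming a linear-attention-style memory update and a fixed blending gate (the real method learns the gate and uses a more careful feature map):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def infini_attention_sketch(segments, d=16, seed=0):
    """Process a stream of segments with bounded memory: local softmax
    attention per segment, plus a d x d compressive memory carried
    across segments. The memory size never grows with text length."""
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    M = np.zeros((d, d))          # compressive memory
    z = np.zeros(d)               # normalization term
    outputs = []
    for seg in segments:          # seg: (seg_len, d) embeddings
        Q, K, V = seg @ Wq, seg @ Wk, seg @ Wv
        # local attention within the current segment
        local = softmax(Q @ K.T / np.sqrt(d)) @ V
        # retrieve from memory via a simple positive feature map
        sQ = np.maximum(Q, 0) + 1e-6
        mem = (sQ @ M) / (sQ @ z + 1e-6)[:, None]
        # blend local and memory outputs (fixed 50/50 gate here)
        outputs.append(0.5 * local + 0.5 * mem)
        # fold this segment's keys/values into the memory, then move on
        sK = np.maximum(K, 0) + 1e-6
        M += sK.T @ V
        z += sK.sum(axis=0)
    return np.concatenate(outputs)
```

However many segments stream through, `M` stays a `d × d` matrix, which is what makes the “infinite” context claim possible.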

Symbolica wants to make AI more transparent and controllable

AI startup Symbolica is focusing on a new approach to giving AI models human-like reasoning and unprecedented transparency. According to the company, it aims to overcome the “alchemy” of today’s AI systems and create a scientific foundation that will lead to interpretable, data-efficient, and controllable AI models. Source: VentureBeat

Quiet-STaR helps language models to think

Researchers at Stanford University and Notbad AI want to teach language models to think before responding to prompts. Using their method, called “Quiet-STaR,” they were able to improve the reasoning skills of the language models they tested.
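To make the “think before answering” idea concrete, here is a deliberately simplified sketch: sample several internal rationales and keep the one that makes the model most confident in its answer. This is best-of-n rationale selection, not Quiet-STaR itself (which learns thought boundaries and trains with reinforcement), and the `ToyModel` class and its methods are entirely hypothetical stand-ins.

```python
import math
import random

class ToyModel:
    """Hypothetical stand-in for a language model, just enough
    to demonstrate the control flow of the decoding loop."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
    def generate(self, prompt, max_tokens=16):
        # pretend to sample a continuation
        return " ".join(f"tok{self.rng.randrange(100)}" for _ in range(max_tokens))
    def answer_logprob(self, prompt, thought, answer):
        # pretend to score how likely the answer is given the thought
        return -self.rng.random()

def think_then_answer(model, prompt, n_thoughts=4, thought_len=8):
    """Sample several hidden 'thoughts', answer after each one,
    and return the answer the model was most confident about."""
    best_score, best_answer = -math.inf, None
    for _ in range(n_thoughts):
        thought = model.generate(prompt + " <thought>", max_tokens=thought_len)
        answer = model.generate(prompt + thought + " </thought>", max_tokens=4)
        score = model.answer_logprob(prompt, thought, answer)
        if score > best_score:
            best_score, best_answer = score, answer
    return best_answer
```

The point of the sketch is only the ordering: rationale tokens are generated and scored before any answer is committed to, which is what separates this family of methods from plain greedy decoding.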

Google VLOGGER animates people from a single photo

Google researchers show VLOGGER, which can create lifelike videos of people speaking, gesturing and moving from a single photo. This opens up a range of potential applications, but also raises concerns about forgery and misinformation. Source: VentureBeat

EMO makes Mona Lisa sing

The research project EMO from China makes a photo (or a graphic or a painting like the Mona Lisa) talk and sing. The facial expressions are quite impressive; the lip movements are not always convincing. Unfortunately, there is no way to try EMO for yourself.