Microsoft MInference increases the speed of LLMs

Microsoft’s new “MInference” technology promises to significantly increase the processing speed of large language models by reducing the preprocessing time of long texts by up to 90%. An interactive demo on Hugging Face allows developers to test the technology and explore its capabilities.

DeepMind V2A automatically generates audio for videos

Google’s AI research lab DeepMind has developed a new technology called V2A that can automatically generate appropriate soundtracks, sound effects, and even dialogue for videos. While V2A seems promising, DeepMind admits that the quality of the audio generated is not yet perfect. For now, it is not generally available.

Researchers claim dramatic improvements in energy efficiency

Researchers have found a way to dramatically improve the energy efficiency of large language models without sacrificing performance. Using their system, a language model with billions of parameters can be run on as little as 13 watts. The researchers have also developed proprietary hardware that further maximizes energy savings.

New sources of better AI training data

Large Language Models (LLMs) are no longer trained solely on data from the Internet. That vast data pool served as the basis for past models, but the approach has reached its limits. To advance their LLMs, companies like OpenAI are turning to new types of data: targeted annotation and filtering improve data quality.

Researchers work on better local AI

Researchers are making great strides in developing 1-bit LLMs that can achieve performance similar to their larger counterparts while using significantly less memory and power. This development could open the door to more complex AI applications on everyday devices such as smartphones, since such models demand far less processing power and energy.
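The "1-bit" label usually refers to extreme quantization: each weight is stored as one of a tiny set of values instead of a 16- or 32-bit float. A minimal illustrative sketch of ternary ("1.58-bit") quantization with a shared absmean scale is shown below; the function names are hypothetical and this is not any lab's actual code.

```python
import numpy as np

def quantize_ternary(w):
    """Quantize a float weight tensor to {-1, 0, +1} plus one shared
    scale factor (absmean scaling) - an illustrative sketch of the
    ternary quantization used by "1.58-bit" LLMs."""
    scale = np.mean(np.abs(w)) + 1e-8
    q = np.clip(np.round(w / scale), -1, 1)
    return q.astype(np.int8), float(scale)

def dequantize(q, scale):
    """Reconstruct approximate float weights from the ternary code."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = quantize_ternary(w)
# q holds only -1, 0, or +1: one int8 (or ~1.58 bits, packed)
# per weight instead of 32 bits, which is where the memory and
# energy savings come from.
```

Inference with such weights can replace most multiplications with additions and sign flips, which is the main source of the reported power savings.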

Google DeepMind Gecko evaluates image generators

Google DeepMind has developed “Gecko”, a new benchmark for evaluating the capabilities of AI image generators. It is designed to help better understand the strengths and weaknesses of AI models and drive their development.

Megalodon is a new architecture for AI models

Researchers at Meta and the University of Southern California have developed Megalodon, a new architecture for AI models. It allows language models to process significantly larger amounts of text without using a lot of memory.

VideoGigaGAN improves video upscaling

VideoGigaGAN outperforms previous methods of video upscaling, creating videos with a high level of detail and consistency. The approach is based on the GigaGAN image upscaler and solves its video processing problems through special techniques that result in sharper and smoother videos. Source: Hacker News

Microsoft’s VASA-1 generates video from a photo and audio

Microsoft’s VASA-1 can make human portraits sing and talk. It only needs a still image and an audio file with speech to generate moving lips, matching facial expressions and head movements. Microsoft emphasizes that this is a research demonstration only, with no plans to bring it to market.

Google researchers give AI “infinite” attention

With “Infini-attention”, Google researchers have developed a technology that allows language models to process texts of theoretically infinite length without requiring additional memory and computing power. Source: VentureBeat
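The trick behind such approaches is to replace the ever-growing attention cache with a fixed-size associative memory that absorbs each text segment as it streams past. The toy numpy sketch below illustrates that idea under the linear-attention formulation; it is a deliberate simplification, not Google's implementation.

```python
import numpy as np

d = 8                        # head dimension
memory = np.zeros((d, d))    # fixed-size associative memory
z = np.zeros(d)              # running normalization term

def phi(x):
    # ELU + 1 feature map, a common choice in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def write_segment(keys, values):
    """Fold one segment's key/value pairs into the memory.
    Storage stays O(d^2) no matter how long the text gets."""
    global memory, z
    k = phi(keys)
    memory += k.T @ values
    z += k.sum(axis=0)

def read(queries):
    """Retrieve values for new queries from the compressed memory."""
    q = phi(queries)
    return (q @ memory) / (q @ z + 1e-8)[:, None]

rng = np.random.default_rng(0)
for _ in range(3):                       # stream three "segments"
    write_segment(rng.normal(size=(16, d)),
                  rng.normal(size=(16, d)))
out = read(rng.normal(size=(2, d)))      # memory is still just d x d
```

Because the memory matrix never grows, arbitrarily long inputs can be processed segment by segment with a constant footprint, which is what makes the "theoretically infinite" context length claim possible.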