Sana is a small and extremely fast AI image generator

A new text-to-image framework called Sana can efficiently generate high-resolution images up to 4096 x 4096 pixels. The system combines a deep compression autoencoder, linear attention, and a decoder-based text encoder to optimize performance. According to the developers, Sana-0.6B can compete with state-of-the-art large diffusion models, but is 20 times smaller and over …

Read more
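The linear attention mentioned above avoids the quadratic cost of softmax attention by applying a positive feature map and reassociating the matrix products. A minimal NumPy sketch of that idea (the feature map and shapes here are common illustrative choices, not Sana's actual implementation):

```python
import numpy as np

def linear_attention(Q, K, V):
    # Positive feature map (a common choice; Sana's exact kernel may differ).
    phi = lambda x: np.maximum(x, 0.0) + 1e-6
    Qp, Kp = phi(Q), phi(K)
    # Associativity lets us compute (K^T V) first:
    # O(n * d^2) instead of the O(n^2 * d) of softmax attention.
    KV = Kp.T @ V                 # (d, d) summary of keys and values
    Z = Qp @ Kp.sum(axis=0)       # (n,) per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

Because the cost grows linearly with the number of tokens, this kind of attention stays tractable at the very large token counts a 4096 x 4096 image implies.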

ARIA is open and natively multimodal

ARIA is an open, multimodal-native mixture-of-experts model designed to integrate diverse forms of information for comprehensive understanding, outperforming existing proprietary models on a range of tasks. Of its 24.9 billion total parameters, it activates 3.9 billion per visual token and 3.5 billion per text token. The model is pre-trained on a substantial dataset comprising 6.4 trillion …

Read more
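The gap between total and activated parameters comes from mixture-of-experts routing: a gate scores all experts per token but runs only the top few. A toy sketch of that mechanism (the tiny linear "experts" and router here are illustrative stand-ins, not ARIA's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2
# Hypothetical tiny experts; real MoE experts are transformer FFN blocks.
expert_weights = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))

def moe_forward(x):
    # The router activates only the top-k experts per token, which is why
    # activated parameters are far fewer than total parameters.
    scores = x @ gate_w                      # (n_tokens, n_experts)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        top = np.argsort(scores[i])[-k:]     # top-k experts for this token
        w = np.exp(scores[i, top] - scores[i, top].max())
        w /= w.sum()                         # softmax over selected experts
        for e, wt in zip(top, w):
            out[i] += wt * (x[i] @ expert_weights[e])
    return out

tokens = rng.standard_normal((4, d))
y = moe_forward(tokens)
print(y.shape)  # (4, 16)
```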

DeepMind’s Michelangelo tests reasoning in long context windows

DeepMind has introduced the Michelangelo benchmark to evaluate the long-context reasoning capabilities of large language models (LLMs), Ben Dickson reports for VentureBeat. While LLMs can manage extensive context windows, research indicates they struggle with reasoning over complex data structures. Current benchmarks often focus on retrieval tasks, which do not adequately assess a model’s reasoning abilities. …

Read more

Molmo to improve AI agents

A new open-source AI model called Molmo could help advance the development of AI agents. Developed by the Allen Institute for AI (Ai2), the model can interpret images and communicate via a chat interface. According to Wired’s Will Knight, this enables AI agents to perform tasks such as web browsing or document creation. In some …

Read more

WonderWorld creates interactive 3D scenes

WonderWorld can be used to create interactive 3D scenes from a single image. It is the result of research at Stanford University and MIT. WonderWorld allows users to define scene content and layouts in real time and explore the resulting 3D worlds with low latency. At its core is a new rendering method called Fast …

Read more

EzAudio creates high quality sound effects

Researchers at Johns Hopkins University and Tencent AI Lab have developed a new text-to-audio model called EzAudio. As Michael Nuñez reports for VentureBeat, EzAudio can generate high-quality sound effects from text descriptions. The model uses an innovative method for processing audio data and a new architecture called EzAudio-DiT. In tests, EzAudio outperformed existing open-source models …

Read more

Google’s DataGemma specializes in statistics

Google is introducing two new AI models called DataGemma, which are designed to answer statistical questions more accurately. The models, based on the Gemma family, use data from Google’s Data Commons platform. As Shubham Sharma reports in an article for VentureBeat, the models use two different approaches: Retrieval Interleaved Generation (RIG) and Retrieval Augmented Generation …

Read more
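The two approaches differ in when retrieval happens: RAG fetches the statistic before generation, while RIG lets the model emit inline queries that are resolved afterwards. A minimal sketch of both patterns (the dictionary stands in for Google's Data Commons, and the `[DC: …]` query syntax is an illustrative assumption, not DataGemma's actual format):

```python
# Mock data store standing in for Data Commons.
DATA_COMMONS = {"population of France 2023": "68 million"}

def retrieve(query):
    return DATA_COMMONS.get(query, "unknown")

def rag_answer(question, generate):
    # RAG: retrieve first, then condition generation on the retrieved fact.
    fact = retrieve(question)
    return generate(f"Context: {fact}\nQuestion: {question}")

def rig_answer(draft):
    # RIG: the model interleaves queries into its draft; each [DC: ...]
    # marker is replaced with the retrieved value after generation.
    out = draft
    while "[DC:" in out:
        start = out.index("[DC:")
        end = out.index("]", start)
        query = out[start + 4 : end].strip()
        out = out[:start] + retrieve(query) + out[end + 1 :]
    return out

print(rig_answer("France has [DC: population of France 2023] inhabitants."))
# → "France has 68 million inhabitants."
```

Grounding the numbers in a structured data source is what lets either pattern reduce statistical hallucinations.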

Transfusion enables combined text and image models

A new method called Transfusion enables the training of models that can process and generate both text and images. As researchers from Meta and other institutions report, Transfusion combines next-token prediction for text with diffusion for images in a single transformer model. Experiments have shown that this approach scales better than quantizing …

Read more
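Training one transformer on both modalities comes down to summing two objectives: cross-entropy on text positions and a denoising loss on image positions. A simplified sketch of that combined loss (function names, shapes, and the balancing weight are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def cross_entropy(logits, targets):
    # Standard next-token loss for the text positions.
    logits = logits - logits.max(axis=-1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def diffusion_mse(pred_noise, true_noise):
    # Simplified denoising (noise-prediction) loss for image patches.
    return ((pred_noise - true_noise) ** 2).mean()

def transfusion_loss(text_logits, text_targets, pred_noise, true_noise, lam=1.0):
    # One transformer, one combined objective; lam balances the two terms
    # (an illustrative hyperparameter, not the paper's value).
    return cross_entropy(text_logits, text_targets) + lam * diffusion_mse(pred_noise, true_noise)

rng = np.random.default_rng(0)
text_logits = rng.standard_normal((8, 100))   # 8 text tokens, vocab size 100
text_targets = rng.integers(0, 100, size=8)
pred = rng.standard_normal((4, 32))           # 4 image patches
true = rng.standard_normal((4, 32))
loss = transfusion_loss(text_logits, text_targets, pred, true)
print(float(loss))
```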

Benchmarks for AI agents flawed, study reveals

A new research report from Princeton University reveals weaknesses in current benchmarks and evaluation practices for AI agents. The researchers argue that cost control is often neglected in evaluation, even though the resource costs of AI agents can be significantly higher than those of individual model queries. This leads to biased results, as expensive agents …

Read more

DeepMind JEST speeds up AI training

Researchers at Google DeepMind have developed a new method called JEST that significantly speeds up AI training while reducing energy requirements. By optimizing the selection of training data, JEST can reduce the number of training iterations by a factor of 13 and the computational cost by a factor of 10.
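The core idea is to score candidate examples by learnability, comparing a learner model's loss against a reference model's, and train only on the most informative ones. A simplified per-example sketch of that selection step (JEST proper scores whole sub-batches jointly; the losses here are random placeholders):

```python
import numpy as np

def jest_select(learner_loss, reference_loss, frac=0.1):
    # Learnability score: examples the learner still finds hard but the
    # reference model finds easy are the most informative to train on.
    score = learner_loss - reference_loss
    n_keep = max(1, int(len(score) * frac))
    return np.argsort(score)[-n_keep:]   # indices of the top-scoring examples

rng = np.random.default_rng(0)
learner = rng.uniform(0.0, 5.0, size=100)     # placeholder per-example losses
reference = rng.uniform(0.0, 5.0, size=100)
keep = jest_select(learner, reference, frac=0.1)
print(len(keep))  # 10
```

Discarding the uninformative bulk of each batch is what yields the reported savings in iterations and compute.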