DeepMind introduces Talker-Reasoner framework for AI agents

DeepMind researchers have introduced a new agentic framework called Talker-Reasoner, which is inspired by the “two systems” model of human cognition. The framework divides the AI agent into two distinct modules, VentureBeat reports: the Talker, which handles real-time interactions with the user and the environment, and the Reasoner, which performs complex reasoning and planning. The …
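
The split between a fast conversational module and a slower deliberative one can be illustrated with a minimal sketch. It assumes the two modules communicate through a shared memory: the Talker answers each turn immediately from whatever plan is stored, and the Reasoner updates that plan. All class and method names are hypothetical, the "reasoning" is a placeholder string, and the calls run sequentially, whereas a real agent would presumably let the Reasoner work in the background.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Hypothetical shared state: the Reasoner writes the plan, the Talker reads it."""
    plan: str = "no plan yet"
    history: list[str] = field(default_factory=list)

class Talker:
    """Fast module: replies to every turn at once, using the latest stored plan."""
    def __init__(self, memory: Memory):
        self.memory = memory

    def respond(self, user_msg: str) -> str:
        self.memory.history.append(user_msg)
        return f"[plan: {self.memory.plan}] You said: {user_msg}"

class Reasoner:
    """Slow module: deliberates over the full history and rewrites the shared plan."""
    def __init__(self, memory: Memory):
        self.memory = memory

    def update(self) -> None:
        # Placeholder for multi-step reasoning or planning (e.g. a separate model call).
        n = len(self.memory.history)
        self.memory.plan = f"plan based on {n} message(s)"
```

Usage: after `memory = Memory()`, `Talker(memory).respond("hi")` returns a reply tagged with "no plan yet"; once `Reasoner(memory).update()` has run, later replies carry the updated plan without the Talker ever waiting on the Reasoner's work.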

New OpenAI model generates media 50 times faster

OpenAI has developed a new AI model that can generate media content such as images, videos and audio 50 times faster than previous systems. The new model, called a “continuous-time consistency model,” takes about a tenth of a second to generate an image instead of the usual five seconds, OpenAI researchers Cheng Lu and Yang …

Playground v3 specializes in graphic design

The research company Playground Research presents “Playground v3”, a new AI model for text-to-image generation that has reportedly achieved top performance in several benchmarks. The system stands out for its precise adherence to text prompts, its ability to reason logically, and the outstanding quality of its text rendering. In user studies, the model even …

Researchers aim to reduce AI’s hunger for energy

Researchers have developed a new method called “linear-complexity multiplication” (ℒ-Mul) to make calculations in artificial intelligence more energy-efficient. The method replaces costly floating-point multiplications with simpler integer additions, according to Jason Hickey and his team at the Google AI Research Center in Accra. The researchers showed that ℒ-Mul achieves the same accuracy as traditional methods for language …
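
The core trick can be sketched for ordinary floats. Write each number as (1 + f)·2^e, so that a product becomes (1 + f_x + f_y + f_x·f_y)·2^(e_x + e_y). Dropping the expensive f_x·f_y term and replacing it with a small constant leaves only additions on mantissas and exponents. This is a simplified illustration in the spirit of ℒ-Mul, not the paper's exact algorithm: the constant 2^-4 (`OFFSET`) and the function names are hypothetical.

```python
import math

# Hypothetical constant standing in for the dropped f_x * f_y cross term.
OFFSET = 2.0 ** -4

def split(x: float) -> tuple[float, int]:
    """Return (f, e) with |x| = (1 + f) * 2**e and 0 <= f < 1."""
    m, e = math.frexp(abs(x))  # |x| = m * 2**e with 0.5 <= m < 1
    return 2.0 * m - 1.0, e - 1

def approx_mul(x: float, y: float) -> float:
    """Approximate x * y by adding mantissa fractions and exponents."""
    if x == 0.0 or y == 0.0:
        return 0.0
    fx, ex = split(x)
    fy, ey = split(y)
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    # (1 + fx)(1 + fy) = 1 + fx + fy + fx*fy; the product term is replaced by OFFSET.
    mantissa = 1.0 + fx + fy + OFFSET
    # Scaling by 2**(ex + ey) corresponds to an exponent addition in hardware.
    return sign * math.ldexp(mantissa, ex + ey)
```

For example, `approx_mul(3.0, 5.0)` gives 14.5 instead of 15: 3 = 1.5·2^1 and 5 = 1.25·2^2, so the mantissa becomes 1 + 0.5 + 0.25 + 0.0625 = 1.8125, scaled by 2^3.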

Differential Transformer could improve text AIs

Microsoft and Tsinghua University have developed a new AI architecture called “Differential Transformer” that improves the performance of large language models. Furu Wei from Microsoft Research told VentureBeat that the new method amplifies attention to relevant contexts and filters out noise. This is designed to reduce problems such as the “lost-in-the-middle” phenomenon and hallucinations in …
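
The noise-filtering idea works by computing two softmax attention maps and subtracting the second, scaled by a weight λ, from the first. Attention that both maps assign to irrelevant context cancels out, much like noise-cancelling headphones. The sketch below is a single-head version on plain Python lists. It omits details of the published architecture, such as how λ is learned and how heads are normalized, and the function names are illustrative.

```python
import math

def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(q, k):
    """Row-wise softmax(Q K^T / sqrt(d)) for lists of query and key vectors."""
    d = len(q[0])
    return [softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
            for qi in q]

def diff_attention(q1, k1, q2, k2, v, lam):
    """(softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) V"""
    a1 = attention_weights(q1, k1)
    a2 = attention_weights(q2, k2)
    # Subtracting the second map cancels attention mass shared by both maps.
    w = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    dv = len(v[0])
    return [[sum(wj * vj[c] for wj, vj in zip(row, v)) for c in range(dv)] for row in w]
```

With λ = 0 this reduces to ordinary attention; with identical query/key pairs and λ = 1, everything cancels and the output is zero, which is the extreme case of "common-mode" noise removal.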

Sana is a small and extremely fast AI image generator

A new text-to-image framework called Sana can efficiently and quickly generate high-resolution images up to 4096 × 4096 pixels. The system uses a deep compression autoencoder, linear attention, and a decoder-based text encoder to optimize performance. According to the developers, Sana-0.6B can compete with state-of-the-art large diffusion models, but is 20 times smaller and over …
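
Linear attention is what keeps the cost growing linearly with the number of image tokens. Instead of forming an N × N attention matrix, the keys and values are summarized once, and every query reads from that summary. The sketch assumes a ReLU feature map and plain Python lists; it illustrates the general technique rather than Sana's actual implementation, and `eps` is a hypothetical guard against division by zero.

```python
def relu(x: list[float]) -> list[float]:
    return [max(0.0, a) for a in x]

def linear_attention(q, k, v, eps=1e-6):
    """O(N) attention: out_i = phi(q_i) S / (phi(q_i) . z), with phi = ReLU."""
    d, dv = len(q[0]), len(v[0])
    # One pass over keys/values: S = sum_j phi(k_j)^T v_j (d x dv), z = sum_j phi(k_j).
    s = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for kj, vj in zip(k, v):
        fk = relu(kj)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                s[a][c] += fk[a] * vj[c]
    # Each query reads from the fixed-size summary; no N x N matrix is built.
    out = []
    for qi in q:
        fq = relu(qi)
        denom = sum(fq[a] * z[a] for a in range(d)) + eps
        out.append([sum(fq[a] * s[a][c] for a in range(d)) / denom for c in range(dv)])
    return out
```

The result equals the "quadratic" form that weights each value by ReLU(q_i)·ReLU(k_j) and normalizes, because matrix multiplication is associative; only the order of computation changes.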

ARIA is open and natively multimodal

ARIA is an open, natively multimodal mixture-of-experts (MoE) model designed to integrate diverse forms of information for comprehensive understanding, outperforming existing proprietary models on various tasks. With 24.9 billion total parameters, it activates 3.9 billion parameters per visual token and 3.5 billion per text token. The model is pre-trained on a substantial dataset comprising 6.4 trillion …
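
Activating 3.9 billion of 24.9 billion parameters means only about 16 percent of the weights run for each visual token (about 14 percent, 3.5 billion, for text tokens). That is the usual mixture-of-experts pattern: a router scores the experts and only the top-k are executed. The sketch below is a generic top-k router; the number of experts, k = 2, and the toy expert functions are hypothetical and not ARIA's actual configuration.

```python
import math

def top_k_gates(logits: list[float], k: int) -> dict[int, float]:
    """Keep the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, router_logits, k=2):
    """Gate-weighted sum of the selected experts' outputs; other experts never run."""
    out = [0.0] * len(x)
    for i, gate in top_k_gates(router_logits, k).items():
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out
```

With four toy experts that scale their input by 0, 1, 2, and 3, router scores `[0.0, 5.0, 5.0, -1.0]` select experts 1 and 2 with equal gates of 0.5, so `moe_forward([1.0, 2.0], ...)` returns `[1.5, 3.0]` while experts 0 and 3 are skipped entirely.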

DeepMind’s Michelangelo tests reasoning in long context windows

DeepMind has introduced the Michelangelo benchmark to evaluate the long-context reasoning capabilities of large language models (LLMs), Ben Dickson reports for VentureBeat. While LLMs can manage extensive context windows, research indicates they struggle with reasoning over complex data structures. Current benchmarks often focus on retrieval tasks, which do not adequately assess a model’s reasoning abilities. …
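
The difference between retrieval and reasoning over a data structure can be made concrete with a toy contrast. A long transcript of list operations is generated; asking "what was operation i?" only requires finding one line, while asking "what is the final list?" requires tracking every operation. This is an illustration in the spirit of the problem described, not Michelangelo's actual task format, and all function names are hypothetical.

```python
import random

def make_ops(n_ops: int, seed: int = 0) -> list[str]:
    """Generate a transcript of list operations; pops only happen on a non-empty list."""
    rng = random.Random(seed)
    ops, size = [], 0
    for _ in range(n_ops):
        if size > 0 and rng.random() < 0.3:
            ops.append("lst.pop()")
            size -= 1
        else:
            ops.append(f"lst.append({rng.randint(0, 99)})")
            size += 1
    return ops

def retrieval_answer(ops: list[str], i: int) -> str:
    """Retrieval-style question: needs only the single line at position i."""
    return ops[i]

def latent_answer(ops: list[str]) -> list[int]:
    """Latent-structure question: the final list depends on every operation."""
    lst: list[int] = []
    for op in ops:
        if op == "lst.pop()":
            lst.pop()
        else:  # op has the form "lst.append(<int>)"
            lst.append(int(op[len("lst.append("):-1]))
    return lst
```

A model can answer the first kind of question by locating one position in a long context, but the second kind fails if any single step along the way is missed.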

Molmo to improve AI agents

A new open-source AI model called Molmo could help advance the development of AI agents. Developed by the Allen Institute for AI (Ai2), the model can interpret images and communicate via a chat interface. According to Wired’s Will Knight, this enables AI agents to perform tasks such as web browsing or document creation. In some …

WonderWorld creates interactive 3D scenes

WonderWorld can be used to create interactive 3D scenes from a single image. It is the result of research at Stanford University and MIT. WonderWorld allows users to define scene content and layouts in real time and explore the resulting 3D worlds with low latency. At its core is a new rendering method called Fast …
