A new method called Transfusion enables training models that can process and generate both text and images. As researchers from Meta and other institutions report, Transfusion combines next-token prediction for text with diffusion for images in a single transformer. Experiments show that this approach scales better than quantizing images into discrete tokens. A 7-billion-parameter model trained with Transfusion on 2 trillion mixed text and image tokens generated images and text on a par with specialized models of similar scale. Source: Hacker News
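
In essence, one transformer is optimized on two losses at once: standard cross-entropy for next-token prediction on text, and a DDPM-style noise-prediction loss on continuous image latents, summed with a weighting factor. The PyTorch sketch below illustrates such a combined objective; the toy backbone, tensor shapes, noise schedule, and the lambda_diff weight are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTransfusion(nn.Module):
    """Toy stand-in for a shared transformer backbone (not the paper's code)."""
    def __init__(self, vocab=256, dim=64, latent_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)          # text tokens -> embeddings
        self.proj_in = nn.Linear(latent_dim, dim)      # image patch latents -> embeddings
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.to_logits = nn.Linear(dim, vocab)         # head for next-token prediction
        self.to_noise = nn.Linear(dim, latent_dim)     # head for noise prediction

    def text_forward(self, tokens):
        return self.to_logits(self.backbone(self.embed(tokens)))

    def image_forward(self, noisy_latents):
        # A real model would also condition on the diffusion timestep.
        return self.to_noise(self.backbone(self.proj_in(noisy_latents)))

def transfusion_loss(model, text_tokens, image_latents, lambda_diff=5.0):
    # Text: language-modeling loss (predict token t+1 from tokens up to t).
    logits = model.text_forward(text_tokens[:, :-1])
    lm_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                              text_tokens[:, 1:].reshape(-1))

    # Images: diffusion loss (predict the noise added to continuous latents).
    noise = torch.randn_like(image_latents)
    alpha = torch.rand(image_latents.size(0), 1, 1)    # toy noise schedule
    noisy = alpha.sqrt() * image_latents + (1 - alpha).sqrt() * noise
    diff_loss = F.mse_loss(model.image_forward(noisy), noise)

    # One objective, one set of weights, both modalities.
    return lm_loss + lambda_diff * diff_loss

# Usage with random data: one backward pass updates the shared backbone
# from both the text and the image loss.
model = ToyTransfusion()
text = torch.randint(0, 256, (2, 32))      # batch of token sequences
latents = torch.randn(2, 64, 16)           # batch of image patch latents
loss = transfusion_loss(model, text, latents)
loss.backward()
```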