A new method called Transfusion enables training models that can process and generate both text and images. As researchers from Meta and other institutions report, Transfusion combines next-token prediction for text with diffusion for images in a single transformer. Experiments show that this approach scales better than quantizing images into discrete tokens. A 7-billion-parameter model trained with Transfusion on 2 trillion mixed text and image tokens generated images and text on a par with specialized models of similar scale. Source: Hacker News
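
In essence, one transformer is optimized on two losses at once: standard cross-entropy for next-token prediction on text, and a DDPM-style noise-prediction loss on continuous image latents, summed with a weighting factor. The PyTorch sketch below illustrates such a combined objective; the toy backbone, tensor shapes, noise schedule, and the lambda_diff weight are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTransfusion(nn.Module):
    """Toy stand-in for a shared transformer backbone (not the paper's code)."""
    def __init__(self, vocab=256, dim=64, latent_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)          # text tokens -> embeddings
        self.proj_in = nn.Linear(latent_dim, dim)      # image patch latents -> embeddings
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.to_logits = nn.Linear(dim, vocab)         # head for next-token prediction
        self.to_noise = nn.Linear(dim, latent_dim)     # head for noise prediction

    def text_forward(self, tokens):
        return self.to_logits(self.backbone(self.embed(tokens)))

    def image_forward(self, noisy_latents):
        # A real model would also condition on the diffusion timestep.
        return self.to_noise(self.backbone(self.proj_in(noisy_latents)))

def transfusion_loss(model, text_tokens, image_latents, lambda_diff=5.0):
    # Text: language-modeling loss (predict token t+1 from tokens up to t).
    logits = model.text_forward(text_tokens[:, :-1])
    lm_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                              text_tokens[:, 1:].reshape(-1))

    # Images: diffusion loss (predict the noise added to continuous latents).
    noise = torch.randn_like(image_latents)
    alpha = torch.rand(image_latents.size(0), 1, 1)    # toy noise schedule
    noisy = alpha.sqrt() * image_latents + (1 - alpha).sqrt() * noise
    diff_loss = F.mse_loss(model.image_forward(noisy), noise)

    # One objective, one set of weights, both modalities.
    return lm_loss + lambda_diff * diff_loss

# Usage with random data: one backward pass updates the shared backbone
# from both the text and the image loss.
model = ToyTransfusion()
text = torch.randint(0, 256, (2, 32))      # batch of token sequences
latents = torch.randn(2, 64, 16)           # batch of image patch latents
loss = transfusion_loss(model, text, latents)
loss.backward()
```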