Researchers at New York University have created a new AI architecture that generates higher-quality images more efficiently. Ben Dickson reports for VentureBeat that the architecture, called Representation Autoencoders (RAE), strengthens the model's semantic understanding of image content, which leads to higher-quality output.
The new method challenges common practice in building diffusion models, the technology behind most popular image generators. Current models typically rely on an autoencoder component that captures fine visual detail but little of an image's semantic content. The NYU team replaced it with the RAE, which builds on powerful pre-trained vision models that excel at visual understanding.
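To make that design concrete, here is a minimal, hypothetical sketch in PyTorch: a frozen, pre-trained vision encoder supplies semantically rich features, and only a lightweight decoder is trained to map those features back to pixels (a diffusion model would then learn to generate in that feature space). The class names, layer sizes, and toy encoder below are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of a representation autoencoder: a frozen pre-trained
# "understanding" encoder plus a small trained decoder. All details here are
# assumptions for demonstration, not the published architecture.
import torch
import torch.nn as nn

class RepresentationAutoencoder(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int = 768, image_size: int = 64):
        super().__init__()
        self.encoder = encoder                 # pre-trained vision model; kept frozen
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Trained decoder: maps one feature vector back to an RGB image.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 1024),
            nn.GELU(),
            nn.Linear(1024, 3 * image_size * image_size),
        )
        self.image_size = image_size

    def encode(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                  # the "understanding" half is never updated
            return self.encoder(images)

    def decode(self, feats: torch.Tensor) -> torch.Tensor:
        out = self.decoder(feats)
        return out.view(-1, 3, self.image_size, self.image_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(images))


# Toy stand-in for a pre-trained encoder; in practice this would be a large
# self-supervised vision transformer whose weights are loaded, not trained here.
toy_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 768))

rae = RepresentationAutoencoder(toy_encoder)
recon = rae(torch.randn(2, 3, 64, 64))         # a reconstruction loss would be applied here
print(recon.shape)                             # torch.Size([2, 3, 64, 64])
```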
According to paper co-author Saining Xie, this approach helps “connect that understanding part with the generation part.” By co-designing the model’s components, the researchers found their system learns much faster. The RAE-based model achieves a 47-fold training speedup compared to previous diffusion models and requires significantly less computing power.
This combination of efficiency and semantic understanding produces higher-quality images, achieving state-of-the-art scores on the ImageNet benchmark. Xie believes the technology could unlock more reliable features for business applications and is a step toward unified AI models that can process and generate various types of media, from images to video.