New open-source challenger takes on Google in the AI image race

The open-source community has a powerful new tool for high-precision image generation. GLM Image, a model from the startup Z.ai, renders complex text more accurately than Google’s Nano Banana Pro. Carl Franzen reports for VentureBeat that the new model excels at creating information-dense visuals such as infographics and technical diagrams.

The model uses a 16-billion-parameter hybrid design that splits the work between two specialized components. An autoregressive module acts as an architect, planning the layout and text placement. A diffusion decoder then acts as a painter, filling in textures, lighting, and style. This separation lets the system treat composition as a logical reasoning problem first. On the CVTG-2K benchmark, which measures text accuracy across multiple regions, GLM Image achieved a score of 0.91. Google’s proprietary model reached 0.77 on the same test.
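The architect-then-painter split described above can be illustrated with a toy sketch. This is not Z.ai's code or API: the names `Region`, `plan_layout`, and `paint` are hypothetical, and a character grid stands in for an image. It only shows the idea of fixing text placement in a planning stage before a second stage fills in the rest.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One planned text region: what to write and where (hypothetical structure)."""
    text: str
    row: int
    col: int


def plan_layout(labels, width):
    """Stage 1 ("architect"): decide placement one region at a time.

    Each label gets its own row and is centered horizontally. This simple
    rule stands in for the sequential planning an autoregressive module does.
    """
    regions = []
    for index, label in enumerate(labels):
        col = max(0, (width - len(label)) // 2)
        regions.append(Region(text=label, row=index * 2, col=col))
    return regions


def paint(regions, width, fill="."):
    """Stage 2 ("painter"): fill the canvas, then write the planned text as-is."""
    height = max(r.row for r in regions) + 1
    canvas = [[fill] * width for _ in range(height)]
    for r in regions:
        # Clip text that would run past the right edge of the canvas.
        for offset, ch in enumerate(r.text[: width - r.col]):
            canvas[r.row][r.col + offset] = ch
    return ["".join(line) for line in canvas]


layout = plan_layout(["Q3 REVENUE", "UP 12%"], width=16)
image = paint(layout, width=16)
print("\n".join(image))
```

Because the painter only consumes the plan, the text content and its position are settled before any styling happens, which is the property the article credits for the model's text accuracy.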

Enterprise users may find the licensing particularly attractive. The model uses permissive MIT and Apache 2.0 licenses. These allow companies to host the software on their own servers and modify it for specific products without restrictive contracts. This flexibility supports data privacy and deep customization for corporate needs.

However, the model has clear drawbacks. Real-world tests show that Google still holds an advantage in general instruction following and aesthetic quality. GLM Image also requires significant computing power: generating a single high-resolution image can take several minutes even on expensive hardware. Despite these hurdles, the release signals a shift in which open-source tools can match or exceed proprietary giants in specialized technical tasks.
