OpenAI takes image generation to a new level

OpenAI has launched native image generation capabilities directly within ChatGPT, powered by its multimodal model GPT-4o. This new feature, called “Images in ChatGPT,” is now available to users across Plus, Pro, Team, and Free subscription tiers, with Enterprise, Edu, and API access coming soon.

Unlike DALL-E 3, the separate diffusion model that ChatGPT previously used for images, the new capability is built directly into GPT-4o itself, which OpenAI calls a “natively multimodal model.” According to OpenAI research lead Gabriel Goh, this integration represents “a step change above previous models” in quality and capabilities.

Key improvements and capabilities

The new image generation system excels in several areas that have traditionally challenged AI image generators:

Enhanced text rendering: The system can produce coherent, readable text within images without the garbled letters common in other generators. OpenAI states this was achieved through “many months of small improvements.”
Better “binding”: The system can keep attributes attached to the correct objects for 15 to 20 distinct objects in a single image without confusion. Most generators start mixing up colors, shapes, and other attributes once a scene goes beyond roughly 5 to 8 objects.
Contextual awareness: Images can be refined through natural conversation, with the model maintaining visual consistency across multiple iterations.
World knowledge integration: As ChatGPT multimodal product lead Jackie Shannon explained, “The model brings world knowledge to the equation,” allowing users to create accurate visualizations of concepts without needing to explain them in detail.

The technology uses an autoregressive approach, generating images sequentially from left to right and top to bottom (similar to how text is written). DALL-E and many other image generators instead use diffusion, which starts from random noise and refines the entire image at once over many steps.
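OpenAI has not published the details of GPT-4o's image architecture, so the following is only a toy Python sketch of the difference in generation order described above. Every name here (`autoregressive_generate`, `diffusion_style_generate`, the tiny 2-by-3 grid, and the toy "models") is a hypothetical stand-in chosen for illustration, not anything from OpenAI's system.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]


def raster_order(height: int, width: int) -> List[Tuple[int, int]]:
    """Token positions left to right, then top to bottom."""
    return [(r, c) for r in range(height) for c in range(width)]


def autoregressive_generate(
    height: int, width: int, predict_next: Callable[[List[float]], float]
) -> Grid:
    """Generate one token at a time in raster order.

    Each call to predict_next sees every token produced so far, so later
    tokens can depend on earlier ones (like next-word prediction in text).
    """
    tokens: List[float] = []
    for _ in raster_order(height, width):
        tokens.append(predict_next(tokens))
    # Reshape the flat sequence back into rows.
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


def diffusion_style_generate(
    height: int,
    width: int,
    steps: int,
    denoise: Callable[[Grid, int], Grid],
    noise_value: float = 100.0,
) -> Grid:
    """Start from a full 'noisy' grid and refine every position at each step."""
    grid: Grid = [[noise_value] * width for _ in range(height)]
    for t in range(steps):
        grid = denoise(grid, t)
    return grid


# --- Hypothetical toy "models", used only to make the sketch runnable ---

def toy_predict_next(prefix: List[float]) -> float:
    # Pretend the next token is simply its position in the sequence.
    return float(len(prefix))


def make_toy_denoiser(width: int) -> Callable[[Grid, int], Grid]:
    def denoise(grid: Grid, t: int) -> Grid:
        # Move every cell halfway toward a fixed target, all cells at once.
        # The step index t is ignored by this toy denoiser.
        return [
            [x + ((r * width + c) - x) / 2 for c, x in enumerate(row)]
            for r, row in enumerate(grid)
        ]
    return denoise


if __name__ == "__main__":
    print(autoregressive_generate(2, 3, toy_predict_next))
    # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
    print(diffusion_style_generate(2, 3, steps=1, denoise=make_toy_denoiser(3)))
    # [[50.0, 50.5, 51.0], [51.5, 52.0, 52.5]]
```

The point of the contrast: in the autoregressive loop, each position is decided once and only after all earlier positions exist, while in the diffusion-style loop every position changes on every step.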

Practical applications

The new capabilities make AI-generated images more practical for everyday use beyond artistic creation. Examples highlighted by OpenAI include:

Creating scientific diagrams with correctly labeled components
Designing multi-panel comics with consistent characters
Producing informational posters and menus with accurate text
Generating transparent background images for stickers and logos

These improvements could transform image generation from a primarily decorative tool into one for precise visual communication.

Safety measures and limitations

OpenAI emphasized that the system includes safeguards against potential misuse. All generated images carry C2PA metadata identifying them as AI-created, though they have no visible watermark.

Despite these advances, OpenAI acknowledges several limitations: unwanted cropping of large images, difficulty with non-Latin scripts, trouble rendering very small text, and difficulty making precise edits.

Images also take longer to generate than with previous systems, a tradeoff OpenAI argues is worthwhile for the improved quality.

Industry experts and users have already expressed strong positive reactions to the quality improvements, with independent AI consultant Allie K. Miller describing it as a “huge leap in text generation” and “the best” AI image generation model she’s seen.

Sources: OpenAI, The Verge, VentureBeat
