OpenAI launches GPT-4o native image generation, and the results amaze users



Updated on: 11-0-0 0:0:0

Nearly a year after OpenAI released GPT-4o, its first "omni" multimodal model, in May 2024, the model has a new surprise in store.

Today, OpenAI finally opened GPT-4o's native multimodal image generation to ChatGPT Plus, Pro, Team, and free users. The company said the feature will soon come to Enterprise and Edu users and will also be available through the API.

Unlike ChatGPT's previous image model, OpenAI's DALL-E 3 (a conventional diffusion transformer that builds images from text prompts by progressively removing noise), the new image generator is part of the same model that produces text and code: OpenAI trained the entire model to understand all of these media at once.
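To make the "remove noise step by step" idea concrete, here is a toy sketch of deterministic DDIM-style sampling on a four-pixel "image". It is an illustration of the general diffusion sampling loop only, not DALL-E 3's actual implementation. The function name, the linear noise schedule, and the "oracle" denoiser (which already knows the clean image and stands in for a trained network) are all assumptions made for the example.

```python
import math
import random

def ddim_sample_with_oracle(x0, alpha_bars, seed=0):
    """Toy deterministic DDIM sampling (eta = 0) on a flat list of pixels.

    alpha_bars[t] is the cumulative signal level at step t: alpha_bars[0] = 1.0
    means "clean", smaller values mean noisier. The denoiser is a hypothetical
    oracle that knows x0; a real model would predict the noise instead.
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise at the noisiest step.
    x = [rng.gauss(0.0, 1.0) for _ in x0]
    for t in range(len(alpha_bars) - 1, 0, -1):
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        # Noise estimate implied by x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps.
        eps_hat = [(xi - math.sqrt(a_t) * ci) / math.sqrt(1.0 - a_t)
                   for xi, ci in zip(x, x0)]
        # Step to the less noisy level t - 1, reusing the same noise estimate.
        x = [math.sqrt(a_prev) * ci + math.sqrt(1.0 - a_prev) * ei
             for ci, ei in zip(x0, eps_hat)]
    return x

# A tiny 4-pixel "image" and a linear schedule from clean (1.0) to noisy (0.01).
target = [0.0, 0.5, 1.0, 0.25]
schedule = [1.0 - 0.99 * t / 50 for t in range(51)]
result = ddim_sample_with_oracle(target, schedule)
print([round(v, 6) for v in result])  # [0.0, 0.5, 1.0, 0.25]
```

Because the oracle already knows the answer, the loop lands exactly on the target; the point is the shape of the process, in which a trained network would supply `eps_hat` at every step. A model that generates images as part of the same system that writes text works differently, which is the contrast the article draws.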

OpenAI President Greg Brockman previewed this native capability of GPT-4o back in May 2024, but for reasons the company has not disclosed, OpenAI held it back until now, after Google AI Studio released the experimental Gemini 2.0 Flash image-generation feature that many AI power users consider comparable.

Training a single model this way has produced a higher-quality image generator that creates more realistic images and more accurate embedded text, and it has already impressed users, some of whom call the quality "insane".

It is also worth noting that OpenAI still has not disclosed what data GPT-4o's image generation was trained on. Given the track record of OpenAI and other model providers, the training data likely includes artwork scraped from the web, some of it possibly copyrighted, which is likely to irritate the artists behind those works.

Bringing image generation to ChatGPT and Sora

OpenAI has been working on image generation as a core feature of its AI models. With GPT-4o, users can now generate images directly in ChatGPT, refine them through conversations, and adjust details in real-time.

The model is also integrated into OpenAI's video generation platform, Sora, to further extend multimodal capabilities.

In its announcement on X, OpenAI said GPT-4o's image generation is designed to:
- Render text in images accurately, making it possible to create logos, menus, invitations, and infographics
- Follow complex prompts precisely, maintaining high fidelity even in detailed compositions
- Build on previous images and text, keeping visuals consistent across multiple turns
- Support a variety of art styles, from photorealistic images to stylized illustrations

Users can describe an image in ChatGPT, specifying details such as the aspect ratio, exact colors (as hex codes), or a transparent background, and GPT-4o will generate it within a minute.
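ChatGPT takes these details as ordinary text, so a clear description is all that is needed. For anyone generating many prompts, a small helper can keep them consistent. The sketch below is hypothetical: `build_image_prompt` and its validation rules are assumptions for illustration, not part of any OpenAI tool or API. It simply checks an aspect ratio and hex color codes, then assembles one plain-language prompt.

```python
import re

# Hypothetical validation rules for the details the article mentions.
ASPECT_RATIO = re.compile(r"([1-9][0-9]*):([1-9][0-9]*)")
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")

def build_image_prompt(subject, aspect_ratio="1:1", palette=(), transparent=False):
    """Turn structured options into one plain-language prompt string."""
    if not ASPECT_RATIO.fullmatch(aspect_ratio):
        raise ValueError(f"aspect ratio must look like '16:9', got {aspect_ratio!r}")
    for color in palette:
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"color must be a hex code like '#1A2B3C', got {color!r}")
    # One sentence per requirement keeps the prompt easy to read and edit.
    parts = [subject.strip().rstrip(".") + ".", f"Aspect ratio {aspect_ratio}."]
    if palette:
        # Normalize hex codes to upper case so the prompt reads consistently.
        parts.append("Use only these colors: " + ", ".join(c.upper() for c in palette) + ".")
    if transparent:
        parts.append("Use a transparent background.")
    return " ".join(parts)

print(build_image_prompt("A minimalist coffee-shop logo", "1:1",
                         ["#2E4057", "#f4a261"], transparent=True))
```

This prints `A minimalist coffee-shop logo. Aspect ratio 1:1. Use only these colors: #2E4057, #F4A261. Use a transparent background.`, which could then be pasted into ChatGPT. Validating up front catches typos such as a five-digit hex code before they reach the model.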

As independent AI consultant Allie K. Miller writes on X, this is a "giant leap forward in text generation" and the "best" AI image generation model she's ever seen.

Key features and use cases

GPT-4o is designed for practical use, not just visual appeal. Key applications include:
- Design and branding: generating logos, posters, and ads with precise text layouts
- Education and visualization: creating scientific diagrams, infographics, and historical images for learning
- Game development: keeping characters consistent across design iterations
- Marketing and content creation: producing social media assets, event invitations, and digital illustrations tailored to brand needs

How GPT-4o improves on DALL-E's image generation

According to OpenAI's official post on X, GPT-4o has the following improvements over the previous model:

Better text integration: Earlier AI models struggled with legibility and text layout; GPT-4o can now embed text in images accurately.

Enhanced contextual understanding: GPT-4o draws on the chat history, so users can refine images interactively and keep them coherent across multiple generations.

Improved multi-object binding: Where the previous model struggled to place several distinct objects correctly in a scene, GPT-4o can now handle 10-20 objects at once.

Diverse style adaptation: The model can generate images in, or convert images into, a wide range of styles, from hand-drawn sketches to high-resolution photorealism.

Limitations

Despite the progress, GPT-4o still has some known challenges:

Cropping: Large images, such as posters, are sometimes cropped too tightly.

Non-Latin scripts: Some characters in non-Latin scripts may not render correctly.

Small text: Very fine detail or small text may lose clarity.

Editing precision: Modifying one part of an image may inadvertently change other elements.

OpenAI is addressing these issues through continuous model improvements.

Security and labelling measures

As part of OpenAI's commitment to responsible AI development, every image generated by GPT-4o carries C2PA metadata, which lets users verify that it came from an AI model.
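As a rough illustration of where such metadata lives, the sketch below walks the chunks of a PNG file and reports whether a C2PA manifest chunk is present. It assumes the image is a PNG and that, per the C2PA specification's PNG embedding rules, the manifest store sits in a chunk of type `caBX`; the function names are hypothetical. It is a presence check only: it does not parse the manifest or validate its cryptographic signatures, which needs a full C2PA tool such as the open-source `c2patool`, and it does not handle JPEG or WebP files.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_chunk_types(data: bytes) -> list[str]:
    """List the chunk types in a PNG byte string, verifying each chunk's CRC."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, types = len(PNG_SIGNATURE), []
    while pos < len(data):
        if pos + 12 > len(data):
            raise ValueError("truncated chunk header")
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        end = pos + 12 + length
        if end > len(data):
            raise ValueError("truncated chunk")
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:end])
        # The CRC covers the chunk type and data, not the length field.
        if zlib.crc32(ctype + body) != crc:
            raise ValueError(f"CRC mismatch in {ctype!r} chunk")
        types.append(ctype.decode("latin-1"))
        pos = end
        if ctype == b"IEND":
            break
    return types

def has_c2pa_chunk(data: bytes) -> bool:
    """Presence check only: True if a 'caBX' chunk (assumed C2PA store) exists."""
    return "caBX" in png_chunk_types(data)
```

Usage would look like `has_c2pa_chunk(open("image.png", "rb").read())`, with `image.png` standing in for a downloaded file. A `False` result does not prove an image is human-made, since metadata can be stripped when an image is re-encoded or screenshotted.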

In addition, OpenAI has built an internal search tool to help detect AI-generated images.

Strict safeguards are in place to block harmful content and prevent misuse, including refusing requests for explicit, deceptive, or otherwise harmful images.

OpenAI also imposes stricter restrictions on images that depict real people.

OpenAI CEO Sam Altman described the launch as "a new level of creative freedom," emphasizing that users will be able to create a wide range of visual content, while OpenAI will observe and refine its approach based on real-world use cases.

As AI-generated images become more accurate and easier to use, GPT-4o marks an important step toward turning text-to-image generation into a mainstream tool for communication, creativity, and productivity.