During a livestream on Tuesday, OpenAI CEO Sam Altman announced the first major upgrade to ChatGPT's image generation capabilities in more than a year.
ChatGPT can now natively create and modify images and photos using the company's GPT-4o model. GPT-4o has been the foundation of the AI-powered chatbot platform, but until now the model could only generate and edit text, not images.
Altman said GPT-4o's native image generation feature is now live in ChatGPT and in Sora, OpenAI's AI video generation product, for subscribers to the company's $200-per-month Pro plan. OpenAI said the feature will soon be available to ChatGPT Plus users and free users, as well as to developers using the company's API services.
GPT-4o with image output takes longer to generate an image than DALL-E 3, the image generation model it effectively replaces, in order to produce what OpenAI describes as more accurate and detailed images. GPT-4o can also edit existing images, including those containing people, by transforming them or "fixing" the details of foreground and background objects.
OpenAI told the Wall Street Journal that to support the new image feature, it trained GPT-4o on "publicly available data" as well as proprietary data obtained through partnerships with companies like Shutterstock.
Many generative AI vendors view training data as a competitive advantage and therefore keep information about it strictly confidential. At the same time, the details of the training data can also lead to IP-related litigation, which is another reason why companies are reluctant to disclose too much information.
In a statement to the Wall Street Journal, OpenAI's chief operating officer, Brad Lightcap, said, "When it comes to output, we respect the rights of artists, and we have policies in place to prevent the generation of images that directly mimic the work of any living artist."
OpenAI provides an opt-out form that allows creators to request that their work be removed from its training datasets. The company also said it honors requests from websites to block its web crawlers from collecting training data, including images.
ChatGPT's upgraded image generation capabilities come on the heels of the experimental native image output in one of Google's flagship models, Gemini 2.0 Flash. That feature spread quickly on social media, and not entirely for good reasons. Gemini 2.0 Flash's image component appears to lack sufficient safeguards, allowing users to remove watermarks and create images that contain copyrighted characters.