The field of artificial intelligence is making waves again. OpenAI's new ChatGPT image generation feature became popular immediately after launch, but user enthusiasm far exceeded expectations and left the company's GPU resources seriously overloaded. Sam Altman, OpenAI's co-founder and CEO, said publicly that the company's GPUs are "melting" and that it has to introduce temporary rate limits to relieve the pressure. The incident highlights the explosive appeal of generative AI and also exposes a core tension in the development of multimodal technology: the difficult balance between demand for computing power and the supply of resources.
ChatGPT's image generation feature, called "Images in ChatGPT", officially launched on March 26, 2025. It lets users generate and edit images directly through natural-language instructions on the ChatGPT and Sora platforms, and it supports multiple rounds of iterative refinement. The launch marks ChatGPT's move from a language-only model to a fully multimodal agent that tightly integrates text, image, and code capabilities.
The new feature quickly sparked a craze online. Large numbers of users began using it to convert their own photos or well-known memes into a Studio Ghibli-style cartoon look. The convenience of retouching an image simply by describing the desired change brought the feature enormous attention in a short period of time. Altman himself remarked on the huge amount of traffic the feature was bringing in, and even said that his efforts in AI over the past decade seemed to have paid off in this moment.
However, as the number of users kept growing, OpenAI postponed its original plan to roll out image generation to all users this week. Because the feature's popularity far exceeded expectations, OpenAI's GPU resources can no longer meet current demand. To cope, OpenAI has had to impose temporary rate limits to keep the system running reliably and to protect core functions such as text generation and dialogue.
It is worth noting that ChatGPT's image generation is fundamentally different from diffusion models such as DALL·E. GPT-4o image generation is an autoregressive model natively embedded in ChatGPT. It learns the relationship between images and language in order to generate useful, consistent, and context-aware images. This approach, however, relies on large-scale parallel computation on GPUs, and generating more accurate, higher-resolution images requires even more GPU computing power.
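To meet the growing demand for computing power, OpenAI is actively exploring solutions. On the one hand, the company could adopt more powerful GPUs to increase its computing capacity; on the other, it could improve computational efficiency by optimizing its AI algorithms. As a leading player in AI, OpenAI has GPU reserves that are not to be underestimated. Microsoft, OpenAI's main investor, reportedly purchased about 485,000 Nvidia Hopper chips in 2024, giving OpenAI substantial computing support.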
OpenAI's breakthrough in image generation has injected new vitality into the development of artificial intelligence. How to balance the demand for computing power against the pace of technology iteration, however, will remain a key question for the future of multimodal AI.