The GPT-4o text-to-image model is redefining the boundaries of creativity. From precise detail to complex image fusion, it pairs efficiency with quality in AI-generated images. Through tests across several scenarios, this article shows how GPT-4o's non-autoregressive architecture upends the traditional design process.
The day before yesterday, I published an article about ChatGPT's features.
After I posted it to Zhihu, an answerer who registered in 2021 questioned it, suspecting that I might not have been using the latest model, since the gap between the old and new versions is large.
He pointed out that the latest text-to-image model is very capable, while the results I showed displayed poor prompt comprehension and many errors in the rendered text.
For example, he said that a simple concept like the prompt "two people pushing each other" should be represented accurately by the latest model. He also advised me to check whether I was on the new model, and sent a link to what he said was the latest version of the interface.
He also shared a verification method:
Enter a prompt containing 100+ English words. If the rendered text comes out garbled, it is another model (such as DALL·E); if only one or two letters are wrong, it is the latest ChatGPT model.
He also posted results he had generated twice from my prompt, arguing that the difference could not all be explained by "randomness". At that point I began to suspect I had been using the wrong model: the ChatGPT client I used before did produce different results from his. So I decided to try again.
First, I used the prompt this answerer gave me, which is as follows:
Alright, here’s a shorter version: 12 flowers in a 4×3 grid on a wooden tray with a glass bottom and a layer of animal fat. From top to bottom:
ylang-ylang, osmanthus, yellow champaca; tuberose, gardenia, jasmine; carnation, peony, pink hyacinth; blue iris, violet, wisteria. Photo taken on a sunny afternoon.
The original English prompt is fairly long; condensed, it says:
Twelve kinds of flowers shot from above, arranged in four rows and three columns. The first row has ylang-ylang, osmanthus, and yellow champaca; the second row has tuberose, gardenia, and jasmine;
the third row has carnations, peonies, and pink hyacinths; the fourth row has blue iris, violets, and wisteria. All the flowers sit on a wooden tray with a glass bottom coated in a translucent layer of animal fat. The photo was taken on a sunny afternoon, with the tray fully lit by the sun.
ChatGPT answered quickly. What made this generation different from before was an added sentence: based on your description, this image was created by DALL·E. DALL·E is OpenAI's image-generation model.
It seems I was not alone in running into the model-confusion problem I mentioned the day before yesterday; users abroad have evidently hit similar situations.
From this, it appears that ChatGPT's official free tier does not support the latest non-autoregressive model, or supports it only with certain limitations.
So I put the question directly to ChatGPT: "How do I use the latest text-to-image model?" It gave three answers:
The first is to use ChatGPT's built-in image-generation feature: enter a detailed description directly, and it will generate the image with the latest model. You can also optimize the prompt, for example:
The second is to use the API, which lets you generate in batches or integrate generation into your own workflow. The third was pure introduction: it mentioned that the new text-to-image model is multimodal and can generate a continuous series of images that tell a story, so you can sketch with AI and then refine the sketch into a polished illustration.
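Since the second route is the API, here is a minimal sketch of what batch generation could look like with the OpenAI Python SDK. The model name "gpt-image-1", the helper names, and the sample prompt are my own assumptions for illustration; check OpenAI's current image-API documentation for the model identifiers your account actually exposes.

```python
# Minimal sketch of batch image generation via the OpenAI Python SDK.
# "gpt-image-1" and the helper names are assumptions for illustration.

def build_image_request(prompt: str, size: str = "1024x1024") -> dict:
    """Bundle request parameters so a list of prompts can be processed in a loop."""
    return {"model": "gpt-image-1", "prompt": prompt, "size": size}

def generate_images(prompts):
    """Call the API once per prompt; requires OPENAI_API_KEY in the environment."""
    from openai import OpenAI  # imported lazily so the helper above works offline
    client = OpenAI()
    return [client.images.generate(**build_image_request(p)) for p in prompts]

# Example of the parameters that would be sent for one prompt:
params = build_image_request("A retro study lit by a warm desk lamp, oil-painting style")
```

The point of splitting out `build_image_request` is that the same parameter bundle can be reused across a whole batch, which is exactly the "generate in batches or integrate into your workflow" use case the API route enables.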
Without thinking too much, I followed its suggestion directly, handed the task to Kimi, and asked Kimi to generate a prompt for me:
In a dimly lit retro study, a table lamp casts a soft warm glow over an old wooden desk. A few yellowed books and a quill are scattered on the desk, next to a steaming cup of black tea. The picture is in the style of an oil painting, creating a serene and nostalgic atmosphere, as if time has stood still here.
ChatGPT generated the image below, but with the same caveat: this was a DALL·E-generated image showing the retro study vibe I described, and ChatGPT would soon launch new image-generation models with better detail and style control.
So the conclusion is that free ChatGPT cannot use the latest text-to-image model. What to do? I hadn't paid for ChatGPT in a long time, so to verify this properly I simply subscribed for another month.
After that, I switched to the ChatGPT 4o conversation interface and entered this prompt:
In a tranquil seaside town, the shadow of the sunset stretches long across the ancient lighthouse. In the distance, several small boats sail slowly through the golden waves, and a few pink clouds float in the sky. The picture is in watercolor style, with soft, warm colors, creating a serene and dreamy atmosphere.
The response was a bit slow, but the resulting image does have the right feel:
So you must select the 4o model in ChatGPT; the default GPT still falls back to the old DALL·E model. Later, I gave it a prompt for a cartoon character:
A lively, cute cartoon kitten with big blue eyes and soft pink ears, wearing little red overalls and a pair of small yellow boots.
It stands in a forest full of magic, surrounded by colorful mushrooms and twinkling fireflies. The picture is cartoon-style, with bright colors, creating a fantastical and joyful atmosphere. The aspect ratio is 3:0.
I was curious what kind of image it could generate. What do you think?
I honestly can't critique it; I can only say the picture is cute, full of childlike charm, with warm colors. For children's books, picture books, or interactive courseware and craft materials for early-childhood and preschool education, it would certainly be good enough.
Then I took the picture to Doubao and asked it to think it over: how would you rate the result? I'm using ChatGPT 4o's latest text-to-image model.
Doubao said:
It's a pretty good picture. The kitten's expression and the details of its clothes are very vivid, and the mushrooms and fireflies are lively too. The kitten's warm colors against the cool night scene create a dreamy, adorable atmosphere.
The kitten sits in the middle of the frame, the composition feels comfortable, and the subject stands out. The overall quality of the picture is quite high; it could be used in many places, and this model's output is very good.
The problem, though, is that plain text-to-image doesn't show how powerful it really is. How about testing its ability to fuse two images?
I downloaded two random pictures of water cups from Baidu, both shot in real scenes, each held in a hand. I wanted to see whether it could remove the hands and place the two cups on the same flat surface.
Then I gave them to Kimi and asked it to write a prompt for me:
Please remove the hands from the two pictures of water cups and keep only the cups themselves, with the details clear and complete. Keep the background simple for an e-commerce poster design. The light, shadow, and texture of the glass must be preserved so the overall picture looks natural and attractive.
The first attempt seemed to hang, and it told me, "I'm sorry, I can't handle your request. Why not try again?"
Presumably too many people were using it. I opened a new window, and this time it responded quickly. The problem: the first photo was processed cleanly, but the second was simply ignored.
This time I didn't beat around the bush and stated my requirements directly, four points in all:
Remove both hands, keeping only the cups of water; preserve the cups' details, texture, light, and shadow so they look natural; keep the background simple and clean to highlight the cups; place the two cups together with coordinated proportions and a pleasing composition. I also specifically reminded it that the size should be 3:0.
It took about 40 seconds, and it was done. Seeing the result, I could only exclaim, "This is too powerful"; designers are going to be laid off.
What do you think? The speed and efficiency far outstrip manual cutouts.
I then tried ChatGPT 4o's new world-knowledge feature, which the official announcement hypes heavily: with it, AI-generated pictures can draw on real-world knowledge, better understand what users mean, and produce images that are more plausible and reasonable.
To put it simply: the AI will ponder real-world details as it paints, such as where the place is, what the cultural background is, and what the laws of physics say.
This time I brought in DeepSeek. I said: help me write a text-to-image prompt with causal logic. One detail: the aspect ratio is still 3:0. Here is what it produced off the cuff:
The story: on a sunny afternoon, a little boy sits on a park bench with an ice cream in his hand and a happy smile on his face,
because he has just had a good time at the playground and is now enjoying a delicious ice cream. The picture is in warm tones, with a tree-lined park in the background and a few children playing in the distance. The aspect ratio is 3:0, and the overall style is fresh and natural.
Well, there's no real causal logic in that. What to do? Start over, and simply put a physical "law" directly into the prompt. Here are the two prompts DeepSeek gave me:
One is the law of the lever: a diagram of a child on a seesaw lifting a heavy object through the principle of leverage. The child presses down hard and the weight rises, illustrating the lever principle. The background is simple and the ratio is 3:0.
As shown in the picture above. Does it look right? It does capture the lever. After I gave it to Doubao, it said:
"This picture shows a child pressing down one end of a seesaw, trying to pry up a large weight at the other end. It's like playing on a seesaw: sitting on one end can lift the person at the other end. Using this method, you can lift heavy things without much effort; it's a labor-saving trick."
That's genuinely impressive. But perhaps it's a bit too simple, so let's test with the "time dilation effect of relativity".
The prompt:
Generate a diagram showing two people: Alice on Earth and Bob in a high-speed spaceship. Because of relativistic time dilation, Bob appears younger than Alice. Alice ages on Earth, while Bob, in the ship, barely changes. The background is simple and the ratio is 3:0.
Can you tell?
I can read it, anyway. However, on the first generation, Alice's name was placed under her feet, which looked a bit strange. After a second round of dialogue fine-tuning, the name landed in the right place.
I think it could be improved further, for example by specifying the characters' genders, and so on.
Besides text-to-image, image fusion, and diagrams that understand the laws of the physical world, are there other ways to play with it?
Try a style transfer: for example, converting a photo of a person into anime style, or blending the style of one image with the content of another. This is a very useful technique in art creation and design.
The question is how to do the conversion. I have a profile photo, so I threw the problem at DeepSeek and let it figure it out for me.
DeepSeek said: if you want to test the ability of OpenAI's ChatGPT text-to-image model, pose a question that is slightly complex but specific and clear, which tests both the model's comprehension and its generation ability.
For example: I have a personal photo and want to turn it into anime style. Please turn me into an anime character in the photo, with dark blue hair and a slightly cyberpunk, cool overall style. The image ratio is 3:0.
Whatever the case, I left it to ChatGPT 4o. It flopped a bit, though perhaps I was asking for too much.
The result is indeed a little abstract; what do you think?
In any case, non-autoregressive models have evolved very quickly, and they change the core architecture.
Instead of generating strictly piece by piece, this structure first understands the overall composition and details, a bit like a student who listens to the teacher explain the whole problem first: sketch the outline of the entire picture, then fill in the details little by little.
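As a purely conceptual toy (not a real model), the contrast between the two decoding styles can be sketched like this: the autoregressive loop commits one position per step, while the non-autoregressive one drafts the whole output at once and sharpens it over a few fixed refinement passes. All the names here, and the "model" (a fixed string), are illustrative assumptions.

```python
# Toy contrast between autoregressive and non-autoregressive decoding.
# TARGET stands in for "what the model wants to produce"; a real model
# would predict it, but a fixed string keeps the control flow visible.
TARGET = list("OUTLINE")

def autoregressive_decode():
    out = []
    for i in range(len(TARGET)):       # N sequential steps, one position each;
        out.append(TARGET[i])          # each step may condition on everything in `out`
    return out

def non_autoregressive_decode(passes=3):
    n = len(TARGET)
    draft = ["_"] * n                  # step 0: a blurry sketch of the WHOLE output
    for p in range(passes):            # a small, fixed number of refinement passes
        for i in range(p, n, passes):  # each pass sharpens a slice of positions
            draft[i] = TARGET[i]       # (done in parallel in a real model)
    return draft
```

The point of the sketch is the step count: the sequential loop takes as many steps as there are positions, while the refinement loop finishes in a fixed number of passes regardless of length. That is the sense in which the new image models can lay down the whole composition first and then add detail.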
Many domestic products are also building this capability, and I hope they catch up with and surpass it as soon as possible.
This article was written by [Wang Zhiyuan] (WeChat public account: [Wang Zhiyuan]) and published on Everyone Is a Product Manager as original / authorized content. Reproduction without permission is prohibited.
The header image is from Unsplash, under the CC0 license.