Interview with Wang Zhongyuan, President of KLCII: Achieving AGI in the broader sense will take at least 10 years or even longer

On March 27, the 2025 Zhongguancun Forum Annual Meeting opened in Beijing; it will run until March 31. The theme of this year's annual meeting is "New Quality Productivity and Global Science and Technology Cooperation".

On the afternoon of March 29, Wang Zhongyuan, President of the Beijing Academy of Artificial Intelligence (hereinafter referred to as "KLCII"), delivered a speech titled "Embodied Intelligence Technology Evolution and Ecological Co-construction" at the Future Artificial Intelligence Pioneer Forum.

At the forum, KLCII released RoboOS, the first cross-embodiment collaboration framework for embodied brains, together with RoboBrain, an open-source embodied brain model. The two enable cross-scenario multi-task operation, lightweight rapid deployment, and collaboration across different robot bodies, advancing single-machine intelligence toward swarm intelligence.
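RoboOS's internals are not detailed in this article. Purely as an illustration of the cross-embodiment idea described above, the hypothetical sketch below shows a single shared "brain" decomposing a task and routing each step to whichever robot body advertises the required skill. Every name here (Embodiment, SharedBrain, the fixed task plan) is invented for illustration and does not reflect the actual RoboOS API.

```python
# Hypothetical sketch of cross-embodiment dispatch: one shared planner
# ("brain") decomposes a task and routes each step to whichever robot
# body advertises the required skill. All names are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Embodiment:
    name: str
    skills: set = field(default_factory=set)

    def execute(self, step: str) -> str:
        return f"{self.name} executed '{step}'"


class SharedBrain:
    """A single planner serving many heterogeneous robot bodies."""

    def __init__(self):
        self.fleet: list = []

    def register(self, body: Embodiment):
        self.fleet.append(body)

    def plan(self, task: str) -> list:
        # Stand-in for a learned planner: a fixed decomposition table.
        return {"serve tea": ["grasp cup", "navigate to table", "place cup"]}.get(task, [task])

    def dispatch(self, task: str) -> list:
        log = []
        for step in self.plan(task):
            # Route each step to the first body whose skills cover it.
            body = next(b for b in self.fleet if step.split()[0] in b.skills)
            log.append(body.execute(step))
        return log


brain = SharedBrain()
brain.register(Embodiment("arm-01", {"grasp", "place"}))
brain.register(Embodiment("wheeled-02", {"navigate"}))
print(brain.dispatch("serve tea"))
```

The point of the sketch is only the separation the article describes: the plan is made once, centrally, while execution is shared across heterogeneous bodies.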

Before the forum began, Wang Zhongyuan was interviewed by a reporter from Daily Economic News (hereinafter referred to as NBD).

KLCII is a new type of R&D institution in the field of artificial intelligence, jointly built with leading AI organizations in Beijing with the support of the Ministry of Science and Technology and the Beijing municipal government. Wang Zhongyuan is its second president; in 2018 he was named to MIT Technology Review's "Innovators Under 35" list. He has also worked at Microsoft, Facebook (now Meta), Meituan, and Kuaishou.

Image source: Photo by reporter Zhang Rui

At present, computing power limitations remain a bottleneck in the development of large models

NBD: With DeepSeek's breakthrough, does that mean computing power is no longer a problem for large models?

Wang Zhongyuan: I don't quite agree with that. DeepSeek has indeed achieved excellent results, showing that a model on the scale of ChatGPT-4 can be trained with limited computing power. However, we should be aware that such techniques can also be adopted by other institutions and countries internationally, which will drive large models toward even larger scales.

Today, thanks to engineering optimizations, it is possible to train models with even more parameters. In that case, if the Scaling Law (the larger the model, the higher its intelligence) still holds, model performance may improve further.

Therefore, although current models, especially base models, seem to have hit a certain bottleneck, or at least their performance gains have slowed, one very important factor is insufficient data, and computing power limitations are another. So I don't think today's computing power is anywhere near enough. The technology of large models is far from its end, and computing power remains indispensable.

This year will see a big explosion in artificial intelligence applications

NBD: The industry says this year is a turning point in the development of artificial intelligence. Do you agree with this view?

Wang Zhongyuan: Yes. First of all, I think there will be a big explosion in artificial intelligence applications this year. Because domestic models can now achieve comparable performance with modest computing power, they will certainly enter the application deployment stage.

China has a large number of application scenarios and application needs, which is our advantage. As the capabilities of base models improve, we have many product managers and entrepreneurs who can put the models to work. Applications of large language models in particular have huge potential for an industry explosion.

Of course, I have repeatedly emphasized that large language models alone are not enough. Even setting robots aside, real industries hold a large amount of multimodal data: flowcharts, X-ray and CT data in medicine, and sensor data across many sectors. These are not simple text data. Multimodal large models are therefore an unavoidable capability.

Current multimodal large models, especially multimodal understanding models, are still at a relatively early stage. There are some solutions, such as those built around a large language model as the core, but when many large language models add multimodal capabilities, their original language abilities degrade. This is an important reason why KLCII focused on breakthroughs in unified native multimodality last year.

Last year, KLCII officially launched Emu3, a native unified multimodal model that unifies text, images, and video from the start, and unifies understanding and generation. We believe this kind of unified multimodal model can help large models land in various industries and achieve better results.
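Emu3's public description is that it discretizes images and video into tokens and trains a single next-token predictor over one shared vocabulary, rather than bolting vision onto a finished language model. The toy sketch below, with all vocabulary sizes and names invented, illustrates only that one idea: text tokens and visual tokens living in one token space, so a single sequence model can consume or emit either modality.

```python
# Toy illustration of a "native" unified token space: text tokens and
# discretized visual tokens share one vocabulary, so a single
# next-token predictor can both understand and generate either modality.
TEXT_VOCAB = 50_000       # ids [0, 50000) are text tokens (invented size)
VISION_CODES = 8_192      # ids [50000, 58192) are visual codebook entries


def vision_to_global(code: int) -> int:
    """Map a visual codebook entry into the shared vocabulary."""
    assert 0 <= code < VISION_CODES
    return TEXT_VOCAB + code


def interleave(text_ids, image_codes):
    """One mixed-modal training sequence: text, then the image's tokens."""
    return list(text_ids) + [vision_to_global(c) for c in image_codes]


seq = interleave([101, 2003, 77], [5, 4090])
print(seq)  # [101, 2003, 77, 50005, 54090]
```

Because the vocabulary is shared from the start, there is no separate vision encoder whose addition could degrade the language side, which is the failure mode of bolted-on multimodality that Wang describes above.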

Embodied intelligence is a core capability for achieving AGI

NBD: How far do you think we are from achieving AGI (Artificial General Intelligence)? What else needs to be addressed?

Wang Zhongyuan: Quite frankly, there is no clear definition of or broad consensus on AGI at the moment. If we look only at writing ability, AI has to some extent already reached AGI. And if the Turing test is taken as the traditional standard for judging whether artificial intelligence has achieved AGI, then at least at the textual level, AI has likely reached it.

Beyond their versatility, today's large language models are close to master's or even doctoral level in many specific fields, such as mathematics and programming. In these respects, we can say artificial intelligence has partially reached some level of AGI.

But in a broader sense, such as having AI understand human language and solve concrete problems in daily life like doing housework, cooking, and washing dishes, I think there is still a long way to go before this level of AGI is achieved. It may take many years, at least 10 years or even longer. Progress in this process depends on the capabilities of the robot body itself, on advances in building world models, and on accumulating data across different deployment scenarios, so it will still require a long cycle.

NBD: Will the physical interaction of embodied intelligence become a core capability of AGI?

Wang Zhongyuan: It is certainly a core capability of AGI in the broad sense as we understand it. Ultimately, if artificial intelligence is to move from the digital world into the physical world, it must interact with the real world and learn through that interaction.

National Business Daily