According to an IT之家 report on April 12, tech outlet marktechpost published a blog post yesterday (April 11) covering NVIDIA's release of Llama-3.1-Nemotron-Ultra-253B-v1, a 253-billion-parameter large language model that marks a major step forward in reasoning capability, architectural efficiency, and production readiness.
As AI becomes ubiquitous in digital infrastructure, businesses and developers need to find a balance between compute cost, performance, and scalability. The rapid development of large language models (LLMs) has improved natural language understanding and conversational capabilities, but their sheer size often leads to inefficiencies and limits large-scale deployment.
NVIDIA's latest release, Llama-3.1-Nemotron-Ultra-253B-v1 (hereafter Nemotron Ultra), rises to this challenge. Built on Meta's Llama-3.1-405B-Instruct architecture and designed for commercial and enterprise needs, it supports tasks ranging from tool use to multi-turn execution of complex instructions.
According to the blog post, Nemotron Ultra adopts a dense decoder-only transformer structure optimized by a neural architecture search (NAS) algorithm. Its key innovation is a skip-attention mechanism that omits the attention module in some layers or replaces it with a simple linear layer.
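To make the idea concrete, the following PyTorch-style sketch shows what such a NAS-selected block variant could look like: a layer either drops its attention sub-module entirely or substitutes a single linear projection. The class name, the `mode` flag, and the layer sizes are illustrative assumptions, not NVIDIA's actual implementation.

```python
import torch
import torch.nn as nn

class SkipOrLinearAttention(nn.Module):
    """Hypothetical sketch of a NAS-selected block variant: the attention
    sub-layer is either skipped entirely or replaced by a cheap linear map."""

    def __init__(self, hidden_size: int, mode: str = "skip"):
        super().__init__()
        self.mode = mode  # "skip" or "linear", chosen per layer by the search
        if mode == "linear":
            # A single linear projection standing in for full self-attention.
            self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.mode == "skip":
            # No attention at all: pass the residual stream through unchanged.
            return hidden_states
        # "linear" variant: one matrix multiply instead of QKV attention.
        return hidden_states + self.proj(hidden_states)


# Example: a layer the search marked as "skip" costs no attention FLOPs at all.
x = torch.randn(1, 16, 4096)
print(SkipOrLinearAttention(4096, mode="skip")(x).shape)    # torch.Size([1, 16, 4096])
print(SkipOrLinearAttention(4096, mode="linear")(x).shape)  # torch.Size([1, 16, 4096])
```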
In addition, feed-forward network (FFN) fusion merges multiple FFN layers into fewer, wider layers, sharply reducing inference time while maintaining performance. The model supports a 128K-token context window and can process long texts, making it suitable for advanced RAG systems and multi-document analysis.
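The fusion idea can likewise be illustrated with a small sketch. Assuming two consecutive simple GELU feed-forward blocks (not the exact Llama MLP), their weights can be concatenated into one wider block that reproduces the sum of their outputs in a single pass; the helper names below are hypothetical.

```python
import torch
import torch.nn as nn

class FusedFFN(nn.Module):
    """Hypothetical sketch of FFN fusion: two consecutive feed-forward blocks
    are approximated by one wider block applied to the same input, so both
    halves are computed in a single, better-parallelised pass."""

    def __init__(self, ffn_a: nn.Sequential, ffn_b: nn.Sequential):
        super().__init__()
        up_a, up_b = ffn_a[0], ffn_b[0]      # Linear(hidden -> inter)
        down_a, down_b = ffn_a[2], ffn_b[2]  # Linear(inter -> hidden)
        hidden = up_a.in_features
        inter = up_a.out_features + up_b.out_features
        self.up = nn.Linear(hidden, inter)
        self.act = nn.GELU()
        self.down = nn.Linear(inter, hidden)
        with torch.no_grad():
            # Concatenate the up-projections; stack the down-projections side by side.
            self.up.weight.copy_(torch.cat([up_a.weight, up_b.weight], dim=0))
            self.up.bias.copy_(torch.cat([up_a.bias, up_b.bias], dim=0))
            self.down.weight.copy_(torch.cat([down_a.weight, down_b.weight], dim=1))
            self.down.bias.copy_(down_a.bias + down_b.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output equals ffn_a(x) + ffn_b(x), computed in one wide pass.
        return self.down(self.act(self.up(x)))


def make_ffn(hidden: int, inter: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(hidden, inter), nn.GELU(), nn.Linear(inter, hidden))


x = torch.randn(2, 8, 512)
ffn1, ffn2 = make_ffn(512, 2048), make_ffn(512, 2048)
fused = FusedFFN(ffn1, ffn2)
# The fused block reproduces the sum of the two parallel FFN outputs.
print(torch.allclose(fused(x), ffn1(x) + ffn2(x), atol=1e-5))  # True
```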
Nemotron Ultra has also achieved a breakthrough in deployment efficiency: it can run inference on a single 8×H100 node, significantly reducing data center costs and improving accessibility for enterprise developers.
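A deployment sketch along these lines, using the standard Hugging Face transformers API, might look as follows; the exact repository ID and loading options are assumptions and should be checked against NVIDIA's model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo ID; check NVIDIA's Hugging Face page for the exact name.
MODEL_ID = "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 weights to fit within one node's GPU memory
    device_map="auto",           # shard layers across the node's 8 H100 GPUs
    trust_remote_code=True,      # the NAS-modified architecture may ship custom code
)

messages = [{"role": "user", "content": "Summarize the key idea of FFN fusion."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```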
NVIDIA further optimizes the model through multi-stage post-training, including supervised fine-tuning on tasks such as code generation, math, conversation, and tool calling, as well as reinforcement learning (RL) using the Group Relative Policy Optimization (GRPO) algorithm. These steps ensure that the model performs well on benchmarks and aligns closely with human interaction preferences.
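The core of GRPO is that each sampled response is scored relative to the other responses drawn for the same prompt, so no separate learned critic is needed. The short sketch below illustrates that group-relative advantage computation; the function name and reward values are invented for illustration.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Core idea of GRPO: for a group of responses sampled from the same prompt,
    each response's advantage is its reward standardised against the group,
    removing the need for a separate value/critic model."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)


# Example: 2 prompts, 4 sampled responses each, scalar rewards from a reward model.
rewards = torch.tensor([[0.1, 0.9, 0.4, 0.6],
                        [0.2, 0.2, 0.8, 0.4]])
advantages = group_relative_advantages(rewards)
print(advantages)  # positive for above-group-average responses, negative below

# These advantages then weight a PPO-style clipped policy-gradient loss, e.g.:
# loss = -min(ratio * A, clamp(ratio, 1 - eps, 1 + eps) * A).mean()
```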