On Monday, NVIDIA announced the NVIDIA Nemotron 3 family of open models, along with datasets and libraries designed to support the development of transparent, efficient multi-agent AI systems across a wide range of industries.
The Nemotron 3 lineup includes Nano, Super, and Ultra models built on a hybrid latent mixture-of-experts (MoE) architecture, which NVIDIA says is designed to lower inference costs, reduce context drift, and improve coordination among multiple AI agents.
“Open innovation is the foundation of AI progress,” said NVIDIA founder and CEO Jensen Huang. “With Nemotron, we are transforming advanced AI into an open platform that provides developers with the transparency and efficiency needed to build agentic systems at scale.”
Among the three models, Nemotron 3 Nano is available immediately. The 30-billion-parameter model activates up to 3 billion parameters per task and is optimised for low-cost inference use cases such as software debugging, summarisation, and AI assistants. NVIDIA said Nemotron 3 Nano delivers up to four times higher token throughput than Nemotron 2 Nano while reducing reasoning token generation by up to 60%.
The model is available on Hugging Face and through inference providers such as Baseten, DeepInfra, Fireworks, FriendliAI, OpenRouter, and Together AI. It is also offered as an NVIDIA NIM microservice, enabling deployment on NVIDIA-accelerated infrastructure.
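For developers pulling the weights directly from Hugging Face, the workflow would likely resemble the minimal sketch below; the repository name and generation settings are illustrative assumptions, not NVIDIA-published details.

```python
# Minimal sketch of loading the model from Hugging Face with transformers.
# The repository ID "nvidia/Nemotron-3-Nano" is hypothetical; check Hugging Face
# for the exact published name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-3-Nano"  # hypothetical identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarise the key findings of this bug report in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```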
Nemotron 3 Nano will also be available on AWS through Amazon Bedrock, with support for additional cloud platforms expected in the coming months.
In contrast, Nemotron 3 Super is a roughly 100-billion-parameter model designed for low-latency, multi-agent applications, while Nemotron 3 Ultra, with approximately 500 billion parameters, is aimed at deep reasoning and long-horizon planning tasks.
Both Nemotron 3 Super and Ultra use NVIDIA’s 4-bit NVFP4 training format on Blackwell GPUs to reduce memory requirements, and are expected to become available in the first half of 2026.
The launch comes as companies increasingly move beyond single AI chatbots toward collaborative, agent-based systems in which multiple models work together to handle complex workflows.
According to NVIDIA, Nemotron 3 enables developers to route tasks between frontier proprietary models and open Nemotron models within the same workflow, helping balance reasoning capability with cost efficiency.
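NVIDIA has not published routing code, but the idea can be illustrated with a short sketch in which a simple heuristic decides whether a task goes to a proprietary frontier model or to a cheaper open Nemotron model. The function names and the heuristic itself are hypothetical.

```python
# Illustrative sketch only: the routing heuristic, function names, and model
# choices are assumptions, not NVIDIA's published workflow.
def call_frontier_model(task: str) -> str:
    # Placeholder for a call to a proprietary frontier model API.
    return f"[frontier model response to: {task}]"

def call_nemotron(task: str) -> str:
    # Placeholder for a call to a self-hosted or NIM-served Nemotron model.
    return f"[Nemotron response to: {task}]"

def route_task(task: str, needs_deep_reasoning: bool) -> str:
    """Send hard reasoning tasks to the frontier model and everything else
    to the lower-cost open model."""
    if needs_deep_reasoning:
        return call_frontier_model(task)
    return call_nemotron(task)

print(route_task("Summarise this meeting transcript.", needs_deep_reasoning=False))
```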
NVIDIA said the Nemotron 3 family also aligns with its sovereign AI strategy, enabling governments and enterprises to deploy models tailored to local data, regulations, and policy requirements. Organisations across Europe and South Korea are among those adopting the open models, the company added.
Several enterprise customers and partners, including Accenture, Deloitte, EY, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow, Siemens, Synopsys, and Zoom, are integrating Nemotron models into AI workflows across manufacturing, cybersecurity, software development, and communications.
Perplexity CEO Aravind Srinivas said the company is using Nemotron within its agent-routing system to optimise performance. “We can direct workloads to fine-tuned open models such as Nemotron 3 Ultra, or use proprietary models when tasks require it,” he said.
Alongside the models, NVIDIA released three trillion tokens of pretraining, post-training, and reinforcement learning datasets, including an Agentic Safety Dataset for evaluating multi-agent systems. The company also open-sourced NeMo Gym, NeMo RL, and NeMo Evaluator to support the training, customisation, and evaluation of agentic AI.