Nvidia has announced Nvidia Rubin CPX, a new class of GPU purpose-built for massive-context processing, enabling AI systems to handle million-token tasks such as software coding and generative video with groundbreaking speed and efficiency.
Rubin CPX works hand in hand with Nvidia Vera CPUs and Rubin GPUs inside the new Nvidia Vera Rubin NVL144 CPX platform. This integrated Nvidia MGX system packs 8 exaflops of AI compute to provide 7.5x more AI performance than Nvidia GB300 NVL72 systems, as well as 100TB of fast memory and 1.7 petabytes per second of memory bandwidth in a single rack.
A dedicated Rubin CPX compute tray will also be offered for customers looking to reuse existing Vera Rubin NVL144 systems.
“The Vera Rubin platform will mark another leap in the frontier of AI computing — introducing both the next-generation Rubin GPU and a new category of processors called CPX,” says Jensen Huang, founder and CEO of Nvidia. “Just as RTX revolutionised graphics and physical AI, Rubin CPX is the first CUDA GPU purpose-built for massive-context AI, where models reason across millions of tokens of knowledge at once.”
Built on the Nvidia Rubin architecture, the Rubin CPX GPU uses a cost-efficient, monolithic die design packed with powerful NVFP4 computing resources and is optimised to deliver extremely high performance and energy efficiency for AI inference tasks.
Rubin CPX delivers up to 30 petaflops of compute with NVFP4 precision for the highest performance and accuracy. It features 128GB of cost-efficient GDDR7 memory to accelerate the most demanding context-based workloads.
In addition, it delivers 3x faster attention capabilities compared with Nvidia GB300 NVL72 systems — boosting an AI model’s ability to process longer context sequences without a drop in speed.
Rubin CPX is offered in multiple configurations, including the Vera Rubin NVL144 CPX, which can be combined with the Nvidia Quantum-X800 InfiniBand scale-out compute fabric or the Nvidia Spectrum-X Ethernet networking platform with Nvidia Spectrum-XGS Ethernet technology and Nvidia ConnectX-9 SuperNICs. Vera Rubin NVL144 CPX enables companies to monetise quickly, with as much as $5-billion in token revenue for every $100-million invested, according to Nvidia.
Nvidia Rubin CPX will be supported by the complete Nvidia AI stack — from accelerated infrastructure to enterprise‑ready software. The Nvidia Dynamo platform efficiently scales AI inference, dramatically boosting throughput while cutting response times and model serving costs.
The processors will be able to run the latest in the Nvidia Nemotron family of multimodal models that provide reasoning for enterprise-ready AI agents. For production-grade AI, Nemotron models can be delivered with Nvidia AI Enterprise, a software platform that includes Nvidia NIM microservices as well as AI frameworks, libraries and tools that enterprises can deploy on Nvidia-accelerated clouds, data centres and workstations.