TechWatch

Nvidia Rubin CPX and disaggregated long-context inference

Massive context and the inference dichotomy.

Jon Peddie

Nvidia introduced the Rubin CPX, a GPU designed for the compute-intensive context phase of AI inference, in which models process prompts running to millions of tokens. The processor delivers 30 PFLOPS of NVFP4 compute and carries 128GB of GDDR7 memory, trading memory bandwidth for cost and power efficiency, a sensible trade for a phase that is compute-bound rather than bandwidth-bound. The Vera Rubin NVL144 CPX platform combines 144 Rubin CPX GPUs with standard Rubin GPUs and Vera CPUs in a disaggregated architecture that separates the context-processing phase from the generation phase and delivers 8 EFLOPS of total compute. Expected availability is late 2026.

Nvidia Rubin CPX GPU. (Source: Nvidia)

Nvidia has launched a new processor class, the Rubin CPX, a GPU built specifically
...
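To make the split concrete, here is a minimal sketch, in plain Python, of what disaggregated serving looks like: a compute-bound prefill pool processes the full prompt once and builds the KV cache, then hands it off to a bandwidth-bound decode pool that generates tokens one at a time. All names here (PrefillWorker, DecodeWorker, KVCache, serve) are illustrative assumptions, not Nvidia or framework APIs.

# Illustrative sketch of disaggregated inference: prefill (context) and
# decode (generation) run on separate worker pools, mirroring the split
# between Rubin CPX-class parts and standard Rubin GPUs. Hypothetical
# names throughout; no real scheduler or GPU code is involved.
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Key/value state produced by prefill and consumed by decode."""
    tokens: list[int] = field(default_factory=list)


class PrefillWorker:
    """Compute-bound phase: attend over the whole prompt once to build
    the KV cache. This is the work a context-phase GPU specializes in."""

    def run(self, prompt_tokens: list[int]) -> KVCache:
        # One pass over the full context (simulated here by copying).
        return KVCache(tokens=list(prompt_tokens))


class DecodeWorker:
    """Bandwidth-bound phase: emit one token per step, re-reading the
    KV cache each time, which is why decode favors memory bandwidth."""

    def run(self, cache: KVCache, max_new_tokens: int) -> list[int]:
        out = []
        for _ in range(max_new_tokens):
            # Each step touches the entire cache (the bandwidth
            # bottleneck); the sum stands in for real sampling.
            next_token = sum(cache.tokens) % 50_000
            cache.tokens.append(next_token)
            out.append(next_token)
        return out


def serve(prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
    """Route one request through both pools, handing off the KV cache."""
    cache = PrefillWorker().run(prompt_tokens)        # context phase
    return DecodeWorker().run(cache, max_new_tokens)  # generation phase


if __name__ == "__main__":
    print(serve(list(range(8)), max_new_tokens=5))

The KV-cache handoff between the two pools is the cost this design accepts in exchange for matching each phase to hardware suited to it; in the Vera Rubin NVL144 CPX, that transfer rides on the platform's interconnect rather than a Python call.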
