News

Rochan Sankar joins Nvidia networking team

The culmination of a long-standing working relationship.

Shawnee Blackwood

Reuters reports that Nvidia has hired Enfabrica co-founder and CEO Rochan Sankar and licensed the startup’s networking technology in a cash-and-stock deal exceeding $900 million. Sankar joins Nvidia to lead AI infrastructure networking. Nvidia previously invested $125 million in Enfabrica in 2023. The acqui-hire adds the ACF SuperNIC fabric, which unifies NIC, switching, and CXL memory pooling to scale GPU clusters, reduce latency, raise utilization, and extend Nvidia’s roadmap.

Huang and Sankar

(L) Jensen Huang, CEO of Nvidia, and (R) Rochan Sankar, CEO of Enfabrica. (Source: Companies)

According to a new report from Reuters, Nvidia hired Enfabrica co-founder and CEO Rochan Sankar and licensed Enfabrica’s networking technology in a cash-and-stock transaction exceeding $900 million. The deal has closed, and Sankar has joined Nvidia to lead AI infrastructure and networking. Nvidia had already invested $125 million in Enfabrica’s Series B round in 2023, so the companies entered this agreement with an established technical and financial relationship. Nvidia structured the transaction as a talent-and-technology acquisition rather than a corporate purchase, securing leadership and intellectual property that fit directly into its platform roadmap.

Sankar now drives fabric architecture for training and inference clusters. He aligns Enfabrica’s accelerated compute fabric with DGX platforms, Grace CPU systems, and NVLink topologies, and he coordinates with partners to validate deployment rules for cloud-scale installations. He defines milestones around bandwidth, latency, congestion control, and memory tiering, and he ties those goals to model behavior and runtime schedules.

Enfabrica’s core product, the Accelerated Compute Fabric SuperNIC (ACF-S), consolidates I/O roles that conventional deployments split across PCIe switches, NICs, and top-of-rack switching. The device establishes a direct, low-latency path among GPUs, CPUs, host memory, pooled memory, and the network. A software control plane supervises the placement and movement of tensors and activations, reduces hop count, and prevents queue buildup under bursty phases such as prefill and all-reduce. Operators set policies for flow steering, queue management, and congestion avoidance without rewriting model code.
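Enfabrica has not published its control-plane interface, but a minimal Python sketch, with every name and threshold invented for illustration, suggests what operator-set policies for flow steering, queue management, and congestion avoidance might look like:

# Hypothetical sketch of operator-defined fabric policies; Enfabrica's actual
# control-plane API is not public, so every name and value here is illustrative.
from dataclasses import dataclass

@dataclass
class FabricPolicy:
    traffic_class: str      # e.g. "all_reduce", "prefill", "kv_cache_fill"
    max_queue_depth: int    # packets buffered before congestion is signaled
    ecn_threshold: float    # fraction of queue depth that triggers ECN marks
    preferred_paths: list   # ordered fabric ports to steer matching flows onto

def apply_policies(policies):
    """Pretend to push policies to a fabric endpoint and report what was set."""
    for p in policies:
        print(f"steer {p.traffic_class}: paths={p.preferred_paths}, "
              f"queue<={p.max_queue_depth}, ecn@{p.ecn_threshold:.0%}")

if __name__ == "__main__":
    apply_policies([
        # Collective traffic gets shallow queues and early congestion marking.
        FabricPolicy("all_reduce", max_queue_depth=64, ecn_threshold=0.5,
                     preferred_paths=[0, 1]),
        # Bursty prefill traffic tolerates deeper queues on spare paths.
        FabricPolicy("prefill", max_queue_depth=256, ecn_threshold=0.8,
                     preferred_paths=[2, 3]),
    ])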

The platform bridges high-performance Ethernet with memory pooling over CXL. GPUs read and write shared DDR5 DRAM in addition to local HBM, which decouples memory capacity from accelerator count. Operators scale memory independently, place key-value caches in pooled DRAM, and reserve HBM for hot activations and compute-dense tensors. This approach limits stranded resources and raises utilization across accelerators, DRAM, and storage.
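As a rough illustration of that placement logic, the Python sketch below uses a deliberately simple rule, keeping hot tensors in HBM and spilling large, cold key-value caches to pooled DDR5; the real software weighs far more signals than these two.

# Illustrative tier-placement rule only; not Enfabrica's actual algorithm.
def place_tensor(size_gb, accesses_per_step, hbm_free_gb):
    """Keep hot, compute-dense tensors in HBM; spill cold data to pooled DRAM."""
    is_hot = accesses_per_step >= 1          # touched every step -> latency critical
    if is_hot and size_gb <= hbm_free_gb:
        return "HBM"
    return "pooled_DDR5"

# Example: small, hot activations stay in HBM; a large, cold KV cache spills.
print(place_tensor(size_gb=4, accesses_per_step=2, hbm_free_gb=10))     # HBM
print(place_tensor(size_gb=60, accesses_per_step=0.1, hbm_free_gb=10))  # pooled_DDR5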

Enfabrica’s technology targets the main barrier to cluster scaling: coordinated bandwidth, latency, and memory access across many nodes. The fabric connects very large GPU clusters—potentially over 100,000 devices—so they operate as one computer with predictable latency and deterministic access to caches and parameter shards. That capability addresses the bottlenecks that emerge when model size and batch shape push beyond a single server or a single rack.

The stack includes Resilient Message Multipathing. The fabric monitors link health, selects alternative routes on fault, and preserves ordering when the model or runtime requires it. Telemetry exposes packet drops, queue depths, and flow completion time so schedulers react before jobs stall. With those controls, operators scale within a server and across racks under a single control plane and mix scale-up and scale-out topologies without re-architecting data movement.
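A toy Python loop, with invented field names and thresholds rather than Enfabrica’s actual telemetry schema, illustrates how a scheduler could turn those signals into rerouting or throttling decisions before a job stalls:

# Hypothetical telemetry reaction loop; names and thresholds are made up and
# do not describe the real Resilient Message Multipathing implementation.
def react_to_telemetry(link_stats, drop_limit=0, queue_limit=512, fct_limit_ms=5.0):
    """Return a per-link action a scheduler might take before a job stalls."""
    actions = {}
    for link, stats in link_stats.items():
        if stats["drops"] > drop_limit:
            actions[link] = "reroute"      # fail over to an alternate path
        elif stats["queue_depth"] > queue_limit or stats["fct_ms"] > fct_limit_ms:
            actions[link] = "throttle"     # pace senders before queues build up
        else:
            actions[link] = "ok"
    return actions

sample = {
    "link0": {"drops": 0, "queue_depth": 120, "fct_ms": 1.2},
    "link1": {"drops": 3, "queue_depth": 40,  "fct_ms": 0.9},
    "link2": {"drops": 0, "queue_depth": 900, "fct_ms": 7.5},
}
print(react_to_telemetry(sample))
# {'link0': 'ok', 'link1': 'reroute', 'link2': 'throttle'}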

Large-scale inference, retrieval-augmented generation, parameter-efficient fine-tuning, and mixture-of-experts training benefit directly from these capabilities. Token generators and cache-heavy retrieval layers need bandwidth and low-jitter latency. Expert routing and all-to-all exchange need precise placement rules and queue control. ACF-S meets those requirements by collapsing discrete I/O components into a fabric endpoint that understands both network paths and memory tiers.

Nvidia gains three concrete advantages. First, it adds a production-oriented I/O architecture that complements NVLink inside a server by solving inter-server scale over Ethernet. Second, it integrates memory tiering that shifts part of the activation or cache footprint to pooled DDR5 when latency and cost targets allow. Third, it brings in a leadership team with deep experience in fabric design, heterogeneous memory control, and host-device orchestration.

The move extends Nvidia’s platform strategy from accelerators and proprietary interconnects to full data center subsystems. Nvidia can publish reference designs that specify link budgets, traffic classes, and placement policies. It can document coherency behavior between HBM and pooled DRAM, define scheduler hooks for context movement, and standardize telemetry for congestion and queue depth. Partners can then qualify racks that deliver consistent networking, memory pooling, and observability across generations.

Nvidia can also streamline system integration. ACF-style endpoints can sit on server trays next to GPUs and CPUs, reduce device count per node, and simplify cabling in dense racks. Architects can introduce memory capacity without adding accelerators and can right-size NIC bandwidth to workload mix. The control plane can coordinate with inference services and job schedulers to sequence prefill and decode phases, throttle background tasks, and maintain service-level objectives under demand spikes.
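As a simplified picture of that coordination, the short Python sketch below, which stands in for any real Nvidia or Enfabrica scheduler API, orders prefill ahead of decode and drops background work during a demand spike:

# Toy scheduling rule only; not an actual Nvidia or Enfabrica interface.
PRIORITY = {"prefill": 0, "decode": 1, "background": 2}

def schedule(requests, under_load):
    """Run latency-critical phases first; shed background work under load."""
    runnable = [(phase, rid) for phase, rid in requests
                if not (under_load and phase == "background")]
    # Stable sort keeps arrival order within each phase.
    runnable.sort(key=lambda item: PRIORITY[item[0]])
    return [rid for _, rid in runnable]

reqs = [("decode", "r1"), ("background", "b1"), ("prefill", "r2")]
print(schedule(reqs, under_load=True))   # ['r2', 'r1']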

The transaction follows an industry pattern that pairs licensing with leadership hires to accelerate roadmaps. Nvidia and Enfabrica now align around fabric-centric scaling. Sankar directs integration that targets throughput, latency, and utilization at cluster scale, and he sets a release plan that brings these capabilities into production systems. The combined roadmap treats networking and memory as first-class resources and ties them to compiler flows, runtime scheduling, and model telemetry so operators scale AI clusters with predictable performance.
