News

Super collaboration supercomputer—Intel, Nvidia, Dell, and others

It takes a village to build a Cardinal.

Jon Peddie

The Cardinal HPC cluster is a collaborative effort among Intel, Dell Technologies, Nvidia, and the Ohio Supercomputer Center designed to meet Ohio’s increasing demand for HPC resources, particularly in AI. Built on Dell PowerEdge servers and Intel Xeon CPU Max series processors optimized for memory-bound tasks, it has the hardware to accommodate AI workloads. With Nvidia Hopper GPUs carrying HBM2e memory and Nvidia Magnum IO GPUDirect, it offers high peak AI performance and enhanced computing capabilities. It is supported by the Ohio Department of Higher Education.

Cardinal Supercomputer
Frank Bartik, Ohio Supercomputer Center data center technician, works on test nodes for the Cardinal cluster with Intel Xeon CPU Max series scheduled for launch later in 2024. (Source: Ohio Supercomputer Center)

What do we think? It’s difficult to understand why Intel couldn’t persuade OSC to use Intel’s Max series GPUs. Granted, they are not as powerful as Nvidia’s H100s, but they cost less, and they would have given OSC a single vendor to deal with. It just shows how powerful Nvidia’s reputation, as well as its GPUs, is in the AI world. That’s going to be a big hurdle for any other company hoping to penetrate the market.

Collaborative tech effort hatches Cardinal supercomputer cluster

This week, a collaborative effort involving Intel, Dell Technologies, Nvidia, and the Ohio Supercomputer Center (OSC) led to the unveiling of Cardinal, a state-of-the-art high-performance computing (HPC) cluster.

Cardinal is designed specifically to cater to the escalating need for HPC resources in Ohio across various sectors, including research, education, and industry innovation, with a particular focus on artificial intelligence. AI and machine learning are becoming indispensable tools across scientific, engineering, and biomedical domains, helping researchers resolve intricate research questions. Their usefulness has also extended to academic fields such as agricultural sciences, architecture, and social studies, where they promise further advances.

Cardinal boasts hardware capable of accommodating the burgeoning AI workloads, representing a significant leap forward from its predecessor, the Owens cluster, launched in 2016, both in terms of capabilities and capacity.

The Cardinal cluster is a heterogeneous system that employs Dell PowerEdge servers and the Intel Xeon CPU Max series with high-bandwidth memory as its cornerstone. This setup is optimized for effectively handling memory-bound HPC and AI tasks, while also promoting programmability, portability, and ecosystem adoption.
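
For readers wondering what “memory-bound” means in practice, here is a rough illustration (ours, not Intel’s or OSC’s): a STREAM-style triad kernel, whose throughput is limited almost entirely by how quickly memory can feed the cores rather than by arithmetic, which is exactly the bottleneck on-package HBM is meant to relieve. The array size and the bandwidth estimate below are arbitrary.

```python
# Illustrative sketch only, not an OSC or Intel benchmark.
# A STREAM-style "triad" is the textbook memory-bound workload: almost no
# arithmetic per byte, so performance tracks memory bandwidth, not FLOPS.
import time
import numpy as np

n = 50_000_000                      # three ~400MB arrays, far larger than any cache
b = np.random.rand(n)
c = np.random.rand(n)
scalar = 3.0

t0 = time.perf_counter()
a = b + scalar * c                  # triad: read b and c, write a
elapsed = time.perf_counter() - t0

bytes_moved = 3 * n * 8             # ~two arrays read + one written (ignoring temporaries)
print(f"effective bandwidth: ~{bytes_moved / elapsed / 1e9:.0f} GB/s")
```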

Intel and Dell claim it offers:

  • Enhanced memory capacity and bandwidth to support memory-intensive workloads.
  • Increased computational power for faster processing of complex calculations.
  • Improved efficiency in managing AI algorithms and models.
  • Greater flexibility and scalability to accommodate evolving research needs.
  • Enhanced collaboration and resource sharing capabilities for multi-disciplinary projects.
  • Advanced security features to safeguard sensitive data and intellectual property.
  • Streamlined deployment and management processes to minimize downtime and optimize performance.

Thirty-two nodes will have 104 cores, 1TB of memory, and four Hopper architecture-based Nvidia H100 Tensor Core GPUs with 94GB of HBM2e memory, interconnected by four NVLink connections. Nvidia Quantum-2 InfiniBand also provides 400 Gbps of networking performance with low latency to deliver 500 PFLOPS of peak AI performance (FP8 Tensor Core, with sparsity) for large, AI-driven scientific applications. Sixteen nodes will have 104 cores, 128GB of HBM2e, and 2TB of DDR5 memory for large symmetric multiprocessing (SMP)-style jobs.
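
As a back-of-the-envelope sanity check (our arithmetic, not OSC’s), the 500 PFLOPS figure is consistent with the per-GPU FP8 rating Nvidia publishes for the H100; the ~3,958 TFLOPS FP8-with-sparsity number used below is the commonly quoted SXM figure and is an assumption on our part.

```python
# Rough check of the quoted ~500 PFLOPS peak AI figure (assumed per-GPU rating).
gpu_nodes = 32
gpus_per_node = 4
fp8_sparse_tflops_per_gpu = 3_958   # assumed H100 SXM FP8 Tensor Core peak, with sparsity

total_pflops = gpu_nodes * gpus_per_node * fp8_sparse_tflops_per_gpu / 1_000
print(f"~{total_pflops:.0f} PFLOPS")   # ~507 PFLOPS, in line with the quoted 500 PFLOPS
```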

With Cardinal, OSC will be one of the first HPC centers to deploy HBM technology on a large scale. The new cluster will feature Intel chips with HBM, which will provide greater computing performance. In addition, Cardinal will utilize Nvidia Magnum IO GPUDirect Remote Direct Memory Access, which will allow clients to more effectively utilize the cluster’s GPUs, especially for AI workloads.
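
To make the GPUDirect idea concrete, here is a minimal sketch (hypothetical, not OSC code) of the pattern it accelerates: with a CUDA-aware MPI build, ranks hand device-resident buffers straight to the MPI library, and GPUDirect RDMA lets the InfiniBand adapter read and write GPU memory directly instead of staging the data through host RAM. The package choices (mpi4py, CuPy), buffer size, and ring exchange are illustrative assumptions.

```python
# Hypothetical sketch of a CUDA-aware MPI exchange of GPU-resident buffers.
# Requires an MPI build with CUDA/GPUDirect support plus mpi4py and CuPy.
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = (rank + 1) % comm.Get_size()

# Each rank's buffer lives in GPU (HBM2e) memory, not host memory.
send = cp.full(1_000_000, rank, dtype=cp.float32)
recv = cp.empty_like(send)

# The device arrays are passed directly to MPI; with GPUDirect RDMA the NIC
# moves the bytes GPU-to-GPU across the fabric without a host staging copy.
comm.Sendrecv(sendbuf=send, dest=peer, recvbuf=recv, source=peer)

cp.cuda.runtime.deviceSynchronize()
print(f"rank {rank} received value {float(recv[0])} from rank {peer}")
```

Run with something like `mpirun -n 2 python gpudirect_sketch.py` on nodes where each rank can see a GPU.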

And it has:

  • 756 Intel Xeon CPU Max 9470 processors (52 cores each), which will provide 39,312 total CPU cores.
  • 128GB of HBM2e and 512GB of DDR5 memory per node.

With a single software stack and traditional programming models on the x86 base, the cluster will more than double OSC’s capabilities while addressing broadening use cases and allowing for easy adoption and deployment.

Cardinal, named in honor of the state bird of Ohio, reflects the state’s ongoing commitment, supported by the Ohio Department of Higher Education (ODHE), to ensure that Ohio academic and industry researchers can access the most advanced technologies in supercomputing.

Cardinal will join OSC’s existing clusters Pitzer (2018, expanded in 2020) and Ascend (2022) on the OSC data center floor at the State of Ohio Computer Center. The Owens cluster will remain in service for three months after Cardinal starts production.

Cardinal supercomputer
Is it the red or the black wire that needs to be cut? (Source: Ohio Supercomputer Center)