News

A beautiful machine

At one arithmetic operation per second, it would take more than 50 billion years to match what El Capitan does in a single second.

Jon Peddie

The US has had an inconsistent relationship with supercomputer systems because politicians keep changing their commitments and objectives, which makes long-term planning unpredictable. Despite this, El Capitan, conceived during the Obama administration and funded in 2019, has achieved 1.742 EFLOPS and is not yet running at max efficiency. Lawrence Livermore National Lab and its partners help design and specify these systems out of necessity, collaborating with HPE, Cray, Dell, AMD, Intel, and Nvidia. El Capitan serves the NNSA Tri-Labs, focusing on nuclear warhead modeling and AI applications. Its companion system, Tuolumne, supports open science initiatives like climate modeling and biosecurity research.

Lawrence Livermore National Lab
(Source: Lawrence Livermore National Lab)

The US has had a rocky relationship with super-duper computer systems; maybe inconsistent would be a better description. That’s because politicians are in the procurement chain, and they change their commitments and objectives at about the same rate they change their underwear. As a result, the long-range (measured in half-decade units) planning needed to stay at the head of the pack is inconsistent and unpredictable; El Capitan was conceived during the Obama administration and miraculously funded in 2019. Agencies like DOE and NSA, which help design and specify supercomputers, know they have to stay flexible and keep updating the design up to the very last second so it’s not obsolete before it’s built. The Lawrence Livermore team did that, and when El Capitan came online a few weeks ago, the results proved them right: the machine hit 1.742 EFLOPS, and it’s not running at max efficiency yet.
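For a sense of scale, here is a back-of-the-envelope check of the one-operation-per-second comparison. It is only a sketch: the 1.742 EFLOPS figure comes from the article, while the function name and the choice of a 365.25-day year are assumptions made for illustration.

```python
# Rough scale check. Only the 1.742 EFLOPS figure is taken from the article.
EL_CAPITAN_FLOPS = 1.742e18              # floating-point operations per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed: a 365.25-day (Julian) year


def years_at_one_op_per_second(flops: float) -> float:
    """Years needed, at one operation per second, to match one second of machine time."""
    return flops / SECONDS_PER_YEAR


years = years_at_one_op_per_second(EL_CAPITAN_FLOPS)
print(f"{years / 1e9:.1f} billion years")  # prints "55.2 billion years"
```

So "50 billion years" is, if anything, a slight understatement for one second of El Capitan's work.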

Lawrence Livermore National Lab (LLNL) and sister labs such as Oak Ridge, as well as several universities, are in the research and science business and have gotten involved in the design and specification of supercomputers out of necessity, not as part of their main charter. They are happily aided and abetted by companies like HPE, Cray, and Dell, and semiconductor suppliers like AMD, Intel, and Nvidia. It’s a symbiotic partnership between the national labs and universities on one side and the system and component vendors on the other.

Diagram
(Source: Lawrence Livermore Lab)

And sitting in the bleachers watching it all play out are the scorekeepers, the Top500 accountants who tabulate test results. El Capitan jumped to the top when it started releasing test results, and it’s been there for a while now. Usually, a computer holds the top spot for about six months; it looks like El Capitan could hold it for a year or longer.

The LLNL senior team held a press conference on Sunday to announce that the machine, as they refer to it, was officially online and working as directed. Corey Hinderstein, the acting principal deputy administrator of the National Nuclear Security Administration (NNSA), described it as a “beautiful machine,” and all her co-workers, plus Forrest Norrod, executive vice president and GM of AMD’s Datacenter and Embedded Systems Group, immediately adopted her description.

El Capitan, the first exascale-class machine for the NNSA, is the premier computing resource for the NNSA Tri-Labs: LLNL, Los Alamos, and Sandia National Laboratories.

The El Capitan supercomputer, housed at Lawrence Livermore Lab, is powered by AMD Instinct MI300A APUs and built by Hewlett Packard Enterprise (HPE). It’s a tour de force of technology that is summed up in the following slide.

The compute nodes are built around a custom AMD Instinct MI300A APU, which is itself assembled from chiplets.

While it will be one of the world’s most energy-efficient supercomputers, LLNL’s El Capitan is expected to use around 30–35MW of power at peak capacity, which is enough to power a midsize city. It will also rely on 28,000 tons of cooling capacity (an average-size home uses about 3 tons).
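As a rough sanity check on the cooling side, the power draw can be converted into tons of refrigeration. This is a sketch under stated assumptions, not LLNL's own sizing method: it assumes essentially all electrical power ends up as heat that must be removed, and it uses the standard definition of one ton of refrigeration as 12,000 BTU/h (about 3.517 kW). The function name is hypothetical.

```python
# Rough conversion from electrical load to cooling load.
# Assumption: all electrical power becomes heat that must be rejected.
KW_PER_TON_OF_REFRIGERATION = 3.5168525  # 12,000 BTU/h expressed in kW


def cooling_tons_for_load(megawatts: float) -> float:
    """Tons of refrigeration needed to remove `megawatts` of heat."""
    return megawatts * 1000.0 / KW_PER_TON_OF_REFRIGERATION


for mw in (30, 35):  # the 30-35MW peak range quoted in the article
    print(f"{mw} MW -> about {cooling_tons_for_load(mw):,.0f} tons of cooling")
```

On these assumptions, the machine alone accounts for a heat load in the thousands of tons, far beyond anything on a household scale.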

The machine will be employed initially to model aspects of nuclear warheads and simulate the ignition of those doomsday devices. But other applications, including AI, will be run on a priority basis.

Bronis R. de Supinski, LLNL’s chief technology officer for Livermore Computing, adds, “Leveraging the AMD Instinct MI300A APUs, we’ve built a system that was once unimaginable, pushing the absolute boundaries of computational performance while maintaining exceptional energy efficiency. With AI becoming increasingly prevalent in our field, El Capitan allows us to integrate AI with our traditional simulation and modeling workloads, opening new avenues for discovery across various scientific disciplines.”

LLNL and other NNSA Tri-Labs are utilizing El Capitan and its companion system, Tuolumne, to enhance AI and machine learning-assisted data analysis, advancing LLNL’s AI-driven objectives. El Capitan will focus on applying AI to complex problems like inertial confinement fusion research, while Tuolumne will support unclassified open science initiatives, including climate modeling, biosecurity, drug discovery, and earthquake modeling.

El Capitan’s capabilities will significantly contribute to the nation’s nuclear stockpile safety, security, and reliability, ensuring effective design and stewardship. The system will also support new mission areas, such as materials discovery, high-energy-density physics, nuclear data, and conventional weapons design.

AMD and HPE collaborated on Frontier, the first exascale supercomputer, located at Oak Ridge National Lab. Powered by AMD Epyc CPUs and AMD Instinct GPUs, Frontier provides 1.35 EFLOPS of performance, making it a leading computational resource. Researchers utilize Frontier to address complex scientific challenges, including climate modeling, biomedical research, and training large language models, contributing to advancements in scientific discovery and AI breakthroughs.

AMD Instinct MI300A APUs will also power a next-generation supercomputer system for Japan’s National Institutes for Quantum Science and Technology (QST). The system, built by NEC Corporation, will use 280 AMD Instinct MI300A APUs to drive AI and scientific research for QST and the National Institute for Fusion Science.

Additional information on El Capitan, Tuolumne, and RZAdams, a recently deployed unclassified system at LLNL, can be found here and here.

You can expect to hear more about these machines at the upcoming supercomputer conference in Atlanta, November 17–22.