‘We did it!’ yelled Pat Gelsinger

We got that sucker working—I knew we could.

Jon Peddie

The Aurora supercomputer at Argonne National Laboratory is now fully equipped with all 10,624 compute blades, which together hold 63,744 Intel Data Center GPU Max series GPUs and 21,248 Intel Xeon CPU Max series processors.

Aurora is a collaboration among Intel, Hewlett Packard Enterprise (HPE), and the Department of Energy (DOE), aimed at advancing high-performance computing (HPC) capabilities.

The supercomputer is designed to support simulations, data analytics, and artificial intelligence on a large scale. More than 1,024 storage nodes running Intel’s Distributed Asynchronous Object Storage (DAOS), together with the HPE Slingshot high-performance fabric, give Aurora significant capacity and bandwidth. Its theoretical peak performance is expected to exceed 2 exaFLOPS.

Intel says the Max series GPU and CPU product family lets Aurora meet the demands of dynamic HPC and AI workloads, and claims it outperforms competing products. Such computing power could help address major challenges such as climate change and disease research, enabling scientists to push the boundaries of scientific exploration.

Researchers from the Aurora Early Science Program (ESP) and DOE’s Exascale Computing Project will migrate their work to Aurora so they can scale their applications on the system. Before full deployment, the supercomputer will undergo stress testing and bug fixing; this early work also supports the development of generative AI models for scientific purposes.

Intel fab
(Source: Intel)

Aurora’s advanced system comprises sleek rectangular blades that house processors, memory, networking, and cooling technologies. Each blade is equipped with two Intel Xeon Max series CPUs and six Intel Max series GPUs. The Xeon Max series CPUs have already demonstrated exceptional capabilities on Sunspot, a test bed with a similar architecture to Aurora. Developers are leveraging oneAPI and AI tools to accelerate HPC and AI workloads, while enhancing code portability across different architectures.

Installing those blades, says Intel, has been a meticulous process, requiring specialized machinery to slot each 70-pound blade vertically into Aurora’s refrigerator-sized racks. The system consists of 166 racks arranged in eight rows, with 64 blades per rack, occupying a space equivalent to two professional basketball courts in the Argonne Leadership Computing Facility (ALCF) data center.
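The rack, blade, and processor counts quoted in this article can be cross-checked with simple arithmetic. The short Python sketch below uses only figures stated in the text; the variable names are illustrative and do not come from Intel or Argonne.

```python
# Figures quoted in the article (variable names are illustrative).
RACKS = 166
BLADES_PER_RACK = 64
CPUS_PER_BLADE = 2   # Intel Xeon Max series CPUs per blade
GPUS_PER_BLADE = 6   # Intel Max series GPUs per blade

# Derive the system-wide totals from the per-rack and per-blade figures.
blades = RACKS * BLADES_PER_RACK
cpus = blades * CPUS_PER_BLADE
gpus = blades * GPUS_PER_BLADE

print(f"blades: {blades:,}")
print(f"CPUs:   {cpus:,}")
print(f"GPUs:   {gpus:,}")
```

The totals come out to 10,624 blades, 21,248 CPUs, and 63,744 GPUs, matching the numbers given at the top of the article.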

Researchers participating in the ALCF’s Aurora Early Science Program (ESP) and DOE’s Exascale Computing Project will transfer their work from the Sunspot test bed to the fully deployed Aurora system. This migration will enable them to scale their applications on the complete system. Early users will conduct stress tests to identify and resolve potential bugs before the supercomputer’s deployment. These efforts include the development of generative AI models for scientific purposes, which were recently announced at the ISC 2023 conference.
