Kathy Gibson reports – One of Europe’s biggest supercomputers, Leonardo ranks number 9 on the latest global Top500 list.

When the Cineca data centre in Bologna, Italy, went live in 2022, the Leonardo petascale supercomputer was the fourth most powerful in the world.

The system consists of an Atos BullSequana XH2000 computer running Red Hat Enterprise Linux, with close to 14 000 Nvidia Ampere GPUs and 200 Gbit/s Nvidia Mellanox HDR InfiniBand connectivity.

Built at a cost of about €240-million, Leonardo has 2.8 petabytes of memory and can run at a peak of 250 petaFLOPS (floating point operations per second), with more than 100 petabytes of storage capacity.
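
Those headline numbers can be roughly sanity-checked from the GPU count alone. The sketch below is a back-of-envelope estimate rather than an official figure: the per-GPU FP64 throughput is an assumed value typical of A100-class accelerators, and Leonardo's custom Ampere parts may be clocked differently.

```python
# Back-of-envelope estimate of Leonardo's peak FP64 throughput from its GPU count.
# Assumption: ~19.5 TFLOPS FP64 (Tensor Core) per A100-class GPU; Leonardo's custom
# Ampere parts may differ, so treat this as a rough order-of-magnitude check.

GPU_NODES = 3_456            # booster-partition nodes (see the node list below)
GPUS_PER_NODE = 4
FP64_TFLOPS_PER_GPU = 19.5   # assumed per-GPU peak, FP64 Tensor Core

total_gpus = GPU_NODES * GPUS_PER_NODE
peak_pflops = total_gpus * FP64_TFLOPS_PER_GPU / 1_000  # TFLOPS -> PFLOPS

print(f"GPUs: {total_gpus:,}")                            # 13,824 -> "close to 14 000"
print(f"Estimated GPU peak: ~{peak_pflops:.0f} PFLOPS")   # ~270 PFLOPS, the same order as the quoted 250
```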

Massimo Alessio Mauri, facility manager at Cineca, explains that Leonardo provides solutions and services for European universities, research centres and government departments. It can address complex computational workflows involving high-performance computing (HPC), artificial intelligence (AI), high throughput and visualisation applications.

The centre is located in the Bologna Technopole, which aims to become one of the main European hubs for computing and data processing.

The building that houses Leonardo was built in 1952 as a tobacco processing plant. At the time it symbolised Italy’s post-war architecture, and its redevelopment as a technopole aims to preserve this heritage.

The technopole consists of 1 240 square metres of computing room space, and 900 square metres of ancillary space.

Leonardo’s room is built of reinforced concrete and forms a rectangle of roughly 32m x 23m. It was designed to ensure maximum resistance to seismic events and to retain structural strength in the event of fire.

The first floor of the room contains four power stations dedicated to transforming and distributing electricity to the supercomputer, which is housed on the ground floor.

The first of Leonardo’s eight rows is air-cooled, and the rest are water-cooled, with the basement containing four independent tunnels for cooling the water.

“Sustainability is a core principle,” says Alessio. “So we use advanced cooling techniques to cool warm water with adiabatic dry coolers that are energy efficient. This reduces energy consumption by avoiding the traditional refrigeration cycle and using a 10km closed-circuit piping system for optimal temperature control.”

Vertiv provides power and cooling solutions for Leonardo.


Leonardo technical details

The booster partition is based on BullSequana XH2135 supercomputer nodes, each with four Nvidia Tensor Core GPUs and a single Intel CPU. The data-centric partition is based on the BullSequana X2140 three-node CPU blade, with each node equipped with two 56-core Intel Sapphire Rapids CPUs.
The overall system also uses Nvidia Mellanox HDR 200Gb/s InfiniBand connectivity, with smart in-network computing acceleration engines that deliver very low latency and high data throughput for maximum AI and HPC application performance and scalability.

The system’s 4 992 computing nodes are organised as follows (a quick arithmetic cross-check of these totals follows the list):

  • Booster partition – BullSequana X2135 “Da Vinci” single-node GPU blade; 3 456 nodes; single-socket 32-core Intel Xeon Platinum 8358 (Ice Lake) processor at 2.60GHz; 110 592 cores; 8 x 64 GB DDR4 3200 MHz (512 GB) of RAM; 4 x Nvidia custom Ampere A100 GPUs with 64GB HBM2e and NVLink 3.0 (200GB/s); and 2 x dual-port HDR100 per node (400Gbps/node) for networking.
  • Data Centric General Purpose partition – BullSequana X2140 three-node CPU blade; 1 536 nodes; 2 x 56-core Intel Sapphire Rapids processors at 2.0GHz; 172 032 cores (112 cores/node); 512 (16 x 32) GB DDR5 4800 MHz of RAM; Nvidia HDR cards, 1 x 100Gbps/node for networking.
  • Visualisation partition – 16 nodes; 2 x 32-core Intel Ice Lake ICP06 processors at 2.4GHz; 3 x Nvidia Tesla V100 GPUs; (16 x 32) GB DDR5 4800 MHz of RAM.
  • Large-capacity storage – 137.6 PB (raw), 620 GB/s.
  • High-performance storage – 5.7 PB, 1.4 TB/s, based on 31 x DDN Exascaler ES400NVX2.
  • Login and service nodes – 16 login nodes and 16 service nodes for I/O and cluster management.
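
The per-partition figures above are internally consistent, as the short cross-check below shows; it simply re-derives the quoted totals and uses no numbers beyond those in the list.

```python
# Cross-check of the node, core and GPU totals quoted in the list above.

booster_nodes, booster_cores_per_node, gpus_per_booster_node = 3_456, 32, 4
dcgp_nodes, dcgp_cores_per_node = 1_536, 2 * 56   # two 56-core Sapphire Rapids CPUs per node

assert booster_nodes * booster_cores_per_node == 110_592   # booster CPU cores
assert dcgp_nodes * dcgp_cores_per_node == 172_032         # data-centric CPU cores
assert booster_nodes + dcgp_nodes == 4_992                  # total computing nodes
assert booster_nodes * gpus_per_booster_node == 13_824      # "close to 14 000" GPUs

print("All quoted totals check out.")
```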

All the nodes are interconnected through an Nvidia Mellanox network with a DragonFly+ topology, capable of a maximum bandwidth of 200Gbit/s between each pair of nodes.
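
To put that 200Gbit/s (roughly 25GB/s per direction) in context, the sketch below shows the kind of generic MPI ping-pong microbenchmark commonly used to measure point-to-point bandwidth between two nodes on such a fabric. It is an illustration rather than a Cineca tool, and assumes the mpi4py and NumPy packages are available on the system.

```python
# Minimal MPI ping-pong bandwidth sketch (generic illustration; assumes mpi4py + NumPy).
# Run across two nodes, e.g.: mpirun -np 2 --map-by node python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

size_bytes = 64 * 1024 * 1024          # 64 MiB message
buf = np.zeros(size_bytes, dtype=np.uint8)
reps = 20

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)          # rank 0 sends, then waits for the echo
        comm.Recv(buf, source=1)
    elif rank == 1:
        comm.Recv(buf, source=0)        # rank 1 echoes the message back
        comm.Send(buf, dest=0)
elapsed = MPI.Wtime() - t0

if rank == 0:
    # Each repetition moves the buffer twice (there and back).
    gbytes = 2 * reps * size_bytes / 1e9
    print(f"Point-to-point bandwidth: ~{gbytes / elapsed:.1f} GB/s")
```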