Intel has showcased AI-accelerated high performance computing (HPC) with leadership performance for HPC and AI workloads across Intel Data Centre GPU Max Series, Intel Gaudi2 AI accelerators and Intel Xeon processors.
In partnership with Argonne National Laboratory, Intel shared progress on the Aurora generative AI (genAI) project, including an update on the 1-trillion-parameter GPT-3 LLM on the Aurora supercomputer, made possible by the unique architecture of the Max Series GPU and the system capabilities of Aurora.
Intel and Argonne demonstrated the acceleration of science with applications from the Aurora Early Science Program (ESP) and the Exascale Computing Project at Supercomputing 2023. The company also showed the path to Intel Gaudi3 AI accelerators and Falcon Shores.
“Intel has always been committed to delivering innovative technology solutions to meet the needs of the HPC and AI community,” says Deepak Patil, Intel corporate vice-president and GM of Data Centre AI Solutions. “The great performance of our Xeon CPUs along with our Max GPUs and CPUs help propel research and science. That, coupled with our Gaudi accelerators, demonstrate our full breadth of technology to provide our customers with compelling choices to suit their diverse workloads.”
Argonne National Laboratory shared progress on its generative artificial intelligence (genAI) for science initiatives with the Aurora supercomputer. The Aurora genAI project is a collaboration between Argonne, Intel and partners to create foundational AI models for science. The models will be trained on scientific texts, code and science datasets from diverse scientific domains, at scales of more than 1-trillion parameters.
Using the foundational technologies of Megatron with DeepSpeed, the genAI project will service multiple scientific disciplines, including biology, cancer research, climate science, cosmology and materials science.
The distinctive Intel Max Series GPU architecture and the Aurora supercomputer system capabilities can efficiently handle 1-trillion-parameter models with just 64 nodes, far fewer than would typically be required. Argonne National Laboratory ran four instances on 256 nodes, demonstrating the ability to run multiple instances in parallel on Aurora and paving the path to scale the training of trillion-parameter models more quickly with trillions of tokens on more than 10 000 nodes.
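A rough back-of-envelope check illustrates why 64 nodes is plausible for a model of this size. The sketch below is not taken from Intel or Argonne. It assumes six Max Series GPUs per Aurora node with 128GB of HBM each, and about 16 bytes of model state per parameter for mixed-precision Adam training (weights, gradients and optimizer state). Activations and communication buffers are ignored, so the real requirement is higher.

```python
# Back-of-envelope memory estimate. All constants are assumptions made for
# illustration, not figures quoted in the article.
GPUS_PER_NODE = 6        # assumed number of GPUs per Aurora node
HBM_PER_GPU_GB = 128     # assumed HBM capacity per GPU, in GB
BYTES_PER_PARAM = 16     # assumed: fp16 weights + fp16 grads + fp32 Adam state


def model_state_tb(num_params: float) -> float:
    """Approximate model-state footprint in terabytes (1 TB = 1e12 bytes)."""
    return num_params * BYTES_PER_PARAM / 1e12


def aggregate_hbm_tb(nodes: int) -> float:
    """Total GPU HBM across the given number of nodes, in terabytes."""
    return nodes * GPUS_PER_NODE * HBM_PER_GPU_GB / 1e3


def instances_that_fit(total_nodes: int, nodes_per_instance: int) -> int:
    """Number of independent training instances that fit on total_nodes."""
    return total_nodes // nodes_per_instance


if __name__ == "__main__":
    print(model_state_tb(1e12))         # model state for 1 trillion parameters
    print(aggregate_hbm_tb(64))         # aggregate HBM on 64 nodes
    print(instances_that_fit(256, 64))  # parallel instances on 256 nodes
```

Under these assumptions the model state needs about 16TB against roughly 49TB of aggregate HBM on 64 nodes, and 256 nodes accommodate four such instances, in line with the run described above.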
Intel and Argonne National Laboratory demonstrated the acceleration of science at scale enabled by the system capabilities and software stack on Aurora. Workload examples include:
* Brain connectome reconstruction is enabled at scale with Connectomics ML, showing competitive inference throughput on more than 500 Aurora nodes.
* General Atomic and Molecular Electronic Structure System (GAMESS) showed more than 2x the performance on the Intel Max Series GPU compared to the Nvidia A100. This enables the modeling of complicated chemical processes in drug and catalyst design to unlock the secrets of molecular science with the Aurora supercomputer.
* Hardware/Hybrid Accelerated Cosmology Code (HACC) has demonstrated runs on more than 1,500 Aurora nodes, enabling the visualization and understanding of the physics and evolution of the universe.
* The drug-screening AI inference application, part of the Aurora Drug Discovery early science project (ESP), enables efficient screening of vast chemical datasets, covering more than 20-billion of the most synthesised compounds on just 256 nodes.
Intel also showed new HPC and AI performance, as well as software optimisations across hardware and applications:
* Intel and Dell published results for STAC-A2, an independent benchmark suite based on real-world market risk analysis workloads, showing great performance for the financial industry. Compared to eight Nvidia H100 PCIe GPUs, four Intel Data Centre GPU Max 1550s had 26% higher warm Greeks 10-100k-1260 performance and 4,3x higher space efficiency.
* The Intel Data Centre GPU Max Series 1550 outperforms the Nvidia H100 PCIe card by an average of 36% (1,36x) on diverse HPC workloads.
* Intel Data Centre GPU Max Series delivers improved support for AI models, including multiple large language models (LLMs) such as GPT-J and LLAMA2.
* Intel Xeon CPU Max Series, the only x86 processor with high bandwidth memory (HBM), delivered an average 19% more performance compared to the AMD Epyc Genoa processor.
* Last week, MLCommons published results of the industry-standard MLPerf Training v3.1 benchmark for training AI models. Intel Gaudi2 demonstrated a significant 2x performance leap with the implementation of the FP8 data type on the v3.1 training GPT-3 benchmark.
* 5th Gen Intel Xeon processors will deliver up to 1,4x higher performance gen-over-gen on HPC applications as demonstrated by LAMMPS-Copper. Granite Rapids, a future Intel Xeon processor, will deliver increased core count and built-in acceleration with Intel Advanced Matrix Extensions and support for multiplexer combined ranks (MCR) DIMMs. Granite Rapids will have 2,9x better DeepMD+LAMMPS AI inference performance. MCR achieves speeds of 8 800 megatransfers per second based on DDR5 and greater than 1,5TB/s of memory bandwidth capability in a two-socket system, which is critical for feeding the fast-growing core counts of modern CPUs and enabling efficiency and flexibility.
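The MCR memory bandwidth figure in the last item can be sanity-checked with simple arithmetic. The sketch below assumes a 64-bit (8-byte) data path per memory channel and 12 memory channels per socket. Neither value is stated above, so both should be treated as illustrative, and the result is a theoretical peak rather than a measured number.

```python
# Peak theoretical memory bandwidth from a transfer rate.
# Channel count and bus width are assumptions made for illustration.
BYTES_PER_TRANSFER = 8     # assumed 64-bit data path per channel
CHANNELS_PER_SOCKET = 12   # assumed memory channels per socket


def peak_bandwidth_tb_s(mega_transfers_per_s: float, sockets: int) -> float:
    """Peak bandwidth in terabytes per second (1 TB = 1e12 bytes)."""
    bytes_per_s = (mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER
                   * CHANNELS_PER_SOCKET * sockets)
    return bytes_per_s / 1e12


if __name__ == "__main__":
    # 8 800 megatransfers per second in a two-socket system
    print(peak_bandwidth_tb_s(8800, sockets=2))
```

Under these assumptions, 8 800 megatransfers per second works out to roughly 1,7 terabytes per second across two sockets, which is consistent with a capability of more than 1,5 terabytes per second.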
Intel has also announced features for its 2024 software development tools that advance open software development powered by oneAPI multiarchitecture programming.
New tools help developers extend new AI and HPC capabilities on Intel CPUs and GPUs with broader coverage, including faster performance and deployments using standard Python for numeric workloads, and compiler enhancements delivering a near-complete SYCL 2020 implementation to improve productivity and code offload.
Additionally, the Texas Advanced Computing Center (TACC) announced that its oneAPI Centre of Excellence will focus on projects that develop and optimize seismic imaging benchmark codes. Intel fosters an environment where software and hardware innovation and research advance the industry, with 32 oneAPI Centres of Excellence worldwide.