Parallel computer architectures, such as those deployed for modern high-performance computing (HPC) and artificial intelligence (AI) workloads, frequently encounter memory bottlenecks that limit performance, a phenomenon known as the ‘memory wall’.

Historically, this has stemmed from advances in the performance of processors (CPUs, GPUs, and other AI accelerators) outpacing developments in memory performance, writes Dr Shababa Selim, senior technology analyst at IDTechEx.

This poses significant challenges for data-intensive HPC and AI workloads, where bottlenecks lead to under-utilization of processors. IDTechEx’s report “Hardware for HPC, Data Centers, and AI 2025-2035: Technologies, Markets, Forecasts” critically assesses the key developments and trends in memory and storage technologies, including high-bandwidth memory (HBM), DDR, NAND, and many more.

Memory bottlenecks drive segmentation of memory hierarchy

Traditionally, the memory hierarchy consists of CPU/GPU cache (for example, SRAM), main memory (such as DRAM), and storage drives (like SSD, HDD).

However, HPC and AI workloads are driving a segmentation of this hierarchy due to memory wall issues and the resulting under-utilisation of processors. This is fueling developments in in-package HBM, memory expansion through CXL-pooled memory, SLC NAND-based storage memory solutions, and high-performance data storage based on TLC and QLC NAND.

It is worth noting, though, that further segmentation of the memory hierarchy introduces additional layers that may themselves hinder high-speed operation due to increased data movement between them.
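To put these tiers in perspective, the sketch below tabulates rough, order-of-magnitude latency and bandwidth figures for each layer of this expanded hierarchy. The numbers are illustrative assumptions for comparison only, not figures from the IDTechEx report.

```python
# Illustrative sketch of the expanding memory/storage hierarchy.
# All latency and bandwidth figures are rough order-of-magnitude
# assumptions for comparison, not measured data or report figures.
hierarchy = [
    # (tier,                     approx. latency,   approx. bandwidth)
    ("SRAM cache (on-die)",       "~1-10 ns",        "~10+ TB/s"),
    ("HBM (in-package DRAM)",     "~100 ns",         "~1 TB/s per stack"),
    ("DDR DRAM (main memory)",    "~100 ns",         "tens of GB/s per channel"),
    ("CXL-pooled memory",         "hundreds of ns",  "tens of GB/s per link"),
    ("SLC NAND storage memory",   "tens of µs",      "~GB/s"),
    ("TLC/QLC NAND SSD",          "~100 µs",         "~GB/s"),
]

for tier, latency, bandwidth in hierarchy:
    print(f"{tier:<26}  latency {latency:<16}  bandwidth {bandwidth}")
```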

HBM a key enabler of parallel processing for HPC and AI

HPC uses parallelisation to break down problems across multiple nodes, requiring the fast transfer of large data volumes. HBM’s high bandwidth allows it to handle multiple memory requests simultaneously from various cores, making it essential for GPUs and accelerators.
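A simple roofline-style calculation shows why bandwidth, and not just compute, determines how busy an accelerator can stay: if a kernel performs only a few arithmetic operations per byte fetched, throughput is capped by memory rather than by the processor. The sketch below uses purely hypothetical accelerator figures chosen for illustration, not vendor specifications.

```python
# Roofline-style sketch: attainable throughput is capped either by the
# compute roof or by how fast memory can deliver operands.
# All figures below are hypothetical assumptions, not vendor specifications.

PEAK_COMPUTE_TFLOPS = 1000.0   # hypothetical accelerator peak, TFLOP/s
MEMORY_BANDWIDTH_TBS = 3.0     # hypothetical HBM bandwidth, TB/s

def attainable_tflops(arithmetic_intensity: float) -> float:
    """Attainable TFLOP/s for a kernel with the given FLOPs-per-byte ratio."""
    memory_roof = MEMORY_BANDWIDTH_TBS * arithmetic_intensity  # TB/s * FLOP/B = TFLOP/s
    return min(PEAK_COMPUTE_TFLOPS, memory_roof)

for intensity in (1, 10, 100, 1000):  # FLOPs performed per byte moved
    tflops = attainable_tflops(intensity)
    utilisation = 100 * tflops / PEAK_COMPUTE_TFLOPS
    print(f"intensity {intensity:>4} FLOP/B -> {tflops:7.1f} TFLOP/s "
          f"({utilisation:.0f}% of peak)")
```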

It is no surprise, then, that the overwhelming majority of GPUs and accelerators use HBM to facilitate parallel workload processing. Additionally, continued HBM development is pivotal to improving accelerator performance, with HBM4, which offers higher capacity and improved performance, expected to launch in 2025.

HBM is essentially a 3D structure of DRAM dies stacked vertically on top of a logic die. It relies on advanced packaging technologies such as through-silicon vias (TSVs) and uses a silicon interposer for interconnection with the processor.

The current generation, HBM3E, uses thermocompression bonding with micro-bumps and underfill to stack DRAM dies. However, manufacturers such as SK Hynix, Samsung, and Micron are transitioning towards more advanced packaging technologies, such as copper-copper hybrid bonding, for HBM4 and beyond in order to increase the number of input/outputs, lower power consumption, improve heat dissipation, reduce electrode dimensions, and more.
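Much of HBM's bandwidth advantage comes from the very wide interface that stacking and the interposer make practical. As a back-of-envelope illustration (using approximate, publicly cited figures rather than numbers from the report), per-stack bandwidth is simply bus width multiplied by per-pin data rate:

```python
# Back-of-envelope per-stack HBM bandwidth: bus width x per-pin data rate.
# Figures are approximate, publicly cited values used purely for illustration.

def stack_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak per-stack bandwidth in GB/s (bits converted to bytes)."""
    return bus_width_bits * pin_rate_gbps / 8

# ~1024-bit interface at ~9.2 Gb/s per pin (roughly HBM3E-class)
print(f"HBM3E-class: {stack_bandwidth_gbs(1024, 9.2):.0f} GB/s per stack")

# ~2048-bit interface at ~8 Gb/s per pin (roughly HBM4-class expectations)
print(f"HBM4-class:  {stack_bandwidth_gbs(2048, 8.0):.0f} GB/s per stack")
```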

Impact of geopolitics and outlook

Geopolitical tensions and US-China trade tensions are having a notable impact on the memory landscape. For instance, in September 2022, the US began restricting the export of high-end GPUs to China, with the ban expanding over time.

In response, Chinese firms like Huawei and Baidu have been stockpiling HBM modules from Samsung for AI applications, according to a Reuters report.

China is also investing heavily in its semiconductor industry.

While SK Hynix, Samsung, and Micron currently lead the HBM market, Chinese companies like Huawei, CXMT, and Tongfu Microelectronics are emerging, filing HBM-related patents, with some even entering early-stage production.

Amid a booming HPC and AI hardware industry, IDTechEx forecasts a 15-fold increase in HBM unit sales for HPC by 2035, compared to 2024 levels.