With artificial intelligence quickly evolving, enterprise data centres are increasingly challenged to manage the heat generated by high-performance systems.
By Michael Langeveld, head of technology and business development at HPE Emirates and Africa
This issue is particularly critical for South African enterprises because of the country’s fluctuating Energy Availability Factor (EAF). The EAF, which measures the energy a power plant actually makes available as a proportion of its installed capacity, declined steadily from 2018 to 2023, raising concerns about reliable power supply.
Although the situation began to improve in March 2024 as load shedding eased, energy-efficient solutions remain a vital priority.
Traditionally, data centres have relied heavily on air cooling to manage server temperatures. This method, while effective for many years, is becoming inadequate as CPU power requirements soar and resulting server densities increase.
Cool under pressure
According to a PwC survey of South African CEOs, more than half of respondents expect generative AI to have an even greater impact on their businesses over the next three years. As AI’s rapid growth has driven ever more intense workloads, the power demands of the chips that run them have risen in step.
Over the past several years, the maximum power draw of high-end CPUs and GPUs has increased dramatically: from around 200 watts to upwards of 300 watts for CPUs and 500 watts for GPUs. We’re even on the brink of seeing GPUs that exceed 1,000 watts. This rise in power density has outstripped the capabilities of traditional air cooling, pushing the industry toward more advanced solutions.
But it’s not just the power levels that are changing; the thermal tolerance of these chips is also shifting. In the past, chips could operate at temperatures as high as 90 to 100 degrees Celsius without issue. However, today’s chips, with their more intricate designs and higher component density, are less tolerant of such high temperatures. Some chips now have a maximum operating temperature as low as 60 degrees Celsius.
Liquid cooling, once reserved for the most demanding high-performance computing environments, is now making its way into the broader enterprise sector. Unlike air cooling, which uses fans to move heat away from heat sinks on components, liquid cooling pumps coolant through cold plates mounted on the processors and transfers the heat into the circulating liquid. This method not only allows for more efficient heat transfer but also supports higher-powered processors in servers, thereby boosting overall performance.
The science of next-gen cooling
A new technique leverages the strengths of both air and liquid cooling systems. In this approach, liquid cooling is used to manage the most heat-intensive components, such as CPUs and GPUs, while air cooling handles the remaining components. The combination allows data centres to balance their rack-level cooling efficiency and cost effectiveness.
One of the key advantages of liquid cooling is its ability to support higher server densities without requiring a significant increase in physical space. With liquid cooling, data centres can fully populate racks that might otherwise be underutilised due to cooling constraints. This not only reduces the overall footprint of the data centre but also allows for more flexible and scalable infrastructure design.
Combination air-liquid cooling systems are also more energy efficient. By reducing reliance on air cooling, data centres can lower their overall power consumption at the rack level. Our internal comparison between air cooled and liquid cooled servers showed nearly a 15% decrease in chassis power consumption with liquid cooling.
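To illustrate what a roughly 15% chassis-level reduction can mean at rack scale, here is a minimal sketch. The rack density and per-server draw below are hypothetical assumptions for illustration, not figures from the comparison:

```python
# Rough rack-level impact of a ~15% chassis power reduction.
# Only the ~15% saving comes from the article; the rack size
# and per-server draw are illustrative assumptions.

SERVERS_PER_RACK = 16          # assumed rack density
AIR_COOLED_CHASSIS_KW = 2.0    # assumed per-server draw, air cooled
LIQUID_SAVING = 0.15           # ~15% chassis power reduction

air_rack_kw = SERVERS_PER_RACK * AIR_COOLED_CHASSIS_KW
liquid_rack_kw = air_rack_kw * (1 - LIQUID_SAVING)
saving_kw = air_rack_kw - liquid_rack_kw

print(f"Air-cooled rack:    {air_rack_kw:.1f} kW")   # 32.0 kW
print(f"Liquid-cooled rack: {liquid_rack_kw:.1f} kW")  # 27.2 kW
print(f"Saving per rack:    {saving_kw:.1f} kW")     # 4.8 kW
```

Even with these modest assumed numbers, the saving compounds across hundreds of racks, which is why the reduction matters at facility scale.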
This reduction in energy usage, alongside other conservation techniques, translates directly into cost savings and a lower carbon footprint: a crucial consideration for the large majority of South African CEOs who are actively working to reduce their energy consumption.
The benefits of a combined approach to cooling extend beyond energy efficiency. By enabling higher performance per megawatt, the technique allows data centres to get more out of their existing power infrastructure. In a world where performance is often constrained by power availability, maximising output without additional energy expenditure is a game changer.
Green gains and cost cuts
Beyond the immediate performance and cost advantages, new cooling systems offer significant environmental benefits. Moving to liquid cooling could reduce carbon dioxide emissions from 21,444 tons to just 3,485 tons per year for the same 10,000-server configuration.[1] This significant reduction highlights the potential for large-scale environmental impact, aligning with corporate sustainability goals.
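A back-of-the-envelope check of these figures is possible using the footnoted conversion factor. The 0.85 lb/kWh factor and the tonnages come from the article; treating the factor as pounds of CO2 per kWh and using 2,000 lb short tons are assumptions:

```python
# Convert annual energy use to CO2 emissions using the article's
# footnoted factor of 0.85 lb per kWh (treated here as lb CO2/kWh),
# and back-compute the energy use implied by a stated emissions figure.

LB_CO2_PER_KWH = 0.85   # emission factor from the footnote
LB_PER_TON = 2000       # short tons, assumed

def co2_tons(annual_kwh: float) -> float:
    """Annual CO2 emissions in tons for a given energy use in kWh."""
    return annual_kwh * LB_CO2_PER_KWH / LB_PER_TON

def implied_kwh(annual_tons: float) -> float:
    """Annual energy use implied by a stated emissions figure."""
    return annual_tons * LB_PER_TON / LB_CO2_PER_KWH

# Energy use implied by the article's liquid-cooled figure:
print(f"{implied_kwh(3_485):,.0f} kWh/year")  # 8,200,000 kWh/year
```

The same conversion applied to the air-cooled figure of 21,444 tons implies roughly 50 million kWh per year, which shows how large the gap between the two configurations is in energy terms.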
Hewlett Packard Enterprise has advanced this new cooling approach with its Adaptive Cascade Cooling method. This patented technology balances air and liquid cooling loads in real time based on thermal demands. The system optimises energy use and ensures each component receives the most efficient cooling method at any given moment.
By using air cooling for lower-density racks and liquid cooling for higher-density equipment, this approach reduces infrastructure complexity and delivers significant cost savings. Minimising pipework and cutting pump power consumption further boosts energy efficiency.
It also takes advantage of ambient outdoor air for free cooling, which further lowers energy use and supports sustainability goals. Additionally, it allows for flexible adjustments to cooling capacities, enabling data centres to adapt to changing IT demands.
The road ahead
In the race to support AI’s insatiable appetite for computational power, data centres must evolve to become more efficient, sustainable, and capable of handling the demands of tomorrow’s workloads. Advanced cooling represents a crucial step in this evolution, offering a balance of performance, cost savings, and environmental responsibility.
For local enterprise technology leaders, the message is clear: Investing in new cooling technologies isn’t just a smart move but a necessary one. As the industry moves toward a future where AI-driven workloads increase power consumption, the ability to efficiently manage heat will be a defining factor in the success of any AI strategy.
[1] Calculation based on 0.85 pounds of carbon per kWh