We need a new way to measure data centre energy efficiency

Data centres need an upgraded dashboard to guide their journey to greater energy efficiency, one that shows progress running real-world applications.

The formula for energy efficiency is simple: work done divided by energy used. Applying it to data centres calls for unpacking some details, writes Jeremy Rodriguez in an Nvidia blog.

Today’s most widely used gauge — power usage effectiveness (PUE) — compares the total energy a facility consumes to the amount its computing infrastructure uses. Over the last 17 years, PUE has driven the most efficient operators closer to an ideal where almost no energy is wasted on processes like power conversion and cooling.
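
As a rough illustration of the ratio described above, the sketch below divides total facility energy by the energy that reaches the computing equipment. The function name and the sample figures are hypothetical, chosen only to show the arithmetic.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical facility: 1,200 kWh drawn in total, 1,000 kWh reaching the IT equipment.
example_pue = pue(1200.0, 1000.0)

# Share of total energy spent on overhead such as cooling and power conversion.
overhead_share = 1 - 1 / example_pue

print(f"PUE = {example_pue:.2f}, overhead = {overhead_share:.1%}")
```

A value of 1.0 would mean every kilowatt-hour reaches the computing equipment. Note that nothing in the calculation says how much work that equipment did.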

PUE served data centres well during the rise of cloud computing, and it will continue to be useful. But it’s insufficient in today’s generative AI (GenAI) era, when workloads and the systems running them have changed dramatically.

That’s because PUE measures only the energy a data centre consumes, not its useful output. It is like measuring how much gas an engine burns without noting how far the car has travelled.

Many standards exist for data centre efficiency. A 2017 paper lists nearly three dozen of them, several focused on specific targets such as cooling, water use, security and cost.

When it comes to energy efficiency, the computer industry has a long and somewhat unfortunate history of describing systems and the processors they use in terms of power, typically in watts. It’s a worthwhile metric, but many fail to realise that watts measure only input power at a point in time, not the actual energy computers use or how efficiently they use it.

So when modern systems and processors report rising input power levels in watts, that doesn’t mean they’re less energy efficient. In fact, they’re often much more efficient in the amount of work they do with the amount of energy they use.
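
A small worked example may help here. The sketch below compares two made-up systems running the same job. The power and runtime figures are assumptions for illustration, not measurements from any real hardware.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy in kilowatt-hours for a constant power draw over a duration."""
    return power_watts * hours / 1000.0


# Hypothetical figures: the newer system draws twice the power
# but finishes the same job in a fifth of the time.
older = energy_kwh(power_watts=500.0, hours=10.0)
newer = energy_kwh(power_watts=1000.0, hours=2.0)

saving = 1 - newer / older
print(f"older: {older} kWh, newer: {newer} kWh, saving: {saving:.0%}")
```

Under these assumed numbers, the system with the higher wattage uses less energy overall because it holds that draw for far less time.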

Modern data centre metrics should focus on energy, which engineers measure in kilowatt-hours or joules. The key is how much useful work a data centre does with this energy.
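
To make the step from power readings to energy concrete, here is a minimal sketch that integrates sampled power over time using the trapezoidal rule and converts the result to kilowatt-hours. The sample trace is invented, and a real measurement setup would sample far more often.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 1,000 W x 3,600 s


def energy_joules(timestamps_s, power_w):
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two matching samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Average the two neighbouring readings across the interval.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical one-hour trace sampled every 20 minutes.
timestamps = [0, 1200, 2400, 3600]
power = [400.0, 800.0, 800.0, 400.0]

joules = energy_joules(timestamps, power)
kwh = joules / JOULES_PER_KWH
print(f"{joules:.0f} J = {kwh:.3f} kWh")
```

Dividing a count of useful work by this energy figure gives the kind of efficiency measure the article argues for.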

Here again, the industry has a practice of measuring in abstract terms, like processor instructions or math calculations. So, MIPS (millions of instructions per second) and FLOPS (floating point operations per second) are widely quoted.

Only computer scientists care how many of these low-level jobs their system can handle. Users would prefer to know how much real work their systems put out, but defining useful work is somewhat subjective.

Data centres focused on AI may rely on the MLPerf benchmarks. Supercomputing centres tackling scientific research typically use additional measures of work. Commercial data centres focused on streaming media may want others.

The resulting suite of applications must be allowed to evolve over time to reflect the state of the art and the most relevant use cases. For example, the last MLPerf round added tests using two generative AI models that didn’t even exist five years ago.

Ideally, any new benchmarks should measure advances in accelerated computing. This combination of parallel processing hardware, software and methods is running applications dramatically faster and more efficiently than CPUs across many modern workloads.

For example, on scientific applications, the Perlmutter supercomputer at the National Energy Research Scientific Computing Center demonstrated an average of 5x gains in energy efficiency using accelerated computing. That’s why it’s among the 39 of the top 50 supercomputers on the Green500 list, including the number one system, that use Nvidia GPUs.

Because they execute lots of tasks in parallel, GPUs execute more work in less time than CPUs, saving energy.

Companies across many industries share similar results. For example, PayPal improved real-time fraud detection by 10% and lowered server energy consumption nearly 8x with accelerated computing.

The gains are growing with each new generation of GPU hardware and software.

In a recent report, Stanford University’s Human-Centered AI group estimated GPU performance “has increased roughly 7 000-times” since 2003, and price per performance is “5 600-times greater”.

Data centres need a suite of benchmarks to track energy efficiency across their major workloads.

Experts see the need for a new energy-efficiency metric, too.

With today’s data centres achieving scores around 1.2 PUE, the metric “has run its course,” says Christian Belady, a data centre engineer who had the original idea for PUE. “It improved data centre efficiency when things were bad, but two decades later, they’re better, and we need to focus on other metrics more relevant to today’s problems.”

Looking forward, “the holy grail is a performance metric. You can’t compare different workloads directly, but if you segment by workloads, I think there is a better likelihood for success,” says Belady, who continues to work on initiatives driving data centre sustainability.

Jonathan Koomey, a researcher and author on computer efficiency and sustainability, agrees. “To make good decisions about efficiency, data centre operators need a suite of benchmarks that measure the energy implications of today’s most widely used AI workloads.

“Tokens per joule is a great example of what one element of such a suite might be,” Koomey adds. “Companies will need to engage in open discussions, share information on the nuances of their own workloads and experiments, and agree to realistic test procedures to ensure these metrics accurately characterise energy use for hardware running real-world applications.”

“Finally, we need an open public forum to conduct this important work,” he says.
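
As one possible reading of the tokens-per-joule idea, combined with Belady’s suggestion to segment by workload, the sketch below reports the metric separately for each workload rather than as one blended score. The class, the workload names and all figures are hypothetical. It is not an agreed test procedure.

```python
from dataclasses import dataclass


@dataclass
class WorkloadRun:
    name: str             # hypothetical workload label
    tokens: int           # useful output produced during the run
    energy_joules: float  # energy measured over the same run


def tokens_per_joule(run: WorkloadRun) -> float:
    """Useful work (tokens) divided by energy used (joules)."""
    if run.energy_joules <= 0:
        raise ValueError("energy must be positive")
    return run.tokens / run.energy_joules


# Invented runs, kept separate so unlike workloads are not compared directly.
runs = [
    WorkloadRun("chat-inference", tokens=1_000_000, energy_joules=500_000.0),
    WorkloadRun("summarisation", tokens=300_000, energy_joules=600_000.0),
]

report = {run.name: tokens_per_joule(run) for run in runs}
for name, score in report.items():
    print(f"{name}: {score:.2f} tokens per joule")
```

Each score is meaningful mainly when tracked for the same workload over time or across systems, which is why the report keeps one entry per workload.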

Thanks to metrics like PUE and rankings like the Green500, data centres and supercomputing centres have made enormous progress in energy efficiency.

More can and must be done to extend efficiency advances in the age of generative AI. Metrics of energy consumed doing useful work on today’s top applications can take supercomputing and data centres to a new level of energy efficiency.