For decades, any discussion of a computer’s performance took it for granted that performance meant speed. In the high-octane world of today’s tennis-court-sized supercomputers, speed is measured in FLOPS, or floating-point operations per second.
By Kevin Deierling, vice-president of marketing at NVIDIA’s networking business unit
Measured by speed alone, the world’s highest-performing computer right now is Japan’s Fugaku, operating at 415 petaFLOPS.
But there are problems with using speed as the only metric to compare one supercomputer to another. An arms race, based around FLOPS ratings, has seen the emergence of a generation of supercomputers that burn through colossal amounts of electricity, and give out so much heat as a by-product that hugely elaborate cooling systems must be deployed at all times to keep them from melting down.
An over-reliance on speed in the benchmarking of computers also downplays other vital qualities, such as reliability, availability and usability. And then there’s the economics of the thing. Making speed the primary measurement of success has seen the total cost of ownership of supercomputers hit unprecedented heights, while at the same time driving up their negative impact on the environment.
The Green500 offers a different and timely approach. An alternative to the speed-focused Top500 listing, it ranks the 500 most energy-efficient supercomputers in the world, and was devised to raise awareness of performance metrics beyond raw FLOPS. It does this by ranking computers according to FLOPS per watt, a direct measure of energy efficiency. The Green500 also exists, as its name would imply, to promote the importance of environmental credentials to the various stakeholders and investors in the supercomputer sector.
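The FLOPS-per-watt metric itself is simple arithmetic: sustained floating-point throughput divided by power draw. A minimal sketch of a Green500-style ranking, using hypothetical system names and figures (not measured data from any real machine), might look like this:

```python
def gflops_per_watt(rmax_gflops, power_kw):
    """Energy efficiency in the Green500 sense: sustained GFLOPS divided by watts."""
    return rmax_gflops / (power_kw * 1000.0)

# Hypothetical systems: (name, sustained GFLOPS, power in kW).
# Figures are illustrative only.
systems = [
    ("SpeedFirst",  415_000_000, 28_000.0),  # very fast, but power-hungry
    ("EfficientAI",   1_600_000,     76.0),  # modest speed, highly efficient
]

# A Green500-style list sorts by efficiency, not by raw speed.
ranked = sorted(systems, key=lambda s: gflops_per_watt(s[1], s[2]), reverse=True)
for name, gflops, kw in ranked:
    print(f"{name}: {gflops_per_watt(gflops, kw):.2f} GFLOPS/W")
```

Note how the ordering inverts: the slower machine tops the efficiency ranking, which is exactly the dynamic the Green500 is designed to surface.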
The top placings in the Green500 reveal some interesting things about where supercomputing is going, and indicate why efficiency may be taking over as the number one determinant of what sorts the leaders from the followers.
Top of the Green500 is MN-3, a system built by Japanese start-up Preferred Networks, coming in at 21.1 gigaFLOPS per watt. Measured by speed, however, MN-3 sits in 394th place on the Top500 list. It is not in commercial use, being available only for its maker’s own R&D programme.
In second place is Selene, an AI supercomputer made by NVIDIA. Selene delivers 20.52 gigaFLOPS per watt, putting it in a comparable bracket to MN-3 for efficiency, but it ranks 7th on the Top500 list, so it’s pretty rapid too.
Selene is based around a distinctive type of open infrastructure, the DGX SuperPOD. Designed and built in just a couple of weeks, the DGX SuperPOD combines NVIDIA’s DGX systems with an AI networking fabric from Mellanox.
It’s this configuration that gives Selene performance, efficiency and economy, as well as flexibility in terms of the variety of uses it can be put to. NVIDIA’s intention with Selene was to create a supercomputer-class system powerful enough to train and run its own AI models, for use in fields such as autonomous vehicles, but flexible enough to be part of just about any academically-led deep learning research project.
Since its deployment, Selene has run thousands of jobs a week, often simultaneously, spanning AI data analytics, traditional machine learning and HPC applications. The DGX SuperPOD architecture is also in use at companies such as Continental in the automotive sector, Lockheed Martin in aerospace and Microsoft in cloud computing.
Between them, the top machines on the Green500 list point to a new direction in supercomputing, combining a vastly lower total cost of ownership than traditional alternatives with a design that makes them the right fit for tomorrow’s top-level AI challenges.
AI, powered by these machines, is transforming the planet and every aspect of life as we know it. Organizations that want to be in the vanguard of this AI-powered world understand the need for compute power that offers unprecedented scale as well as ease and speed of deployment.
The supercomputer of the future needs to be equally at home in an environmentally conscious data centre running HPC tasks as in an AI research organization looking for machines that are big, fast and fit for purpose. In either case, there is no appetite any more for expensive and time-consuming custom builds with their complex interconnect trade-offs. With tomorrow’s open architecture there are no more proprietary designs that take months to commission.
Modern compute needs machines that serve multiple uses and have long lifetimes, packing as much processing, memory and storage as possible into the smallest space with the least possible energy consumption. With the best of the Green500, this is now a reality.