Intel has posted competitive AI inference performance as it attempts to make artificial intelligence (AI) more accessible at scale across the continuum of AI workloads, from client and edge to the network and cloud.
MLCommons has published results of its MLPerf Inference v3.1 performance benchmark for GPT-J, the 6-billion parameter large language model, as well as computer vision and natural language processing models.
Intel submitted results for Habana Gaudi2 accelerators, 4th Gen Intel Xeon Scalable processors, and Intel Xeon CPU Max Series.
“As demonstrated through the recent MLCommons results, we have a strong, competitive AI product portfolio, designed to meet our customers’ needs for high-performance, high-efficiency deep learning inference and training, for the complete spectrum of AI models – from the smallest to the largest – with leading price/performance,” says Sandra Rivera, Intel executive vice-president and GM of the Data Centre and AI Group.
She adds that Intel’s AI products give customers flexibility and choice when choosing an optimal AI solution based on their own respective performance, efficiency and cost targets, while helping them break from closed ecosystems.
About the Habana Gaudi2 results
The Habana Gaudi2 inference results for GPT-J provide strong validation of the accelerator's competitive performance.
* Gaudi2 inference performance on GPT-J-99 and GPT-J-99.9 is 78.58 queries per second in the server scenario and 84.08 samples per second in the offline scenario.
* Gaudi2 delivers compelling performance versus Nvidia’s H100, with H100 showing a slight advantage of 1.09x (server) and 1.28x (offline) relative to Gaudi2.
* Gaudi2 outperforms Nvidia’s A100 by 2.4x (server) and 2x (offline).
* The Gaudi2 submission employed FP8 and reached 99.9% accuracy on this new data type.
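The relative-performance claims above can be sanity-checked with simple arithmetic. The sketch below takes the reported Gaudi2 throughput figures and the stated H100 and A100 ratios, and derives the implied competitor throughputs; the derived absolute numbers are illustrative back-of-envelope estimates, not official MLPerf submissions.

```python
# Reported Gaudi2 GPT-J throughput (MLPerf Inference v3.1):
# server scenario in queries/s, offline scenario in samples/s.
gaudi2 = {"server": 78.58, "offline": 84.08}

# H100 is reported as 1.09x (server) and 1.28x (offline) of Gaudi2.
h100_ratio = {"server": 1.09, "offline": 1.28}
# Gaudi2 is reported as 2.4x (server) and 2x (offline) of A100.
a100_ratio = {"server": 2.4, "offline": 2.0}

for scenario, g2 in gaudi2.items():
    h100 = g2 * h100_ratio[scenario]  # implied H100 throughput
    a100 = g2 / a100_ratio[scenario]  # implied A100 throughput
    print(f"{scenario}: Gaudi2 {g2:.2f}, ~H100 {h100:.2f}, ~A100 {a100:.2f}")
```

Run as written, this puts the implied H100 figures at roughly 85.65 (server) and 107.62 (offline), consistent with the "slight advantage" characterisation above.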
With Gaudi2 software updates released every six to eight weeks, Intel expects to continue delivering performance advancements and expanded model coverage in MLPerf benchmarks.
About the Intel Xeon results
Intel submitted all seven inference benchmarks, including GPT-J, on 4th Gen Intel Xeon Scalable processors. These results show strong performance for general-purpose AI workloads, including vision, language processing, speech and audio translation models, as well as the much larger DLRM v2 recommendation and GPT-J models.
Additionally, Intel remains the only vendor to submit public CPU results with industry-standard deep learning ecosystem software.
The 4th Gen Intel Xeon Scalable processor is ideal for building and deploying general-purpose AI workloads with the most popular AI frameworks and libraries. For the GPT-J 100-word summarization task of a news article of approximately 1 000 to 1 500 words, 4th Gen Intel Xeon processors summarized two paragraphs per second in offline mode and one paragraph per second in real-time server mode.
For the first time, Intel submitted MLPerf results for the Intel Xeon CPU Max Series, which provides up to 64 gigabytes (GB) of high-bandwidth memory. For GPT-J, it was the only CPU able to achieve 99.9% accuracy, which is critical for applications where the highest accuracy is of paramount importance.
Intel collaborated with its original equipment manufacturer (OEM) customers to deliver their own submissions, further showcasing AI performance scalability and wide availability of general-purpose servers powered by Intel Xeon processors that can meet customer service level agreements (SLAs).
About MLPerf
MLPerf, generally regarded as the most reputable benchmark for AI performance, enables fair and repeatable performance comparisons. Intel anticipates submitting new AI training performance results for the next MLPerf benchmark.
The ongoing performance updates show Intel’s commitment to supporting customers and addressing every node of the AI continuum: from low-cost AI processors to the highest-performing AI hardware accelerators and GPUs for network, cloud and enterprise customers.