Intel has announced that it is the only company to achieve full neural processing unit (NPU) support in the newly released MLPerf Client v0.6 benchmark.

The result marks the industry’s first standardised evaluation of large language model (LLM) performance on client NPUs. Intel’s measurements of MLPerf Client v0.6 show Intel Core Ultra Series 2 processors can produce output on both the graphics processing unit (GPU) and the NPU much faster than a typical human can read.

“We are proud to lead the industry in enabling full NPU acceleration and industry-leading GPU performance for AI workloads on client PC platforms. This success reflects Intel’s deep hardware-software co-optimization and commitment to democratising AI for PCs everywhere,” says Daniel Rogers, Intel vice-president and GM of PC product marketing.

MLPerf Client v0.6 measures four content generation and summarisation use cases based on the Llama 2 7B model. Intel demonstrated leading performance on both the NPU and the built-in Intel Arc GPU.

Intel achieved the fastest NPU response time, generating the first word in just 1.09 seconds (first token latency), meaning it begins answering almost immediately after receiving a prompt. It also delivered the highest NPU throughput at 18.55 tokens per second, a measure of how quickly the system can generate each additional piece of text, enabling seamless real-time AI interaction.
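To make the two metrics above concrete, here is a minimal sketch of how first token latency (time to first token, TTFT) and decode throughput (tokens per second) are typically measured from a streaming text generator. The `fake_stream` generator and its timings are hypothetical stand-ins, not Intel's or MLPerf's actual harness:

```python
import time

def measure_llm_metrics(token_stream):
    """Return (time_to_first_token, tokens_per_second) for an
    iterable that yields generated tokens as they are produced."""
    start = time.perf_counter()
    first_token_time = None
    count = 0
    for _ in token_stream:
        now = time.perf_counter()
        if first_token_time is None:
            # Latency until the model emits its first token.
            first_token_time = now - start
        count += 1
    total = time.perf_counter() - start
    # Throughput is conventionally computed over the decode phase:
    # tokens after the first, divided by the time spent producing them.
    decode_time = total - first_token_time
    tps = (count - 1) / decode_time if decode_time > 0 else float("nan")
    return first_token_time, tps

# Hypothetical generator standing in for a real model's streaming output.
def fake_stream(n_tokens=20, ttft=0.1, per_token=0.05):
    time.sleep(ttft)          # simulated prompt processing
    yield "tok"               # first token
    for _ in range(n_tokens - 1):
        time.sleep(per_token) # simulated per-token decode time
        yield "tok"

ttft, tps = measure_llm_metrics(fake_stream())
print(f"TTFT: {ttft:.2f}s, throughput: {tps:.1f} tokens/s")
```

With these simulated timings the sketch reports a TTFT of roughly 0.1 seconds and a throughput of just under 20 tokens per second; by comparison, typical adult reading speed is only around 4 to 5 words per second.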

Additionally, Intel showed GPU leadership in time to first token, starting faster than the competition and reinforcing its end-to-end AI acceleration advantage across both NPU and GPU.