IBM has revealed architecture details for the upcoming IBM Telum II Processor and IBM Spyre Accelerator – new technologies designed to significantly scale processing capacity across next-generation IBM Z mainframe systems and to help accelerate the use of traditional AI models and large language AI models in tandem through a new ensemble method of AI.
With many generative AI projects that leverage Large Language Models (LLMs) moving from proof-of-concept to production, demand for power-efficient, secured, and scalable solutions has emerged as a key priority.
Morgan Stanley research published in August projects that generative AI's power demands will skyrocket 75% annually over the next several years – putting it on track to consume as much energy in 2026 as Spain did in 2022. Many IBM clients indicate that architectural decisions to support appropriately sized foundation models and hybrid-by-design approaches for AI workloads are increasingly important.
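As a rough, back-of-the-envelope illustration of what 75% annual growth compounds to (the baseline $D_0$ and the three-year horizon below are illustrative assumptions, not figures from the Morgan Stanley report):

$$D_n = D_0 \times 1.75^{\,n}, \qquad D_3 = D_0 \times 1.75^{3} \approx 5.36\,D_0$$

In other words, at that rate demand more than quintuples in three years, which gives a sense of why such projections quickly reach country-scale energy consumption.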
The key innovations unveiled by IBM include:
* IBM Telum II Processor: Designed to power next-generation IBM Z systems, the new IBM chip features increased frequency and memory capacity, a 40% growth in cache and integrated AI accelerator core, as well as a coherently attached Data Processing Unit (DPU) versus the first-generation Telum chip. The new processor is expected to support enterprise compute solutions for LLMs, servicing the industry’s complex transaction needs.
* IO acceleration unit: A completely new Data Processing Unit (DPU) on the Telum II processor chip is engineered to accelerate complex IO protocols for networking and storage on the mainframe. The DPU simplifies system operations and can improve key component performance.
* IBM Spyre Accelerator: Provides additional AI compute capability to complement the Telum II processor. Working together, the Telum II and Spyre chips form a scalable architecture to support ensemble methods of AI modeling – the practice of combining multiple machine learning or deep learning AI models with encoder LLMs. By leveraging the strengths of each model architecture, ensemble AI may provide more accurate and robust results than individual models (a simplified sketch of this pattern follows this list). The IBM Spyre Accelerator chip will be delivered as an add-on option. Each accelerator chip is attached via a 75-watt PCIe adapter and is based on technology developed in collaboration with IBM Research. As with other PCIe cards, the Spyre Accelerator is scalable to fit client needs.
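To make the ensemble idea concrete, below is a minimal sketch of one common pattern, in which a compact traditional model scores every request and a larger encoder-LLM classifier is consulted only for ambiguous cases. This is an illustrative example only, not IBM's implementation; the function names, score threshold band, and weighting are hypothetical placeholders.

```python
# Illustrative sketch of an ensemble/cascade pattern (hypothetical, not IBM code):
# a compact traditional model scores every request, and an encoder-LLM classifier
# is consulted only when that score is ambiguous.

def traditional_model_score(transaction: dict) -> float:
    """Stand-in for a fast, compact ML model (e.g. gradient-boosted trees)."""
    return 0.42  # placeholder score in [0, 1]

def encoder_llm_score(transaction: dict) -> float:
    """Stand-in for a larger encoder-LLM classifier run on an AI accelerator."""
    return 0.78  # placeholder score in [0, 1]

def ensemble_score(transaction: dict,
                   low: float = 0.3,
                   high: float = 0.7,
                   llm_weight: float = 0.6) -> float:
    """Blend the two models, escalating to the LLM only for ambiguous cases."""
    base = traditional_model_score(transaction)
    if low <= base <= high:
        # Ambiguous region: average in the encoder LLM's judgement.
        return llm_weight * encoder_llm_score(transaction) + (1 - llm_weight) * base
    return base  # confident cases never touch the larger model

if __name__ == "__main__":
    print(ensemble_score({"amount": 125.0, "merchant": "example"}))
```

A cascade of this kind keeps most inferencing on the lightweight model and reserves accelerator capacity for the cases where the larger encoder model adds the most value.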
“Our robust, multi-generation roadmap positions us to remain ahead of the curve on technology trends, including escalating demands of AI,” says Tina Tarquinio, vice president, Product Management, IBM Z and LinuxONE.
“The Telum II Processor and Spyre Accelerator are designed to deliver high-performance, secured, and more power-efficient enterprise computing solutions. After years in development, these innovations will be introduced in our next-generation IBM Z platform so clients can leverage LLMs and generative AI at scale.”