Generative AI — in the form of large language model (LLM) applications like ChatGPT, image generators such as Stable Diffusion and Adobe Firefly, and game rendering techniques like Nvidia DLSS 3 Frame Generation — is ushering in a new era of computing for productivity, content creation, gaming and more.

At the Microsoft Build developer conference this week, Nvidia and Microsoft showcased a suite of advancements in Windows 11 PCs and workstations with Nvidia RTX GPUs to meet the demands of generative AI.

More than 400 Windows apps and games already employ AI technology, accelerated by dedicated processors on RTX GPUs called Tensor Cores. The new announcements, which include tools to develop AI on Windows PCs, frameworks to optimise and deploy AI, and driver performance and efficiency improvements, will empower developers to build the next generation of Windows apps with generative AI at their core.

“AI will be the single largest driver of innovation for Windows customers in the coming years,” says Pavan Davuluri, corporate vice-president of Windows silicon and system integration at Microsoft. “By working in concert with Nvidia on hardware and software optimisations, we’re equipping developers with a transformative, high-performance, easy-to-deploy experience.”

Develop models with Windows Subsystem for Linux

AI development has traditionally taken place on Linux, requiring developers to either dual-boot their systems or use multiple PCs to work in their AI development OS while still accessing the breadth and depth of the Windows ecosystem.

Over the past few years, Microsoft has been building a capability to run Linux directly within the Windows OS, called Windows Subsystem for Linux (WSL). Nvidia has been working closely with Microsoft to deliver GPU acceleration and support for the entire Nvidia AI software stack inside WSL. Now developers can use a Windows PC for all their local AI development needs, with support for GPU-accelerated deep learning frameworks on WSL.
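
As a quick sanity check, a minimal sketch like the following, run inside a WSL shell, confirms that a CUDA-enabled framework can see the RTX GPU (this assumes the Windows Nvidia driver and a CUDA build of PyTorch are installed in the WSL distribution):

```python
# Minimal check that a CUDA build of PyTorch running under WSL2
# can see the host machine's RTX GPU.
import torch

if torch.cuda.is_available():
    print("GPU visible from WSL:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected - check the Windows Nvidia driver and WSL2 setup")
```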

With Nvidia RTX GPUs delivering up to 48GB of RAM in desktop workstations, developers can now work with models on Windows that were previously only available on servers. The large memory also improves the performance and quality for local fine-tuning of AI models, enabling designers to customise them to their own style or content.
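
As an illustration of the kind of workflow this enables (the checkpoint name below is a placeholder, not a model named in the announcement), a developer might check available GPU memory and then load a large model in half precision so it fits on a 48GB workstation card:

```python
# Illustrative sketch: query GPU memory, then load a large Hugging Face
# checkpoint in half precision to fit on a high-end RTX workstation GPU.
# The model name is a placeholder; loading with device_map requires the
# accelerate package alongside transformers.
import torch
from transformers import AutoModelForCausalLM

total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU memory: {total_gb:.0f}GB")

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-large-model",  # placeholder checkpoint
    torch_dtype=torch.float16,    # halves the memory footprint vs fp32
    device_map="auto",            # place weights on the GPU
)
```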

And, because the same Nvidia AI software stack runs on Nvidia data centre GPUs, it’s easy for developers to push their models to Microsoft Azure Cloud for large training runs.

Rapidly optimise and deploy models

With trained models in hand, developers need to optimise and deploy AI for target devices.

Microsoft released the Olive toolchain for optimisation and conversion of PyTorch models to ONNX, enabling developers to automatically tap into GPU hardware acceleration such as RTX Tensor Cores. Developers can optimise models via Olive and ONNX, and deploy Tensor Core-accelerated models to PC or cloud. Microsoft continues to invest in making PyTorch and related tools and frameworks work seamlessly with WSL to provide the best AI model development experience.
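
Olive itself is driven by configuration files, but the underlying flow it automates looks roughly like this hand-written sketch: export a PyTorch model to ONNX, then run it through ONNX Runtime's DirectML execution provider, which dispatches the work to the GPU on Windows (the toy model here is illustrative of the general pattern, not Olive's own API):

```python
# Sketch of the PyTorch -> ONNX -> ONNX Runtime flow that Olive automates.
# Requires the onnxruntime-directml package for the DirectML provider.
import torch
import onnxruntime as ort

model = torch.nn.Linear(128, 10).eval()  # stand-in for a real model
dummy = torch.randn(1, 128)

# Export to ONNX so any ONNX Runtime execution provider can run it.
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# DmlExecutionProvider is ONNX Runtime's DirectML backend on Windows;
# on RTX hardware it hands the work to the GPU.
session = ort.InferenceSession("model.onnx",
                               providers=["DmlExecutionProvider"])
outputs = session.run(None, {"input": dummy.numpy()})
print(outputs[0].shape)  # (1, 10)
```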

Improved AI performance, power efficiency

Once deployed, generative AI models demand incredible inference performance. RTX Tensor Cores deliver up to 1 400 Tensor TFLOPS for AI inferencing. Over the last year, Nvidia has worked to improve DirectML performance to take full advantage of RTX hardware.

The company has released its latest optimisations in the Release 532.03 drivers, which combine with Olive-optimised models to deliver big boosts in AI performance. Using an Olive-optimised version of the Stable Diffusion text-to-image generator with the popular Automatic1111 distribution, performance more than doubles with the new driver.
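
For readers who want to try the general setup, the diffusers library ships an ONNX Stable Diffusion pipeline that can target DirectML; the sketch below uses the public ONNX weights, not necessarily the specific Olive-optimised build used in Nvidia's benchmark:

```python
# Sketch: running an ONNX export of Stable Diffusion through ONNX
# Runtime's DirectML provider. Requires the diffusers and
# onnxruntime-directml packages; the checkpoint/revision here are the
# public ONNX weights, not the Olive-optimised build in the benchmark.
from diffusers import OnnxStableDiffusionPipeline

pipe = OnnxStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="onnx",
    provider="DmlExecutionProvider",
)
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```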

With AI coming to nearly every Windows application, efficiently delivering inference performance is critical — especially for laptops. Nvidia will soon introduce new Max-Q low-power inferencing for AI-only workloads on RTX GPUs. It optimises Tensor Core performance while keeping power consumption of the GPU as low as possible, extending battery life and maintaining a cool, quiet system. The GPU can then dynamically scale up for maximum AI performance when the workload demands it.

AI-enabled software out now

Software developers like Adobe, DxO, ON1 and Topaz have already incorporated Nvidia AI technology, and more than 400 Windows applications and games are optimised for RTX Tensor Cores.

Nvidia and Microsoft are making several resources available for developers to test drive generative AI models on Windows PCs. An Olive-optimised version of the Dolly 2.0 large language model is available on Hugging Face. And a PC-optimised version of the Nvidia NeMo large language model for conversational AI is coming soon to Hugging Face.
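
To experiment with Dolly 2.0 locally, the standard Hugging Face checkpoint can be loaded in a few lines; this sketch uses the public databricks/dolly-v2-3b weights rather than the Olive-optimised repository, whose exact name isn't given in the announcement:

```python
# Sketch: loading the public Dolly 2.0 checkpoint from Hugging Face.
# The 3B variant fits comfortably on a consumer RTX GPU in half
# precision; trust_remote_code enables Dolly's custom text-generation
# pipeline, and device_map="auto" (via accelerate) places it on the GPU.
import torch
from transformers import pipeline

generate = pipeline(
    model="databricks/dolly-v2-3b",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)
res = generate("Explain what Tensor Cores are in one paragraph.")
print(res[0]["generated_text"])
```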