AI data environments have different storage requirements

Enterprises are fast embracing more data-centric business models, and with them comes a growing need for big data and analytics workloads that use artificial intelligence (AI), machine learning (ML) and deep learning (DL).

By Daniel Thenga, NetApp and SolarWinds architecture manager, and business unit manager at Comstor Southern Africa

We know that good data equates to better business insights. Still, according to a recent whitepaper by IDC, “Storage Infrastructure Considerations for Artificial Intelligence Deep Learning Training Workloads in Enterprises,” outdated storage architectures can ultimately pose challenges in efficiently scaling large AI-driven workloads. IDC states, “Over 88% of enterprises purchase newer, more modernised storage infrastructure designs for those types of applications.”

Data pipelines are most effectively managed when the ingest, transformation, training, inferencing, production, and archiving stages all run on a single, consolidated storage system. But the pressures of latency, high data concurrency, and multitenant management can put this system at risk. Why? Because it must handle petabytes of data while also supporting cloud-native capabilities and integration.

Because of this, organisations are taking software-defined, scale-out storage infrastructures that support extensive hybrid cloud integration a lot more seriously.

AI is hungry for storage

But to understand the strains traditional storage is under, we need to know how AI plays a role in digital transformation and storage decisions. ML reviews data inputs, identifies patterns and similarities in the data, and then uses what it has learnt to propose a decision. DL, on the other hand, is more complex: it uses multi-layered neural networks that can act on their learnings and make decisions without human intervention.

ML and DL, both subsets of AI, have different use cases. ML will alert you to problems, whereas DL is an actual learning system used in applications such as natural language processing, virtual assistants, and autonomous vehicle systems. The one thing they share is their hunger for data. As IDC notes, AI workloads generally perform better when they leverage larger data sets for training purposes.

Accordingly, IDC states that “most enterprises are experiencing data growth rates from 30% to 40% per year and will soon be managing multi-petabyte storage environments (if they are not already).” This led IDC to its subsequent finding that roughly 70% of organisations pursuing digital transformation will modernise their storage in the next two years to support performance, availability, scalability, and security.
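To put those growth rates in perspective, the short Python sketch below works through the compound-growth arithmetic. It is an illustration only: the function names, the 500 TB starting point, and the 2 PB target are hypothetical figures chosen for the example, not numbers from the IDC whitepaper.

```python
import math


def projected_capacity(current_tb: float, annual_growth: float, years: float) -> float:
    """Capacity after `years` of compound growth at `annual_growth` (0.30 = 30%)."""
    return current_tb * (1 + annual_growth) ** years


def years_to_reach(current_tb: float, target_tb: float, annual_growth: float) -> float:
    """Years of compound growth needed for `current_tb` to reach `target_tb`."""
    if current_tb <= 0 or target_tb <= 0 or annual_growth <= 0:
        raise ValueError("capacities and growth rate must be positive")
    if target_tb <= current_tb:
        return 0.0
    return math.log(target_tb / current_tb) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 500.0    # hypothetical starting capacity
    target_tb = 2000.0  # hypothetical target of 2 PB
    for rate in (0.30, 0.40):  # the growth range quoted by IDC
        years = years_to_reach(start_tb, target_tb, rate)
        print(f"At {rate:.0%} per year: about {years:.1f} years to reach 2 PB")
```

At these rates, stored data doubles roughly every two to three years, which is why an environment that sits comfortably in the hundreds of terabytes today can reach multi-petabyte scale within a single planning cycle.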

Data evolution is driven by AI

What we are also learning is that AI applications are not static. The data models created by data scientists are constantly evolving; therefore, the training workflow applied to the data being used must continuously be refined and optimised, or the models will fail. This means that sometimes older data might not be archived if it does not support the model’s evolution.

Within this, the AI data pipeline needs different capabilities from the IT infrastructure in which it operates to process data-intensive workloads effectively. IDC reports in its AI DL Training Infrastructure Survey that 62% of enterprises running AI workloads were running them on “high-density clusters.” The survey defined these as scale-out IT infrastructure leveraging some form of accelerated compute.

Bottom line? Big data, big ideas, and better insights all need better data and the right storage systems. This is again where software-defined storage is essential, as it supports businesses running workloads in integrated hybrid cloud environments and those that need to move data between on- and off-premises locations. With a software-defined storage environment, a user benefits from the flexibility and data mobility that hybrid cloud environments demand.

AI environments need better storage

One thing is clear. IT people not immersed in the AI or data science process should not be making decisions on what storage and data systems the data scientists need. It must be a collective decision-making process carefully architected around the model development and AI use cases being deployed in the business.

An important factor influencing the infrastructure/system decision-making process is determining how much other workload consolidation a platform can support. This is critical in determining whether the platform has the performance and scalability to support workloads beyond AI. For example, a vendor like NetApp offers high-performance and highly scalable systems that help businesses to benefit from workload consolidation and infrastructure efficiencies.

Notably, when a single storage system (such as one from NetApp) can meet the requirements of both AI and enterprise storage, it immediately provides the business with a better ROI and avoids the legacy trap of siloed storage. NetApp ONTAP AI, for example, is a converged infrastructure stack that combines NVIDIA DGX accelerated compute, NVIDIA high-speed switching, ONTAP-based storage, and tools to manage AI workloads. As a single stack, it delivers an integrated system that is easy to procure and deploy, is supported by third-party vendors, and offers unified management.

To demonstrate the success of the all-inclusive nature of NetApp, IDC found that almost 60% of enterprises using this NetApp environment also run non-AI workloads on their storage.

The AI-infused future

We are still in the fledgling stages of the enterprise AI revolution, but we know that it will form a strategic and central part of any business that wants to transform digitally. And if we factor in that IDC says that “storage infrastructure spend in enterprises for AI workloads alone will be a $5.4 billion market by 2024,” we have seen only the tip of the iceberg.