
Infrastructural necessities for building and deploying enterprise-scale AI

By Jelani Harper

Whether rendering sensitive datasets anonymous with synthetic data or simply facilitating ad-hoc question answering on a corpus of domain-specific knowledge, enterprise applications of AI are more sophisticated than ever before.

What hasn’t changed, however, are the infrastructural necessities upon which these applications are built. Without the proper infrastructure, enterprise AI deployments become too time-consuming, resource-intensive, and costly to sustain.

The contemporary stack that underlies scalable applications of advanced machine learning includes constructs for:

  • Compute: Although it depends on the particular use case, the compute practicalities that support the real-time nature of many AI applications—particularly those for end users—involve “a distributed workload across many GPUs,” acknowledged Jason Hardy, Hitachi Vantara CTO. (A minimal sketch of such a workload appears after this list.)
  • Storage: The capacity to store tremendous data quantities in a manner that’s easily—and quickly—accessible complements the computing required to support these applications.
  • Performance: Swift retrieval of data from storage to compute is optimized by an intermediary file system that “drives data to the GPUs as quick as possible to get those things working to support AI processing,” Hardy commented.
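
As a minimal sketch of the kind of distributed GPU workload Hardy describes, the following uses PyTorch’s DistributedDataParallel; the model, data, and training loop are placeholders rather than anything specific to Hitachi Vantara’s platform:

# Minimal sketch of a distributed GPU workload with PyTorch
# DistributedDataParallel (DDP). Assumes launch via torchrun, which sets
# RANK, WORLD_SIZE, and LOCAL_RANK; model and data are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                       # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(100):                                  # placeholder loop
        batch = torch.randn(32, 1024, device=local_rank)
        loss = model(batch).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()                                   # gradients sync here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with, for example, “torchrun --nproc_per_node=8 train.py”, one process drives each GPU and gradients synchronize automatically during backward().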

Organizations can avail themselves of contemporary offerings that furnish all three of these elements to power their cognitive computing use cases. Adopting them supplies “the infrastructure to support the AI workload, both in terms of storage, performance, and density,” Hardy revealed.

Performance optimization

Most organizations realize there are a variety of options to choose from for their GPU and storage infrastructure. Optimized implementations for low-latency use cases, however, require more than these resources. According to Hardy, “The problem with traditional storage infrastructure is it can’t go fast enough to support these highly parallelized workloads. GPUs can pull and pull information, traditionally from like a NAS or a distributed system, and the GPU becomes a bottleneck, where it can’t operate as fast and efficiently as possible.” Users can avoid these problems by accessing platforms with what Hardy termed a “high performance parallel file system” designed to maximize the performance of their compute and storage infrastructure.
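
By way of illustration, here is a minimal sketch of the consumption side, in which many parallel readers keep data flowing from a file system mount to the GPUs; the mount path and dataset format are hypothetical, and nothing below is specific to Hitachi Vantara’s file system:

# Sketch: keeping GPUs fed with parallel reads from a file system mount.
# "/mnt/pfs/train" is a hypothetical mount point; the .pt dataset format
# is a placeholder.
from pathlib import Path
import torch
from torch.utils.data import Dataset, DataLoader

class FileDataset(Dataset):
    def __init__(self, root):
        self.files = sorted(Path(root).glob("*.pt"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Each worker issues its own read, so requests hit the file
        # system in parallel instead of serially.
        return torch.load(self.files[idx])

loader = DataLoader(
    FileDataset("/mnt/pfs/train"),   # hypothetical parallel-FS mount
    batch_size=64,
    num_workers=16,                  # parallel readers
    pin_memory=True,                 # faster host-to-GPU copies
    prefetch_factor=4,               # queue batches ahead of the GPU
)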

Performance gains

The advantages of relying on this approach are manifold. Storage is no longer a ‘bottleneck’; Hardy estimated that GPUs “operate at 90-95 percent utilization, where traditionally you see 40 percent.” Moreover, the system supports not only random workloads, but also different types of workloads, such as predictions for customer-facing applications and the data science work of preparing models. This approach is more efficient than one requiring “dedicated resources for training or fine-tuning…and another set of resources for inferences, which is traditionally what you would do,” Hardy revealed. “Now, you use the same resources to do both because the storage underneath can support that.”
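
Utilization figures like these can be checked in one’s own environment. A brief sketch using NVIDIA’s management library, sampling the first GPU once per second for a minute:

# Sketch: sampling GPU utilization with NVIDIA's management library
# (pip install nvidia-ml-py) to see whether storage is starving the GPU.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)           # first GPU

samples = []
for _ in range(60):                                     # one-minute sample
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    samples.append(util.gpu)                            # percent busy
    time.sleep(1)

print(f"mean GPU utilization: {sum(samples) / len(samples):.1f}%")
pynvml.nvmlShutdown()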

Flash storage

The low-latency data retrieval empowered by the parallel file system Hardy referenced is attributable to several factors, not the least of which is a highly competitive form of flash storage. On the one hand, the file system incorporates Gen 5 PCIe for rapid data transfer. On the other hand, it utilizes Generation 5 flash storage in the E3.S form factor. The tandem results in data retrieval “measured in nanoseconds,” Hardy remarked. “Because it’s not spinning like a traditional hard drive, we can do heavy throughput and IOPS at the same time. We can bring anything off the drive instantaneously.”
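
To make the throughput-versus-IOPS distinction concrete, here is a toy benchmark sketch; the file path is hypothetical, and a serious measurement would use a dedicated tool such as fio to control caching and queue depth:

# Toy sketch of the IOPS side: many small random reads per second.
# The path is hypothetical; real benchmarks account for page caching,
# queue depth, and direct I/O.
import os
import random
import time

PATH = "/mnt/flash/testfile"            # hypothetical file on flash
SIZE = os.path.getsize(PATH)

with open(PATH, "rb", buffering=0) as f:
    start = time.perf_counter()
    for _ in range(10_000):             # small random 4 KiB reads
        f.seek(random.randrange(0, SIZE - 4096))
        f.read(4096)
    elapsed = time.perf_counter() - start

print(f"~{10_000 / elapsed:,.0f} random 4 KiB reads per second")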

Object storage

Object storage is another vital facet of the storage infrastructure required for enterprise-grade AI deployments. This type of storage is less costly than flash storage; indeed, it’s exceedingly cheap in some environments. Object storage is well suited to the surplus of unstructured and semi-structured data organizations rely on for AI applications. Additionally, top end-to-end AI infrastructure platforms enable users to intelligently tier data between flash storage and object storage.
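
The tiering machinery inside such platforms is proprietary, but the concept maps onto the S3 API’s lifecycle rules, which many object stores support. A sketch with placeholder bucket, prefix, and storage class:

# Sketch: age-based tiering expressed as an S3 lifecycle rule, an analogy
# for demoting cold data to a cheaper tier. Bucket, prefix, and storage
# class are placeholders (pip install boto3).
import boto3

s3 = boto3.client("s3")   # endpoint and credentials come from environment

s3.put_bucket_lifecycle_configuration(
    Bucket="ai-datasets",                        # placeholder bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-cold-data",
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},
            "Transitions": [{
                "Days": 30,                      # untouched for 30 days...
                "StorageClass": "GLACIER",       # ...moves to a cold tier
            }],
        }]
    },
)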

Thus, they can manage the tradeoff between cost-effective, long-term data storage and more expensive, rapidly available data stored for consumer-facing AI use cases. Object storage is also beneficial for snapshots, data backups, and business continuity. “We’ve got immutability protection inside the object store, so it can’t be tampered with,” Hardy said. “That data can’t be manipulated and it’s maintained in a secure fashion, as well. If a ransomware attack hits your active data, you can recover from the object store.”
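
Immutability of this kind is commonly exposed through the S3 API’s Object Lock feature; whether Hitachi Vantara implements it this way is not specified here, but the following sketch shows the general pattern, assuming a bucket created with Object Lock enabled and placeholder names throughout:

# Sketch: writing an immutable backup object via the S3 API's Object Lock.
# Assumes a bucket created with Object Lock enabled; names are placeholders.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")

with open("db-snapshot.tar", "rb") as body:
    s3.put_object(
        Bucket="backups",                        # placeholder bucket
        Key="snapshots/db-snapshot.tar",         # placeholder key
        Body=body,
        ObjectLockMode="COMPLIANCE",             # cannot be removed early
        ObjectLockRetainUntilDate=datetime.now(timezone.utc)
        + timedelta(days=90),
    )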

The right combination

The infrastructural requirements outlined above are just the first of two critical requisites for enterprise AI deployments. Organizations seeking comprehensive options for training and deploying advanced machine learning models will certainly need GPUs, flash and object storage, and intermediary constructs—like the high performance parallel file system—to optimize their interplay.

However, Hardy cautioned that there’s another consideration: organizations must be able to discern and access the data that’s most meaningful for their applications. “You have to have the right data management strategy, so as new data comes into the environment, your models are either, through fine-tuning or RAG, ensuring you’re getting the latest version of the information when you’re asking it a question or it starts a process,” Hardy said.
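
As one minimal illustration of the RAG half of such a strategy, the following sketch embeds and indexes documents as they arrive so that questions retrieve the latest information; the embedding model, in-memory index, and documents are placeholders, not Hardy’s implementation:

# Minimal RAG-freshness sketch: embed and index documents as they arrive
# so queries retrieve the latest information. The model name and documents
# are placeholders (pip install sentence-transformers numpy).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors, texts = [], []

def ingest(doc):
    """Call whenever new data enters the environment."""
    vectors.append(model.encode(doc))
    texts.append(doc)

def retrieve(question, k=3):
    """Return the k passages most relevant to the question."""
    q = model.encode(question)
    mat = np.array(vectors)
    sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q))
    return [texts[i] for i in sims.argsort()[-k:][::-1]]

ingest("Q3 revenue guidance was revised upward on October 1.")
print(retrieve("What is the latest revenue guidance?"))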
