
How will artificial intelligence change storage?

Micron Technology | November 2023

It is an exciting time to be working in storage. We are on the cusp of a disruptive change in the IT industry. It revolves around how artificial intelligence (AI) will change the way we architect and build servers, and what we expect computers to do for us. There is tremendous buzz in the industry and the public around generative AI. The emergence of ChatGPT™ earlier this year captured imaginations around how a computer could understand our natural-language questions, converse with us on any topic, and write poems and rhymes like a person. Or the various image-generation AI models that can create stunning visual masterpieces from simple text prompts given by the user.

The rapid emergence of AI is creating considerable demand for high-bandwidth memory (HBM). HBM solutions are now in greater demand than gold. Large language models (LLMs) are driving demand for a larger-capacity memory footprint on the CPU to support ever bigger, more complex models. While the importance of more memory bandwidth and capacity is well understood, the role of storage in supporting the growth of AI is often forgotten.

What role does storage play in AI workloads, and why does it matter?

Storage plays a crucial role in two areas. One is local, high-speed storage that acts as a cache for feeding training data into the HBM on the GPU. Because of the performance needs, a high-performance SSD is used here. The other key role of storage is to hold all the training datasets in large data lakes.

Local cache drives

LLMs train on human-generated information found on the web, in books and in related corpora. The I/O pattern to the training data on the local cache drive is structured and consists mainly of reading large data blocks to prefetch the next batch of data into memory. Hence, for traditional LLMs, the SSD's performance is not normally a bottleneck to GPU processing. Other AI/ML models, such as computer vision or mixed-mode LLM+CV, demand higher bandwidth and do challenge the local cache drive.
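The large-block, prefetch-ahead read pattern described above can be sketched as a double-buffered loader. This is an illustrative pattern only, assuming nothing about Micron's actual data path; the `batches` and `prefetch` names are invented for this sketch:

```python
import threading
import queue

def batches(n_batches, block_size):
    """Simulate large sequential block reads from a local cache drive.
    A real loader would read multi-megabyte extents from training files."""
    for _ in range(n_batches):
        yield bytes(block_size)  # placeholder for one block read

def prefetch(iterable, depth=2):
    """Read ahead in a background thread so the consumer (the GPU, in the
    article's scenario) rarely has to wait on storage."""
    q = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for item in iterable:
            q.put(item)  # blocks when `depth` blocks are already buffered
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item

# Consume four simulated 1 MiB blocks while later ones load in the background.
total = sum(len(b) for b in prefetch(batches(4, 1 << 20)))
```

The bounded queue is the point of the design: storage stays at most `depth` blocks ahead, so reads overlap compute without unbounded buffering.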

Graph neural networks (GNNs) are often used for product recommendation/deep learning recommendation models (DLRM), fraud detection and network intrusion detection. The DLRM is sometimes called the largest revenue-generating algorithm on the internet. Training workloads for GNNs tend to access data more randomly and in smaller block sizes. They can truly challenge the performance of the local cache SSD and can leave expensive GPUs idling. New SSD capabilities are needed to ease this performance bottleneck. Micron is actively working on solutions with industry leaders and is presenting some of this work at SC23 in Denver, where we will demonstrate ways for the GPU and SSD to interact that speed up some I/O-intensive processing times by up to 100x.
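To see why block size matters, here is a toy comparison of request counts; the file size and block sizes are chosen purely for illustration. Pulling the same bytes in 4 KiB requests takes orders of magnitude more I/O operations than 1 MiB requests, and real GNN access patterns add randomness on top of that:

```python
import os
import tempfile

# A small synthetic "training set" (16 MiB here; real datasets are far larger).
data = os.urandom(16 << 20)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

LARGE = 1 << 20   # 1 MiB blocks: LLM-style structured prefetch
SMALL = 4 << 10   # 4 KiB blocks: GNN-style small accesses

def read_count(block_size):
    """Number of I/O requests needed to pull the whole file at a block size."""
    ops = 0
    with open(path, "rb") as fh:
        while fh.read(block_size):
            ops += 1
    return ops

seq_ops = read_count(LARGE)   # few large reads
rand_ops = read_count(SMALL)  # many small reads (sequential here for simplicity;
                              # random placement would add seek/lookup overhead)
os.unlink(path)
```

Here the small-block pass issues 256 times as many requests for the same data, which is why request-rate (IOPS) rather than raw bandwidth becomes the limiting SSD metric for these models.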

AI data lakes

For large data lakes, large-capacity SSDs will become the storage media of choice. HDDs get cheaper ($/TB) as their capacity grows, but they also get slower (MB/s per TB). HDD capacities beyond 20TB will truly challenge the ability of large data lakes to power-efficiently deliver the bandwidth (TB/s) needed by large AI/ML GPU clusters. SSDs, on the other hand, have ample performance and, in purpose-built forms, can deliver the required capacities at lower power (8x lower watts/TB) and even lower electrical energy (10x lower kWh/TB) than HDDs. Those savings leave more power in the data center to add more GPUs. Today, Micron is deploying its 32TB high-capacity data center SSD into numerous AI data lakes and object stores. Capacities for 15-watt SSDs that can individually deliver several GB/s of bandwidth will scale up to 250TB in the future.
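A back-of-envelope sketch of that sizing argument, using assumed drive specs (not Micron-published figures): when aggregate bandwidth rather than capacity determines fleet size, drives that are slow per terabyte force a much larger, more power-hungry fleet:

```python
# Illustrative only. The article's claim is qualitative: HDD bandwidth per TB
# falls as capacity grows, so SSD-based data lakes need fewer watts per
# delivered TB/s. The numbers below are assumptions for the arithmetic.

TARGET_BW_TBPS = 1.0  # aggregate read bandwidth a large GPU cluster might demand

hdd = {"cap_tb": 24, "bw_gbps": 0.28, "watts": 9}   # assumed nearline HDD
ssd = {"cap_tb": 32, "bw_gbps": 6.0,  "watts": 15}  # assumed high-capacity SSD

def drives_for_bandwidth(drive):
    """Drives needed when bandwidth, not capacity, sets the fleet size."""
    return TARGET_BW_TBPS * 1000 / drive["bw_gbps"]

for name, drive in (("HDD", hdd), ("SSD", ssd)):
    n = drives_for_bandwidth(drive)
    print(f"{name}: {n:,.0f} drives, "
          f"{n * drive['watts'] / 1000:,.1f} kW, "
          f"{n * drive['cap_tb'] / 1000:,.0f} PB of deployed capacity")
```

Under these assumptions the HDD fleet needs roughly twenty times as many drives and over ten times the power to serve the same bandwidth, which is the shape of the trade-off the paragraph above describes.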

How will AI affect demand for NAND flash storage?

First, all training of new AI/ML models requires data from which to “learn.” IDC estimates that, beginning around 2005, the amount of data generated every year has exceeded the amount of storage purchased each year. This means some data must be ephemeral. Users must decide its value, and whether the value of keeping the data exceeds the cost of buying more storage to retain it.

Machines – cameras, sensors, IoT devices, jet engine diagnostics, packet routing information, swipes and clicks – now generate several orders of magnitude more data in a day than humans can. Machine-generated data that humans previously lacked the time or capacity to analyze can now be especially useful to AI/ML routines for extracting useful and valuable information. The emergence of AI/ML should make this data more valuable to retain and hence grow the demand for storage.

This training data is stored in AI data lakes. These data lakes exhibit higher-than-normal access density in order to feed a growing number of GPUs per cluster while simultaneously supporting a heavy mixture of ingestion and preprocessing. There is also a great deal of re-training on the data, so there is often little “cold” data. That workload characteristic is much better suited to large-capacity, power-efficient SSDs than to traditional HDD-based object stores. These data lakes can be quite large – hundreds of petabytes – for computer vision applications such as autonomous driving, or for DLRM. As these data lakes grow in capacity and number, they represent a tremendous growth opportunity for NAND flash SSDs.

As AI models evolve and scale, NAND flash storage will become increasingly critical to sustaining their exponential growth in performance.