Paper Title
A Study on the Intersection of GPU Utilization and CNN Inference
Paper Authors
Paper Abstract
There has been significant progress in developing neural network architectures that achieve both high predictive performance and high application-level inference throughput (e.g., frames per second). Another metric of increasing importance is GPU utilization during inference: a measure of how well a deployed neural network uses the computational capabilities of the GPU on which it runs. Achieving high GPU utilization is critical to increasing application-level throughput and ensuring a good return on investment for deploying GPUs. This paper analyzes the GPU utilization of convolutional neural network (CNN) inference. We first survey the GPU utilization of CNNs to show that there is room to improve the GPU utilization of many of these CNNs. We then investigate the GPU utilization of networks within a neural architecture search (NAS) search space, and explore how GPU utilization could potentially be used as a metric to accelerate NAS itself. Our study makes the case that there is room to improve the inference-time GPU utilization of CNNs and that knowledge of GPU utilization has the potential to benefit even applications that do not target utilization itself. We hope that the results of this study will spur future innovation in designing GPU-efficient neural networks.
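To make the notion of inference-time GPU utilization concrete, the following is a minimal sketch (not the paper's own measurement methodology) of how one might sample GPU utilization while running CNN inference on an NVIDIA GPU. It assumes PyTorch, torchvision, and the NVML bindings (`pynvml`) are installed; the model choice, batch size, and sampling interval are illustrative placeholders.

```python
# Minimal sketch: sample NVML GPU utilization while a CNN runs inference.
# Assumptions (not from the paper): an NVIDIA GPU, PyTorch + torchvision,
# and `pynvml` installed; ResNet-50 with a random batch is used only as an example.
import threading
import time

import torch
import torchvision.models as models
import pynvml


def sample_utilization(samples, stop_event, interval_s=0.05):
    """Poll NVML for GPU utilization (percent busy) until stop_event is set."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    while not stop_event.is_set():
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        samples.append(util.gpu)  # percentage of time the GPU was busy in the last interval
        time.sleep(interval_s)
    pynvml.nvmlShutdown()


def main():
    device = torch.device("cuda")
    model = models.resnet50(weights=None).eval().to(device)
    batch = torch.randn(8, 3, 224, 224, device=device)

    samples, stop_event = [], threading.Event()
    sampler = threading.Thread(target=sample_utilization, args=(samples, stop_event))
    sampler.start()

    # Run repeated inference so the utilization reading stabilizes.
    with torch.no_grad():
        for _ in range(100):
            model(batch)
    torch.cuda.synchronize()

    stop_event.set()
    sampler.join()
    if samples:
        print(f"mean GPU utilization during inference: {sum(samples) / len(samples):.1f}%")


if __name__ == "__main__":
    main()
```

Note that NVML's utilization counter only reports the fraction of time a kernel was active on the GPU; finer-grained efficiency metrics (e.g., achieved occupancy or SM throughput) would require a profiler such as Nsight Compute.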