Towards a Scalable and Distributed Infrastructure for Deep Learning Applications

Paper Title

Towards a Scalable and Distributed Infrastructure for Deep Learning Applications

Authors

Bita Hasheminezhad, Shahrzad Shirzad, Nanmiao Wu, Patrick Diehl, Hannes Schulz, Hartmut Kaiser

Abstract

Although recent scale-up approaches to training deep neural networks have proven effective, the computational intensity of large and complex models, together with the availability of large-scale datasets, requires deep learning frameworks to use scale-out techniques as well. Parallelization approaches and distribution requirements were not considered in the preliminary designs of most available distributed deep learning frameworks, and most of them are still unable to perform effective and efficient fine-grained inter-node communication. We present Phylanx, which has the potential to alleviate these shortcomings. Phylanx offers a productivity-oriented frontend in which user Python code is translated to a futurized execution tree that can be executed efficiently on multiple nodes using the C++ standard library for parallelism and concurrency (HPX), leveraging fine-grained threading and an active messaging task-based runtime system.
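
To make the idea of a "futurized execution tree" concrete, the following is a minimal single-process sketch. It is not Phylanx's API and does not use HPX: the `Node` class and the `futurize` and `evaluate` functions are hypothetical names introduced only for illustration, and Python's standard `concurrent.futures` thread pool stands in for a task-based runtime. It assumes each tree node is either a constant or an operation whose inputs are the results of its children.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import operator


class Node:
    """Toy expression-tree node (hypothetical): a constant leaf or an operation."""

    def __init__(self, op=None, children=(), value=None):
        self.op = op                      # callable for inner nodes, None for leaves
        self.children = tuple(children)   # sub-expressions feeding this node
        self.value = value                # constant held by a leaf


def futurize(node, pool):
    """Return a Future for the node's value instead of the value itself."""
    if node.op is None:
        # A leaf is already known, so hand back a completed Future.
        done = Future()
        done.set_result(node.value)
        return done

    # Children are submitted before their parent (post-order), so independent
    # subtrees can run concurrently and a waiting parent never starves them.
    child_futures = [futurize(child, pool) for child in node.children]

    def run():
        # The parent task resolves its inputs only when it actually executes.
        return node.op(*(f.result() for f in child_futures))

    return pool.submit(run)


def evaluate(tree, workers=4):
    """Evaluate the whole tree on a thread pool and return the root value."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return futurize(tree, pool).result()


# Expression: (2 + 3) * (4 - 1)
tree = Node(operator.mul, [
    Node(operator.add, [Node(value=2), Node(value=3)]),
    Node(operator.sub, [Node(value=4), Node(value=1)]),
])
result = evaluate(tree)
```

In this sketch the addition and subtraction subtrees are independent tasks, and the multiplication runs once both of their futures resolve. A distributed runtime would additionally place tasks on different nodes and move data between them, which this example does not attempt.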
