Paper title

Benchmarking optimization algorithms for auto-tuning GPU kernels

Authors

Richard Schoonhoven, Ben van Werkhoven, Kees Joost Batenburg

Abstract

Recent years have witnessed phenomenal growth in the applications and capabilities of Graphical Processing Units (GPUs), owing to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU program (kernel) is challenging, and generally only certain specific kernel configurations lead to significant increases in performance. Auto-tuning is the process of automatically optimizing software for highly efficient execution on a target hardware platform. Auto-tuning is particularly useful for GPU programming, as a single kernel requires re-tuning after code changes, for different input data, and for different architectures. However, the discrete and non-convex nature of the search space creates a challenging optimization problem. In this work, we investigate which algorithm produces the fastest kernels as the time budget for the tuning task is varied. We conduct a survey by performing experiments on 26 different kernel spaces, from 9 different GPUs, for 16 different evolutionary black-box optimization algorithms. We then analyze these results and introduce a novel metric based on the PageRank centrality concept as a tool for gaining insight into the difficulty of the optimization problem. We demonstrate that our metric correlates strongly with observed tuning performance.
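
The abstract does not spell out how the PageRank-based metric is built, so the following is only a rough sketch of one way such a difficulty indicator could be computed, not the authors' definition. It assumes a toy search space of three tunable parameters, a synthetic runtime function standing in for real kernel measurements, a directed graph in which each configuration points to every strictly faster neighbor that differs in one parameter, and a score equal to the share of local-minimum PageRank mass held by minima within 10% of the best runtime. All names (`SPACE`, `smooth_runtime`, `rugged_runtime`, `good_minima_share`) and the 10% tolerance are hypothetical.

```python
import itertools

# Hypothetical tunable parameters: block size, tile size, loop unrolling flag.
SPACE = [(16, 32, 64, 128), (1, 2, 4, 8), (0, 1)]


def smooth_runtime(cfg):
    """Synthetic, separable runtime with a single optimum at (64, 4, 1)."""
    block, tile, unroll = cfg
    return 1.0 + abs(block - 64) / 16 + abs(tile - 4) + (0.5 if unroll == 0 else 0.0)


def rugged_runtime(cfg):
    """Synthetic runtime with a parameter interaction that adds local minima."""
    block, tile, unroll = cfg
    return smooth_runtime(cfg) + ((block // 16) * 7 + tile * 3 + unroll * 5) % 4


def neighbors(cfg, space):
    """Yield configurations that differ from cfg in exactly one parameter."""
    for i, values in enumerate(space):
        for v in values:
            if v != cfg[i]:
                yield cfg[:i] + (v,) + cfg[i + 1:]


def build_flow_graph(space, runtime):
    """Map each configuration to the list of its strictly faster neighbors."""
    return {
        cfg: [n for n in neighbors(cfg, space) if runtime(n) < runtime(cfg)]
        for cfg in itertools.product(*space)
    }


def pagerank(graph, damping=0.85, iters=200):
    """Plain power-iteration PageRank; dangling mass is spread uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Nodes without outgoing edges are the local minima of the runtime.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * rank[v] / len(graph[v])
                for w in graph[v]:
                    new[w] += share
        rank = new
    return rank


def good_minima_share(graph, rank, runtime, tolerance=1.1):
    """Share of local-minimum PageRank mass on minima near the best runtime."""
    minima = [v for v in graph if not graph[v]]
    best = min(runtime(v) for v in graph)
    total = sum(rank[v] for v in minima)
    good = sum(rank[v] for v in minima if runtime(v) <= tolerance * best)
    return good / total


if __name__ == "__main__":
    for name, fn in [("smooth", smooth_runtime), ("rugged", rugged_runtime)]:
        g = build_flow_graph(SPACE, fn)
        r = pagerank(g)
        print(name, round(good_minima_share(g, r, fn), 3))
```

Under these assumptions, a score near 1 suggests that a greedy walk through the space tends to end in a near-optimal configuration, while a lower score suggests that poor local minima attract much of the flow. A real study would replace the synthetic functions with measured kernel runtimes.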
