Benchmarking optimization algorithms for auto-tuning GPU kernels

Paper title

Benchmarking optimization algorithms for auto-tuning GPU kernels

Authors

Richard Schoonhoven, Ben van Werkhoven, Kees Joost Batenburg

Abstract

Recent years have witnessed phenomenal growth in the applications and capabilities of Graphics Processing Units (GPUs), owing to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU program (kernel) is challenging, and generally only certain specific kernel configurations lead to significant increases in performance. Auto-tuning is the process of automatically optimizing software for highly efficient execution on a target hardware platform. Auto-tuning is particularly useful for GPU programming, as a single kernel requires re-tuning after code changes, for different input data, and for different architectures. However, the discrete and non-convex nature of the search space creates a challenging optimization problem. In this work, we investigate which algorithm produces the fastest kernels when the time budget for the tuning task is varied. We conduct a survey by performing experiments on 26 different kernel spaces, from 9 different GPUs, for 16 different evolutionary black-box optimization algorithms. We then analyze these results and introduce a novel metric, based on the PageRank centrality concept, as a tool for gaining insight into the difficulty of the optimization problem. We demonstrate that our metric correlates strongly with observed tuning performance.
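To make the problem setting concrete, the following is a minimal sketch of budget-limited auto-tuning over a discrete configuration space. It is not the authors' benchmark and does not reproduce any of the algorithms they compare. It uses plain random search as the simplest possible black-box baseline. The parameter names, the `fake_runtime_ms` cost function, and the `random_search` helper are all hypothetical; a real tuner would compile and time the kernel on the GPU instead of evaluating a formula.

```python
import itertools
import random

# Hypothetical tunable parameters for an imaginary kernel (not from the paper).
SEARCH_SPACE = {
    "block_size_x": [16, 32, 64, 128, 256],
    "block_size_y": [1, 2, 4, 8],
    "tile_size": [1, 2, 4],
    "use_shared_mem": [0, 1],
}


def fake_runtime_ms(config):
    """Synthetic stand-in for compiling and timing a kernel (illustrative only)."""
    threads = config["block_size_x"] * config["block_size_y"]
    if threads > 1024:
        return float("inf")  # treat as an invalid launch configuration
    runtime = 10.0 + abs(threads - 256) / 32.0
    runtime += 2.0 * abs(config["tile_size"] - 2)
    runtime -= 1.5 * config["use_shared_mem"]
    return runtime


def random_search(space, objective, budget, seed=0):
    """Evaluate up to `budget` distinct random configurations; return the best."""
    rng = random.Random(seed)
    names = list(space)
    all_configs = [
        dict(zip(names, values))
        for values in itertools.product(*(space[n] for n in names))
    ]
    sample = rng.sample(all_configs, min(budget, len(all_configs)))
    best = min(sample, key=objective)
    return best, objective(best)


if __name__ == "__main__":
    # Vary the tuning budget, as in the question the abstract poses.
    for budget in (5, 20, 120):
        config, runtime = random_search(SEARCH_SPACE, fake_runtime_ms, budget)
        print(budget, runtime, config)
```

Here the budget is counted in kernel evaluations rather than wall-clock time, which is a simplification. Smarter optimizers aim to reach a good configuration with fewer evaluations than this baseline needs.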
