Paper title
Benchmarking a Proof-of-Concept Performance Portable SYCL-based Fast Fourier Transformation Library
Paper authors
Paper abstract
In this paper, we present an early version of a SYCL-based FFT library capable of running on all major vendor hardware, including CPUs and GPUs from AMD, ARM, Intel and NVIDIA. Although preliminary, this work aims to seed further development of a rich set of features for calculating FFTs. Its advantage over existing portable FFT libraries is that it is single-source, and therefore removes the complexities that arise from the abundant use of preprocessor macros and auto-generated kernels to target different architectures. We exercise two SYCL-enabled compilers, Codeplay ComputeCpp and Intel's open-source LLVM project, to evaluate the performance portability of our SYCL-based FFT on various heterogeneous architectures. The current limitation of our library is that it supports only one-dimensional FFTs of base-2 input sequences up to $2^{11}$ in length. We compare our results with highly optimized vendor-specific FFT libraries and provide a detailed analysis to demonstrate a fair level of performance, as well as potential sources of performance bottlenecks.