Paper Title

CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework

Paper Authors

Shikhar Tuli, Chia-Hao Li, Ritvik Sharma, Niraj K. Jha

Paper Abstract

Recently, automated co-design of machine learning (ML) models and accelerator architectures has attracted significant attention from both industry and academia. However, most co-design frameworks either explore a limited search space or employ suboptimal exploration techniques when making simultaneous design decisions for the ML model and the accelerator. Furthermore, training the ML model and simulating the accelerator's performance are computationally expensive. To address these limitations, this work proposes a novel neural architecture and hardware accelerator co-design framework, called CODEBench. It is composed of two new benchmarking sub-frameworks, CNNBench and AccelBench, which explore expanded design spaces of convolutional neural networks (CNNs) and CNN accelerators. CNNBench leverages an advanced search technique, BOSHNAS, which efficiently trains a neural heteroscedastic surrogate model and employs second-order gradients to converge to an optimal CNN architecture. AccelBench performs cycle-accurate simulations for a diverse set of accelerator architectures in a vast design space. With the proposed co-design method, called BOSHCODE, our best CNN-accelerator pair achieves 1.4% higher accuracy on the CIFAR-10 dataset than the state-of-the-art pair, while enabling 59.1% lower latency and 60.8% lower energy consumption. On the ImageNet dataset, it achieves 3.7% higher Top-1 accuracy at 43.8% lower latency and 11.2% lower energy consumption. CODEBench outperforms the state-of-the-art framework, Auto-NBA, by achieving 1.5% higher accuracy and 34.7x higher throughput, while enabling 11.0x lower energy-delay product (EDP) and 4.0x lower chip area on CIFAR-10.
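To make the abstract's search technique concrete, below is a minimal PyTorch sketch of the two ingredients it attributes to BOSHNAS: a neural surrogate with a heteroscedastic (input-dependent) uncertainty head, and a second-order (damped Newton) step over a continuous architecture embedding. All names and values here (HeteroscedasticSurrogate, the 16-dimensional embedding, the damping constant, the random placeholder data) are illustrative assumptions, not CODEBench's actual implementation.

```python
import torch
import torch.nn as nn
from torch.autograd.functional import hessian


class HeteroscedasticSurrogate(nn.Module):
    """Maps a CNN-architecture embedding to a predicted accuracy (mean)
    and an input-dependent log-variance, i.e., heteroscedastic noise."""

    def __init__(self, embed_dim: int, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, 1)     # predicted accuracy
        self.logvar_head = nn.Linear(hidden, 1)   # per-architecture uncertainty

    def forward(self, x):
        h = self.backbone(x)
        return self.mean_head(h), self.logvar_head(h)


def gaussian_nll(mean, logvar, target):
    # Heteroscedastic Gaussian negative log-likelihood: architectures the
    # model is uncertain about are penalized less for prediction error.
    return (0.5 * (logvar + (target - mean) ** 2 / logvar.exp())).mean()


def newton_step(model, x, damping=1e-2):
    """One damped Newton step (second-order gradients) on the surrogate's
    predicted accuracy, taken over a continuous architecture embedding x."""
    x = x.detach().requires_grad_(True)
    f = lambda z: model(z.unsqueeze(0))[0].squeeze()  # scalar predicted accuracy
    grad = torch.autograd.grad(f(x), x)[0]
    H = hessian(f, x.detach())                        # full d x d Hessian
    step = torch.linalg.solve(H + damping * torch.eye(x.numel()), grad)
    return (x - step).detach()      # Newton update toward a stationary point


# Usage sketch: fit the surrogate on (embedding, accuracy) pairs, then
# refine a candidate embedding with second-order steps.
surrogate = HeteroscedasticSurrogate(embed_dim=16)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
embeddings, accs = torch.randn(128, 16), torch.rand(128, 1)  # placeholder data
for _ in range(200):
    mean, logvar = surrogate(embeddings)
    loss = gaussian_nll(mean, logvar, accs)
    opt.zero_grad(); loss.backward(); opt.step()
candidate = newton_step(surrogate, torch.randn(16))
```

In the actual framework, the surrogate would be fit on measured (architecture, accuracy) pairs produced by CNNBench runs rather than random placeholder data, and the refined embedding would be decoded back into a discrete CNN architecture; BOSHCODE extends the same idea to joint CNN-accelerator pairs scored by AccelBench simulations.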
