Paper title
DiffTune: Optimizing CPU Simulator Parameters with Learned Differentiable Surrogates
Paper authors
Paper abstract
CPU simulators are useful tools for modeling CPU execution behavior. However, they suffer from inaccuracies due to the cost and complexity of setting their fine-grained parameters, such as the latencies of individual instructions. This complexity arises from the expertise required to design benchmarks and measurement frameworks that can precisely measure parameter values at such fine granularity. In some cases, these parameters do not necessarily have a physical realization and are therefore fundamentally approximate, or even unmeasurable.

In this paper we present DiffTune, a system for learning the parameters of x86 basic block CPU simulators from coarse-grained end-to-end measurements. Given a simulator, DiffTune first replaces it with a differentiable surrogate, another function that approximates the original one. Because the surrogate is differentiable, DiffTune can apply gradient-based optimization techniques even when the original function is non-differentiable, as is the case with CPU simulators. DiffTune then optimizes through this surrogate to find values of the simulator's parameters that minimize the simulator's error on a dataset of ground-truth end-to-end performance measurements. Finally, the learned parameters are plugged back into the original simulator.

DiffTune is able to automatically learn the entire set of microarchitecture-specific parameters within the Intel x86 simulation model of llvm-mca, a basic block CPU simulator based on LLVM's instruction scheduling model. With DiffTune's learned parameters, llvm-mca achieves an average error that not only matches but is lower than that of its original, expert-provided parameter values.
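The surrogate-then-optimize loop described in the abstract can be illustrated on a toy problem. The following is a minimal sketch under loudly stated assumptions, not DiffTune's implementation. The "simulator" `simulate` is hypothetical: it sums rounded per-opcode latencies, so its gradient is zero almost everywhere. The surrogate is a two-weight linear model fitted by ordinary least squares. It stands in for the richer learned model a real system would use. The "ground truth" is synthesized from hidden latencies `[1, 3, 5]`, and the tuning loss is plain squared error. All names (`simulate`, `fit_surrogate`, `tune`, `run_demo`) are illustrative.

```python
import random

NUM_OPCODES = 3  # hypothetical toy ISA with three instruction kinds


def simulate(latencies, block):
    """Toy non-differentiable simulator (hypothetical).

    block[k] counts instructions of kind k. Rounding makes the output
    piecewise constant in `latencies`, so its gradient is zero almost
    everywhere and cannot drive gradient descent directly.
    """
    return sum(n * max(1, round(lat)) for n, lat in zip(block, latencies))


def random_block(rng):
    return [rng.randint(1, 4) for _ in range(NUM_OPCODES)]


def fit_surrogate(rng, samples=2000):
    """Step 1: fit a differentiable surrogate to samples of `simulate`.

    Assumed surrogate form: f = sum_k block[k] * (a * theta[k] + b), fitted
    by least squares on features u = sum_k block[k]*theta[k], v = sum_k block[k].
    """
    suu = suv = svv = suy = svy = 0.0
    for _ in range(samples):
        theta = [rng.uniform(1.0, 6.0) for _ in range(NUM_OPCODES)]
        block = random_block(rng)
        y = simulate(theta, block)
        u = sum(n * t for n, t in zip(block, theta))
        v = sum(block)
        suu += u * u
        suv += u * v
        svv += v * v
        suy += u * y
        svy += v * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = suu * svv - suv * suv
    a = (suy * svv - suv * svy) / det
    b = (suu * svy - suv * suy) / det
    return a, b


def surrogate(params, theta, block):
    a, b = params
    return sum(n * (a * t + b) for n, t in zip(block, theta))


def surrogate_grad_theta(params, block):
    """Exact gradient of the surrogate: d f / d theta[k] = a * block[k]."""
    a, _ = params
    return [a * n for n in block]


def tune(params, dataset, theta0, lr=0.02, steps=500):
    """Step 2: gradient descent on theta through the frozen surrogate (MSE)."""
    theta = list(theta0)
    for _ in range(steps):
        grad = [0.0] * NUM_OPCODES
        for block, measured in dataset:
            err = surrogate(params, theta, block) - measured
            for k, g in enumerate(surrogate_grad_theta(params, block)):
                grad[k] += 2.0 * err * g / len(dataset)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def mape(theta, dataset):
    """Mean absolute percentage error of the ORIGINAL simulator."""
    return sum(abs(simulate(theta, b) - y) / y for b, y in dataset) / len(dataset)


def run_demo(seed=0):
    rng = random.Random(seed)
    hidden = [1, 3, 5]  # hypothetical "hardware" latencies, used only for data
    blocks = [random_block(rng) for _ in range(200)]
    dataset = [(blk, simulate(hidden, blk)) for blk in blocks]
    params = fit_surrogate(rng)               # step 1: learn the surrogate
    theta0 = [2.0] * NUM_OPCODES              # naive initial guess
    learned = tune(params, dataset, theta0)   # step 2: optimize parameters
    # Step 3: plug parameters back into `simulate` and measure its error.
    return mape(theta0, dataset), mape(learned, dataset), learned


if __name__ == "__main__":
    before, after, theta = run_demo()
    print(f"MAPE before: {before:.3f}, after: {after:.3f}, theta: {theta}")
```

The sketch fits the surrogate in closed form to stay dependency-free. With a neural surrogate, `fit_surrogate` would become a training loop and `surrogate_grad_theta` would come from automatic differentiation. In this toy setting the tuned parameters should round back to the hidden latencies, so the original simulator's error drops to zero.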