Paper Title
AgileWatts: An Energy-Efficient CPU Core Idle-State Architecture for Latency-Sensitive Server Applications
Paper Authors
Paper Abstract
User-facing applications running in modern datacenters exhibit irregular request patterns and are implemented using a multitude of services with tight latency requirements. These characteristics render existing energy-conserving techniques ineffective when processors are idle, because of the long transition time out of a deep idle power state (C-state). While prior works propose management techniques to mitigate this inefficiency, we tackle it at its root with AgileWatts (AW): a new deep C-state architecture optimized for datacenter server processors targeting latency-sensitive applications. AW is based on three key ideas. First, AW eliminates the latency overhead of saving/restoring the core context (i.e., micro-architectural state) when powering the core off/on in a deep idle power state by i) implementing medium-grained power gates, carefully distributed across the CPU core, and ii) retaining context in the power-ungated domain. Second, AW eliminates the flush latency overhead (several tens of microseconds) of the L1/L2 caches when entering a deep idle power state by keeping the L1/L2 cache contents power-ungated. Minimal control logic also remains power-ungated to serve cache coherence traffic (i.e., snoops) seamlessly. AW implements a sleep mode in the caches to reduce their leakage power consumption and lowers the core voltage to the minimum operational voltage level to minimize the leakage power of the power-ungated domain. Third, using a state-of-the-art, power-efficient all-digital phase-locked loop (ADPLL) clock generator, AW keeps the PLL active and locked during the idle state, cutting further precious microseconds of wake-up latency at a negligible power cost. Our evaluation with an accurate simulator calibrated against an Intel Skylake server shows that AW reduces the energy consumption of Memcached by up to 71% (35% on average) with at most 1% performance degradation.
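The abstract attributes deep C-state transition cost to distinct terms: L1/L2 flush and context save on entry, context restore and PLL relock on exit. The toy Python model below is a minimal sketch, under stated assumptions, of which terms AW's three ideas remove and why that matters under a tight wake-up latency budget. All parameter values are hypothetical placeholders, not measurements from the paper (only the 30 us flush cost is chosen to sit in the "several tens of microseconds" range the abstract mentions). `CStateModel`, `pick_state`, and `idle_energy_uj` are illustrative names, and the governor is a simplified stand-in, not AW's policy or any real OS idle governor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CStateModel:
    """Toy model of one idle state. Times in microseconds, power in watts.
    All parameter values used with this class are hypothetical placeholders."""
    name: str
    cache_flush_us: float      # entry: flush L1/L2 (AW keeps caches powered, so 0)
    context_save_us: float     # entry: save micro-architectural state (AW retains it, so 0)
    context_restore_us: float  # exit: restore micro-architectural state (AW: 0)
    pll_relock_us: float       # exit: relock the PLL (AW keeps the ADPLL locked, so 0)
    power_ungate_us: float     # exit: voltage ramp / power ungating (still paid)
    idle_power_w: float        # residual power while resident in the state

    def entry_latency_us(self) -> float:
        return self.cache_flush_us + self.context_save_us

    def exit_latency_us(self) -> float:
        return self.context_restore_us + self.pll_relock_us + self.power_ungate_us

    def transition_latency_us(self) -> float:
        return self.entry_latency_us() + self.exit_latency_us()


# Hypothetical parameter sets (illustrative only, not results from the paper).
SHALLOW = CStateModel("shallow (C1-like)", 0.0, 0.0, 0.0, 0.0, 0.0, idle_power_w=1.5)
DEEP_BASELINE = CStateModel("deep (C6-like)", 30.0, 5.0, 5.0, 5.0, 10.0, idle_power_w=0.1)
AW_LIKE = CStateModel("AW-like", 0.0, 0.0, 0.0, 0.0, 2.0, idle_power_w=0.3)
STATES = [SHALLOW, DEEP_BASELINE, AW_LIKE]


def pick_state(states, predicted_idle_us, exit_budget_us):
    """Simplified governor: lowest-power state whose exit latency fits the
    wake-up budget and whose full entry+exit transition fits the idle period.
    Falls back to the first (shallowest) state if nothing qualifies."""
    eligible = [
        s for s in states
        if s.exit_latency_us() <= exit_budget_us
        and s.transition_latency_us() <= predicted_idle_us
    ]
    return min(eligible, key=lambda s: s.idle_power_w, default=states[0])


def idle_energy_uj(state, idle_us, transition_power_w):
    """Energy over one idle period (W * us = uJ). Transition time is charged at
    transition_power_w; the rest of the period at the state's idle power."""
    t = min(state.transition_latency_us(), idle_us)
    return transition_power_w * t + state.idle_power_w * (idle_us - t)


if __name__ == "__main__":
    # A short idle gap (50 us) under a tight 10 us wake-up budget.
    chosen = pick_state(STATES, predicted_idle_us=50.0, exit_budget_us=10.0)
    print("chosen:", chosen.name)
    for s in (SHALLOW, AW_LIKE):
        print(s.name, round(idle_energy_uj(s, 50.0, transition_power_w=2.0), 2))
```

With these placeholder values the governor selects the AW-like state for a 50 us gap: the baseline deep state's 20 us exit latency exceeds the 10 us budget (and its 55 us round trip exceeds the gap), so without AW the core would be stuck in the higher-power shallow state. The model ignores snoop servicing and the voltage-dependent leakage that AW's sleep-mode caches and minimum-voltage retention target; both are folded into the single `idle_power_w` figure.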