LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications

Paper Title

LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications

Authors

Xin, Jinhan; Hwang, Kai; Yu, Zhibin

Abstract

Spark SQL has been widely deployed in industry, but it is challenging to tune its performance. Recent studies try to employ machine learning (ML) to solve this problem, but suffer from two drawbacks. First, it takes a long time (high overhead) to collect training samples. Second, the optimal configuration for one input data size of the same application might not be optimal for others. To address these issues, we propose a novel Bayesian Optimization (BO) based approach named LOCAT to automatically tune the configurations of Spark SQL applications online. LOCAT introduces three novel techniques. The first technique, named QCSA, eliminates the configuration-insensitive queries by Query Configuration Sensitivity Analysis (QCSA) when collecting training samples. The second technique, dubbed DAGP, is a Datasize-Aware Gaussian Process (DAGP) which models the performance of an application as a distribution of functions of configuration parameters as well as input data size. The third technique, called IICP, Identifies Important Configuration Parameters (IICP) with respect to performance and only tunes the important ones. As such, LOCAT can tune the configurations of a Spark SQL application with low overhead and adapt to different input data sizes. We employ Spark SQL applications from benchmark suites TPC-DS, TPC-H, and HiBench running on two significantly different clusters, a four-node ARM cluster and an eight-node x86 cluster, to evaluate LOCAT. The experimental results on the ARM cluster show that LOCAT accelerates the optimization procedures of the state-of-the-art approaches by at least 4.1x and up to 9.7x; moreover, LOCAT improves the application performance by at least 1.9x and up to 2.4x. On the x86 cluster, LOCAT shows similar results to those on the ARM cluster.
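The abstract describes QCSA and DAGP only at a high level, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It filters configuration-insensitive queries by the coefficient of variation (CV) of their runtimes across sampled configurations (an assumed sensitivity criterion with a hypothetical threshold), then fits a Gaussian process whose input vector is the configuration concatenated with the input data size, in the spirit of a datasize-aware GP, and ranks candidate configurations with expected improvement (EI) as one BO step. The RBF kernel, the EI acquisition, and the names `select_sensitive_queries`, `DatasizeAwareGP`, and `suggest_next` are illustrative choices; IICP is not shown.

```python
import math


def coefficient_of_variation(times):
    """Population std / mean of one query's runtimes across sampled configurations."""
    mean = sum(times) / len(times)
    var = sum((t - mean) ** 2 for t in times) / len(times)
    return math.sqrt(var) / mean


def select_sensitive_queries(query_times, threshold=0.1):
    """Assumed QCSA-style filter: keep queries whose runtime CV exceeds a hypothetical threshold."""
    return [q for q, times in query_times.items()
            if coefficient_of_variation(times) > threshold]


def rbf(a, b, length_scale):
    """Squared-exponential kernel on concatenated (config..., data size) vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(m):
    """Lower-triangular L with L @ L^T == m (m symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class DatasizeAwareGP:
    """Hypothetical GP whose input is a configuration vector plus the input data size."""

    def __init__(self, length_scale=0.5, noise=1e-4):
        self.length_scale = length_scale
        self.noise = noise  # small diagonal jitter for numerical stability

    def fit(self, configs, datasizes, runtimes):
        # Append the data size to each configuration so it becomes a model input.
        self.X = [list(c) + [d] for c, d in zip(configs, datasizes)]
        self.y_mean = sum(runtimes) / len(runtimes)
        y = [r - self.y_mean for r in runtimes]
        K = [[rbf(a, b, self.length_scale) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        self.L = cholesky(K)
        self.alpha = backward_sub(self.L, forward_sub(self.L, y))  # K^{-1} y
        return self

    def predict(self, config, datasize):
        """Posterior mean and standard deviation of runtime at (config, datasize)."""
        x = list(config) + [datasize]
        k = [rbf(x, xi, self.length_scale) for xi in self.X]
        mean = self.y_mean + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = forward_sub(self.L, k)
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)  # rbf(x, x) == 1
        return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """EI for minimising runtime: expected amount by which a run beats `best`."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def suggest_next(gp, candidates, datasize, best):
    """Pick the candidate configuration with the highest EI for this data size."""
    return max(candidates,
               key=lambda c: expected_improvement(*gp.predict(c, datasize), best))
```

For example, after fitting on normalised configurations `[[0.0], [0.5], [1.0]]` observed at data size `0.2`, `gp.predict([0.5], 0.2)` stays close to the observed runtime, whereas `gp.predict([0.5], 3.0)` reverts toward the prior mean with larger uncertainty, which is the behaviour a datasize-aware model needs when the input size changes. The single shared length scale assumes every parameter and the data size are normalised to a comparable range such as [0, 1].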