Paper Title

Extracting more from boosted decision trees: A high energy physics case study

Author

Lalchand, Vidhi

Abstract

Particle identification is one of the core tasks in the data analysis pipeline at the Large Hadron Collider (LHC). Statistically, this entails identifying rare signal events buried in immense backgrounds that mimic the properties of the signal. In machine learning parlance, particle identification is a classification problem characterized by overlapping and imbalanced classes. Boosted decision trees (BDTs) have had tremendous success in the particle identification domain but have more recently been overshadowed by approaches based on deep neural networks (DNNs). This work proposes an algorithm to extract more out of standard boosted decision trees by targeting their main weakness, susceptibility to overfitting. This novel construction harnesses the meta-learning techniques of boosting and bagging simultaneously and performs remarkably well on the ATLAS Higgs (H) to tau-tau data set (ATLAS et al., 2014), which was the subject of the 2014 Higgs ML Challenge (Adam-Bourdarios et al., 2015). While the decay of the Higgs to a pair of tau leptons was established in 2018 (CMS collaboration et al., 2017) at a significance of $4.9\sigma$ based on the 2016 data-taking period, the 2014 public data set continues to serve as a benchmark for testing the performance of supervised classification schemes. We show that the score achieved by the proposed algorithm is very close to the published winning score, which leverages an ensemble of DNNs. Although this paper focuses on a single application, it is expected that this simple and robust technique will find wider applications in high energy physics.
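The abstract does not spell out how boosting and bagging are combined, so the following is only a minimal sketch of one generic way to do it, not the author's algorithm. It assumes AdaBoost over depth-1 trees (stumps) as the boosted learner, bootstrap resampling as the bagging step, and a balanced toy data set of two overlapping Gaussian blobs in place of the ATLAS data. All function names and parameter values are hypothetical.

```python
import math
import random


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best depth-1 tree."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # Weighted error of the rule "predict +1 if x[j] > t, else -1".
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if (1 if row[j] > t else -1) != yi)
            # Weights sum to 1, so the flipped rule has error 1 - err.
            for sign, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, j, t, sign)
    return best


def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] > t else -sign


def fit_boosted(X, y, n_rounds):
    """AdaBoost over stumps; labels in {-1, +1}. Returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified events, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def boosted_score(model, row):
    return sum(alpha * stump_predict(stump, row) for alpha, stump in model)


def fit_bagged_boosted(X, y, n_bags, n_rounds, rng):
    """Train one boosted model per bootstrap resample of the training set."""
    n = len(X)
    bags = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        bags.append(fit_boosted([X[i] for i in idx], [y[i] for i in idx], n_rounds))
    return bags


def predict(bags, row):
    """Average the boosted scores across bags and threshold at zero."""
    mean_score = sum(boosted_score(m, row) for m in bags) / len(bags)
    return 1 if mean_score > 0 else -1


def make_toy_data(n, rng):
    """Two overlapping 2-D Gaussian blobs: signal (+1) and background (-1)."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else -1
        centre = 2.0 if label == 1 else 0.0
        X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_toy_data(150, rng)
X_test, y_test = make_toy_data(150, rng)
bags = fit_bagged_boosted(X_train, y_train, n_bags=5, n_rounds=10, rng=rng)
accuracy = sum(predict(bags, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Averaging over bootstrap resamples is the usual variance-reduction argument for bagging, which is why it is a natural candidate for countering the overfitting weakness the abstract mentions. In practice one would use a tuned library implementation of boosted trees rather than this toy stump learner.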
