Paper title
A machine-learning photometric classifier for massive stars in nearby galaxies I. The method
Paper authors
Paper abstract
(abridged) Mass loss is a key parameter in the evolution of massive stars, yet theory and observations remain discrepant and the importance of episodic mass loss is unknown. Addressing this requires larger numbers of classified sources spanning a range of metallicity environments. We aim to remedy the situation by applying machine-learning techniques to recently available extensive photometric catalogs. We used IR/Spitzer and optical/Pan-STARRS photometry, together with Gaia astrometric information, to compile a large catalog of known massive stars in M31 and M33, which were grouped into Blue, Red, Yellow, and B[e] supergiants, Luminous Blue Variables, Wolf-Rayet stars, and background galaxies. Because the classes are highly imbalanced, we implemented synthetic data generation to populate the underrepresented classes, and undersampled the majority class to improve separation. We built an ensemble classifier using color indices as features. The probabilities from Support Vector Classification, Random Forest, and Multi-layer Perceptron models were combined for the final classification. The overall weighted balanced accuracy is ~83%, recovering Red supergiants at ~94%, Blue/Yellow/B[e] supergiants and background galaxies at ~50-80%, Wolf-Rayet stars at ~45%, and Luminous Blue Variables at ~30%, mainly due to their small sample sizes. The mixing of spectral types (there are no strict boundaries in their color indices) complicates the classification. Independent application to the IC 1613, WLM, and Sextans A galaxies resulted in a lower overall accuracy of ~70%, attributed to metallicity and extinction effects. Missing-data imputation was explored using simple replacement with mean values and an iterative imputer, the latter proving more capable. We also found that r-i and y-[3.6] were the most important features. Our method, although limited by the sampling of the feature space, is efficient in classifying sources with missing data and at lower metallicities.
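The two quantitative ideas in the abstract, combining per-model class probabilities and scoring with a balanced accuracy, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The class abbreviations, the probability values, and the function names `combine_probabilities`, `predict_label`, and `balanced_accuracy` are all hypothetical. The abstract does not say how the probabilities are weighted or how the balanced accuracy is weighted, so the sketch assumes equal weights by default and an unweighted mean of per-class recalls.

```python
from collections import defaultdict

# Hypothetical class labels, for illustration only.
CLASSES = ["RSG", "BSG", "LBV"]


def combine_probabilities(per_model_probs, weights=None):
    """Weighted average of class-probability vectors (soft voting).

    per_model_probs: one probability vector per model, all in the same
    class order. Equal weights are assumed when none are given.
    """
    if weights is None:
        weights = [1.0] * len(per_model_probs)
    total = sum(weights)
    n_classes = len(per_model_probs[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, per_model_probs)) / total
        for k in range(n_classes)
    ]


def predict_label(per_model_probs, classes=CLASSES, weights=None):
    """Return the class with the highest combined probability, and that value."""
    combined = combine_probabilities(per_model_probs, weights)
    best = max(range(len(classes)), key=lambda k: combined[k])
    return classes[best], combined[best]


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recalls.

    Each class contributes equally, so a rare class such as LBV counts
    as much as a well-populated one such as RSG.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Made-up probabilities for one source from three models (SVC, RF, MLP).
svc_probs = [0.6, 0.3, 0.1]
rf_probs = [0.2, 0.5, 0.3]
mlp_probs = [0.7, 0.2, 0.1]
label, prob = predict_label([svc_probs, rf_probs, mlp_probs])

# Made-up labels: four RSGs all recovered, one of two LBVs recovered.
y_true = ["RSG", "RSG", "RSG", "RSG", "LBV", "LBV"]
y_pred = ["RSG", "RSG", "RSG", "RSG", "RSG", "LBV"]
score = balanced_accuracy(y_true, y_pred)
```

In the toy labels, five of six sources are classified correctly, but the balanced accuracy is lower than that plain fraction because the rare class is recovered only half of the time. The same effect explains how a high overall score can coexist with low recovery rates for small classes.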