DPDR: A novel machine learning method for the Decision Process for Dimensionality Reduction

Title

DPDR: A novel machine learning method for the Decision Process for Dimensionality Reduction

Authors

Dessureault, Jean-Sébastien; Massicotte, Daniel

Abstract

This paper discusses the critical decision of whether to extract or select features in a supervised learning context. Finding a suitable method to reduce dimensionality is often confusing. Choosing between feature selection and feature extraction involves trade-offs that depend on the nature of the data and the user's preferences. Indeed, the user may want the results to favor integrity or interpretability, as well as a specific data resolution. This paper proposes a new method for choosing the best dimensionality reduction method in a supervised learning context. It also helps to drop or reconstruct features until a target resolution is reached; this target resolution can be user-defined or defined automatically by the method. The method applies a regression or a classification, evaluates the results, and diagnoses the best dimensionality reduction process for that specific supervised learning context. The main algorithms used are Random Forest (RF), Principal Component Analysis (PCA), and the multilayer perceptron (MLP) neural network. Six use cases are presented, each based on a well-known technique for generating synthetic data. This research discusses each choice that can be made in the process, aiming to clarify the issues involved in the entire decision process of selecting or extracting features.
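
This page does not include the authors' implementation. As a rough illustration of the comparison the abstract describes, here is a minimal sketch assuming scikit-learn: it scores Random Forest feature selection against PCA feature extraction at a few target dimensions, using an MLP classifier on synthetic data from `make_classification`. The helper names (`make_reducer`, `compare`, `diagnose`), the dimensions tried, the model settings, and the `target_score` rule are hypothetical choices for illustration, not DPDR's actual decision procedure.

```python
# Minimal sketch (assumes scikit-learn); NOT the authors' DPDR code.
# Compares RF-based feature selection with PCA-based feature extraction
# at several target dimensions, scoring each option with an MLP.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_reducer(kind, k, seed=0):
    """Return a pipeline step that keeps k dimensions (hypothetical helper)."""
    if kind == "selection":
        # Keep the k original features a Random Forest ranks most important;
        # threshold=-inf makes max_features the only selection criterion.
        rf = RandomForestClassifier(n_estimators=50, random_state=seed)
        return SelectFromModel(rf, max_features=k, threshold=-np.inf)
    # Otherwise build k new features as principal components.
    return PCA(n_components=k, random_state=seed)


def compare(X, y, dims, seed=0):
    """Mean 3-fold MLP accuracy for each (kind, k) combination."""
    results = {}
    for k in dims:
        for kind in ("selection", "extraction"):
            model = make_pipeline(
                StandardScaler(),
                make_reducer(kind, k, seed),
                MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                              random_state=seed),
            )
            results[(kind, k)] = cross_val_score(model, X, y, cv=3).mean()
    return results


def diagnose(results, target_score):
    """Hypothetical rule: fewest dimensions reaching target_score, else best score."""
    passing = [key for key, score in results.items() if score >= target_score]
    if not passing:
        return max(results, key=results.get)
    # Among passing options, prefer smaller k, then higher accuracy.
    return min(passing, key=lambda key: (key[1], -results[key]))


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=10,
                               n_informative=4, random_state=0)
    results = compare(X, y, dims=(2, 4, 6))
    for (kind, k), score in sorted(results.items()):
        print(f"{kind:<10} k={k}: accuracy={score:.3f}")
    print("suggested:", diagnose(results, target_score=0.85))
```

Wrapping the reducer and the MLP in one pipeline means the Random Forest ranking and the PCA projection are refit inside each cross-validation fold, so the held-out fold never influences which dimensions are kept.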
