Paper title
Progressive Feature Upgrade in Semi-supervised Learning on Tabular Domain
Paper authors
Abstract
Recent semi-supervised and self-supervised methods have shown great success in the image and text domains by utilizing augmentation techniques. Transferring this success to the tabular domain is not easy, however: because tabular data mixes different data types (continuous and categorical), domain-specific transformations from image and language do not adapt readily. A few semi-supervised works on the tabular domain have focused on proposing new augmentation techniques for tabular data. These approaches may show some improvement on datasets whose categorical features have low cardinality, but the fundamental challenges remain untackled: existing methods either do not apply to datasets with high cardinality or do not use an efficient encoding of categorical data. We propose a conditional probability representation and an efficient progressive feature upgrading framework to effectively learn representations for tabular data in semi-supervised applications. Extensive experiments show the superior performance of the proposed framework and its potential application in semi-supervised settings.
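The abstract does not say how the conditional probability representation is computed. As a rough illustration only, the sketch below shows one common way to encode a categorical feature by class-conditional probabilities: each category value is mapped to a smoothed estimate of P(label | category) taken from the labeled rows. The function name `conditional_probability_encoding`, the smoothing parameter `alpha`, and the toy data are all assumptions made for this example; this is not the paper's actual procedure.

```python
from collections import Counter, defaultdict


def conditional_probability_encoding(categories, labels, classes, alpha=1.0):
    """Map each category value to a smoothed estimate of P(label | category).

    Hypothetical helper for illustration; `alpha` is an assumed
    Laplace-smoothing constant, not a parameter taken from the paper.
    """
    # Count how often each label co-occurs with each category value.
    counts = defaultdict(Counter)
    for cat, lab in zip(categories, labels):
        counts[cat][lab] += 1

    encoding = {}
    for cat, label_counts in counts.items():
        total = sum(label_counts.values()) + alpha * len(classes)
        # One probability per class, so the vector length does not
        # depend on how many distinct category values exist.
        encoding[cat] = [(label_counts[c] + alpha) / total for c in classes]
    return encoding


# Toy labeled data (invented for this example).
cities = ["NYC", "NYC", "LA", "LA", "LA", "SF"]
labels = [1, 0, 1, 1, 1, 0]

table = conditional_probability_encoding(cities, labels, classes=[0, 1])
encoded_column = [table[c] for c in cities]
```

Under this assumed encoding, every category value becomes a vector with one entry per class, whereas a one-hot encoding grows with the number of distinct values. That is one reason such a representation may suit high-cardinality features.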