Paper Title

Feature Engineering vs BERT on Twitter Data

Authors

Ryiaadh Gani, Lisa Chalaguine

Abstract

In this paper, we compare the performance of traditional machine learning models that use feature engineering and word vectors against the state-of-the-art language model BERT, which uses word embeddings, on three datasets. We also consider the time and cost efficiency of feature engineering compared to BERT. From our results we conclude that the use of the BERT model was only worth the time and cost trade-off for one of the three datasets we used for comparison, where the BERT model significantly outperformed every kind of traditional classifier that uses feature vectors instead of embeddings. For the other datasets, the BERT model achieved an increase of only 0.03 in accuracy and 0.05 in F1 score, which, it could be argued, makes its use not worth the GPU time and cost.
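To make the contrast concrete: feature engineering means representing each tweet as a fixed vector of handcrafted statistics rather than learned embeddings. The sketch below shows what such a feature vector might look like; the specific feature choices here are illustrative assumptions, not the exact feature set used in the paper.

```python
import re

def tweet_features(text: str) -> list[float]:
    """Hand-engineered feature vector for a tweet.

    The features below are illustrative choices (token count, hashtag/
    mention/URL counts, uppercase ratio, exclamation marks), not the
    paper's exact feature set.
    """
    tokens = text.split()
    return [
        len(tokens),                                   # token count
        sum(t.startswith("#") for t in tokens),        # hashtag count
        sum(t.startswith("@") for t in tokens),        # mention count
        len(re.findall(r"https?://\S+", text)),        # URL count
        sum(c.isupper() for c in text) / max(len(text), 1),  # uppercase ratio
        text.count("!"),                               # exclamation marks
    ]

vec = tweet_features("Check this out! #ML #NLP @friend https://example.com")
```

Vectors like this can be fed to any traditional classifier (e.g. logistic regression or an SVM) on a CPU, which is the time-and-cost advantage weighed against BERT's GPU fine-tuning in the abstract above.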
