Domain Adaptive Pretraining for Multilingual Acronym Extraction

Paper Title

Domain Adaptive Pretraining for Multilingual Acronym Extraction

Authors

Usama Yaseen, Stefan Langer

Abstract

This paper presents our findings from participating in the multilingual acronym extraction shared task SDU@AAAI-22. The task consists of acronym extraction from documents in 6 languages within the scientific and legal domains. To address multilingual acronym extraction, we employed BiLSTM-CRF with multilingual XLM-RoBERTa embeddings. We pretrained the XLM-RoBERTa model on the shared task corpus to further adapt XLM-RoBERTa embeddings to the shared task domain(s). Our system (team: SMR-NLP) achieved competitive performance for acronym extraction across all languages.
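
To illustrate the CRF half of a BiLSTM-CRF tagger, the sketch below shows Viterbi decoding, the standard way a linear-chain CRF selects its highest-scoring tag sequence. It is a minimal illustration and not the authors' implementation. The BIO tag set, the token scores, and the transition scores are all hypothetical; the abstract does not state the tagging scheme, and in a real system the per-token scores would come from the BiLSTM running over XLM-RoBERTa embeddings.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][tag]      : score of `tag` at position t (hypothetical values here)
    transitions[prev][cur] : score of moving from tag `prev` to tag `cur`
    """
    if not emissions:
        return []
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t-1][tag] = best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Assumed BIO scheme: B = acronym start, I = inside acronym, O = outside.
tags = ["O", "B", "I"]
# Hypothetical scores for the tokens ["the", "SDU", "task"].
emissions = [
    {"O": 2.0, "B": 0.1, "I": 0.0},
    {"O": 0.2, "B": 1.0, "I": 1.5},
    {"O": 2.0, "B": 0.0, "I": 0.3},
]
# Hypothetical transitions: an I tag directly after O is heavily penalized.
transitions = {
    "O": {"O": 0.0, "B": 0.0, "I": -100.0},
    "B": {"O": 0.0, "B": 0.0, "I": 0.0},
    "I": {"O": 0.0, "B": 0.0, "I": 0.0},
}
print(viterbi(emissions, transitions, tags))
```

In this toy input, the second token scores highest as I when taken in isolation, but the transition penalty makes the decoder prefer the valid sequence O, B, O. Scoring whole sequences in this way is the usual motivation for placing a CRF layer on top of per-token scores.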
