Paper Title

Leveraging Discourse Rewards for Document-Level Neural Machine Translation

Paper Authors

Inigo Jauregi Unanue, Nazanin Esmaili, Gholamreza Haffari, Massimo Piccardi

Paper Abstract

Document-level machine translation focuses on the translation of entire documents from a source to a target language. It is widely regarded as a challenging task since the translation of the individual sentences in the document needs to retain aspects of the discourse at document level. However, document-level translation models are usually not trained to explicitly ensure discourse quality. Therefore, in this paper we propose a training approach that explicitly optimizes two established discourse metrics, lexical cohesion (LC) and coherence (COH), by using a reinforcement learning objective. Experiments over four different language pairs and three translation domains have shown that our training approach has been able to achieve more cohesive and coherent document translations than other competitive approaches, yet without compromising the faithfulness to the reference translation. In the case of the Zh-En language pair, our method has achieved an improvement of 2.46 percentage points (pp) in LC and 1.17 pp in COH over the runner-up, while at the same time improving the BLEU score by 0.63 pp and F_BERT by 0.47 pp.
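The abstract describes training with a reinforcement-learning objective whose reward combines lexical cohesion (LC) and coherence (COH). Below is a minimal REINFORCE-style sketch in PyTorch illustrating the general idea; the reward weights `alpha`/`beta`, the stub scorers `lc_score`/`coh_score`, and the toy usage are hypothetical placeholders, not the paper's actual implementation.

```python
import torch

def lc_score(tokens):
    # Placeholder lexical-cohesion (LC) reward; a real implementation might
    # count repeated or semantically related words across the document's
    # sentences. Returns a scalar in [0, 1].
    return torch.tensor(0.5)

def coh_score(tokens):
    # Placeholder coherence (COH) reward; a real implementation might measure
    # topical similarity between adjacent sentences. Returns a scalar in [0, 1].
    return torch.tensor(0.5)

def reinforce_discourse_loss(log_probs, sampled_tokens, alpha=0.5, beta=0.5):
    """REINFORCE-style loss: the log-likelihood of a sampled document
    translation is scaled by a weighted discourse reward, so minimizing
    this loss performs gradient ascent on the expected reward.

    log_probs: per-token log-probabilities of the sampled translation,
               shape (seq_len,), with gradients attached.
    """
    with torch.no_grad():  # the reward is a constant w.r.t. model parameters
        reward = alpha * lc_score(sampled_tokens) + beta * coh_score(sampled_tokens)
    # Policy-gradient estimator of -E[R(y) * log p(y | x)]
    return -(reward * log_probs.sum())

# Toy usage: fake per-token log-probs standing in for an NMT model's output.
log_probs = torch.log(torch.rand(10, requires_grad=True))
loss = reinforce_discourse_loss(log_probs, sampled_tokens=["dummy"])
loss.backward()
```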
