Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task

Paper Title

Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task

Authors

Thibault Sellam, Amy Pu, Hyung Won Chung, Sebastian Gehrmann, Qijun Tan, Markus Freitag, Dipanjan Das, Ankur P. Parikh

Abstract

The quality of machine translation systems has dramatically improved over the last decade, and as a result, evaluation has become an increasingly challenging problem. This paper describes our contribution to the WMT 2020 Metrics Shared Task, the main benchmark for automatic evaluation of translation. We make several submissions based on BLEURT, a previously published metric based on transfer learning. We extend the metric beyond English and evaluate it on 14 language pairs for which fine-tuning data is available, as well as 4 "zero-shot" language pairs, for which we have no labelled examples. Additionally, we focus on English to German and demonstrate how to combine BLEURT's predictions with those of YiSi and use alternative reference translations to enhance the performance. Empirical results show that the models achieve competitive results on the WMT Metrics 2019 Shared Task, indicating their promise for the 2020 edition.

Paper Title