Paper title
Utilizing Language Relatedness to improve Machine Translation: A Case Study on Languages of the Indian Subcontinent
Paper authors
Abstract
In this work, we present an extensive study of statistical machine translation involving languages of the Indian subcontinent. These languages are related by genetic and contact relationships. We describe the similarities between Indic languages arising from these relationships. We explore how lexical and orthographic similarity among these languages can be utilized to improve translation quality between Indic languages when only limited parallel corpora are available. We also explore how the structural correspondence between Indic languages can be utilized to re-use linguistic resources for English-to-Indic language translation. Our observations span 90 language pairs from 9 Indic languages and English. To the best of our knowledge, this is the first large-scale study specifically devoted to utilizing language relatedness to improve translation between related languages.