Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

Paper title

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

Authors

Żelasko, Piotr; Feng, Siyuan; Velazquez, Laureano Moro; Abavisani, Ali; Bhati, Saurabhchand; Scharenborg, Odette; Hasegawa-Johnson, Mark; Dehak, Najim

Abstract

The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past works explored multilingual training, transfer learning, as well as zero-shot learning in order to build ASR systems for these low-resource languages. While it has been shown that the pooling of resources from multiple languages is helpful, we have not yet seen a successful application of an ASR model to a language unseen during training. A crucial step in the adaptation of ASR from seen to unseen languages is the creation of the phone inventory of the unseen language. The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way without any knowledge about the language. In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way. To that end, we conducted mono-, multi-, and crosslingual experiments on a set of 13 phonetically diverse languages and several in-depth analyses. We found a number of universal phone tokens (IPA symbols) that are well-recognized cross-linguistically. Through a detailed analysis of results, we conclude that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
