Paper Title

Development of Automatic Speech Recognition for Kazakh Language using Transfer Learning

Paper Authors

Amirgaliyev E. N., Kuanyshbay D. N., Baimuratov O.

Paper Abstract

Development of an Automatic Speech Recognition (ASR) system for the Kazakh language is very challenging due to a lack of data. Existing Kazakh speech data with corresponding transcriptions are hard to access and insufficient to obtain noteworthy results. For this reason, speech recognition for the Kazakh language has not been explored well. There are only a few works that investigate this area with traditional methods such as Hidden Markov Models and Gaussian Mixture Models, but they suffer from poor outcomes and a lack of sufficient data. In our work we propose a new method that takes a pre-trained model of the Russian language and applies its knowledge as a starting point for our neural network structure, meaning that we transfer the weights of the pre-trained model to our neural network. The main reason we chose a Russian model is that the pronunciation of the Kazakh and Russian languages is quite similar, since they share 78 percent of their letters, and there are quite large corpora of Russian speech data. We have collected a dataset of Kazakh speech with transcriptions at Suleyman Demirel University, with 50 native speakers each recording around 400 sentences. The data were chosen from famous Kazakh books. We considered 4 different scenarios in our experiment. First, we trained our neural network, with 2 LSTM layers and 2 BiLSTM layers, without using the pre-trained Russian model. Second, we trained the same 2-LSTM and 2-BiLSTM architecture using the pre-trained model. As a result, by using the external Russian speech recognition model, we improved our model's training cost and Label Error Rate by up to 24 percent and 32 percent respectively. The pre-trained Russian model was trained on 100 hours of data with the same neural network architecture.
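The transfer step the abstract describes, copying the weights of a pre-trained Russian model into a Kazakh network with the same recurrent architecture, can be sketched as follows. This is a minimal illustration using NumPy arrays as stand-ins for layer weights; the layer names, shapes, and label-set sizes are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

def init_layers(layer_shapes, seed=None):
    """Randomly initialize a dict of layer-name -> weight matrix."""
    rng = np.random.default_rng(seed)
    return {name: rng.standard_normal(shape) * 0.01
            for name, shape in layer_shapes.items()}

def transfer_weights(target, source):
    """Copy every source layer whose name and shape match into target.

    Layers that do not match (e.g. the output layer, sized to a different
    label alphabet) keep their fresh initialization. Returns the names of
    the layers that were transferred.
    """
    transferred = []
    for name, weights in source.items():
        if name in target and target[name].shape == weights.shape:
            target[name] = weights.copy()
            transferred.append(name)
    return transferred

# Hypothetical shapes for a 2x LSTM + 2x BiLSTM acoustic model.
kazakh_shapes = {
    "lstm1": (128, 512), "lstm2": (512, 512),
    "bilstm1": (512, 1024), "bilstm2": (1024, 1024),
    "output": (1024, 34),   # sized to an assumed Kazakh label set
}
# The Russian model shares the recurrent stack but has a different label set.
russian_shapes = dict(kazakh_shapes, output=(1024, 33))

pretrained = init_layers(russian_shapes, seed=0)  # stands in for the trained Russian model
kazakh = init_layers(kazakh_shapes, seed=1)

moved = transfer_weights(kazakh, pretrained)
# The four recurrent layers transfer; the output layer stays freshly initialized.
```

Training then continues on the Kazakh data from this warm start instead of from random weights, which is what yields the reported reductions in training cost and Label Error Rate.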
