Paper Title

ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks

Paper Authors

Valentin Pelloin, Franck Dary, Nicolas Herve, Benoit Favre, Nathalie Camelin, Antoine Laurent, Laurent Besacier

Paper Abstract

We aim at improving spoken language modeling (LM) using a very large amount of automatically transcribed speech. We leverage the INA (French National Audiovisual Institute) collection and obtain 19GB of text after applying ASR on 350,000 hours of diverse TV shows. From this, spoken language models are trained either by fine-tuning an existing LM (FlauBERT) or by training an LM from scratch. The new models (FlauBERT-Oral) are shared with the community and evaluated on 3 downstream tasks: spoken language understanding, classification of TV shows, and speech syntactic parsing. Results show that FlauBERT-Oral can be beneficial compared to the initial FlauBERT version, demonstrating that, despite its inherently noisy nature, ASR-generated text can be used to build spoken language models.
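
As an illustration only, the sketch below shows how one might continue FlauBERT's masked-LM pre-training on ASR-generated transcripts with the Hugging Face transformers library, which is the fine-tuning route described in the abstract. The base checkpoint flaubert/flaubert_base_cased, the file name asr_transcripts.txt, and all hyperparameters are assumptions for the sketch, not the authors' released setup; the "from scratch" FlauBERT-Oral variant would instead start from a randomly initialized FlauBERT configuration.

```python
# Minimal sketch (assumptions, not the paper's training code): continue
# masked-LM pre-training of FlauBERT on ASR-generated transcripts.
# "asr_transcripts.txt" (one ASR hypothesis per line), the batch size,
# learning rate, and masking rate are illustrative choices.
import torch
from torch.utils.data import DataLoader
from transformers import (FlaubertTokenizer, FlaubertWithLMHeadModel,
                          DataCollatorForLanguageModeling)

tokenizer = FlaubertTokenizer.from_pretrained("flaubert/flaubert_base_cased")
model = FlaubertWithLMHeadModel.from_pretrained("flaubert/flaubert_base_cased")

# Tokenize each transcript line and truncate to a fixed length.
texts = [l.strip() for l in open("asr_transcripts.txt", encoding="utf-8") if l.strip()]
encodings = [tokenizer(t, truncation=True, max_length=256) for t in texts]

# The collator pads batches, applies dynamic 15% token masking, and builds MLM labels.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
loader = DataLoader(encodings, batch_size=16, shuffle=True, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for batch in loader:
    loss = model(**batch).loss   # cross-entropy on the masked positions
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The resulting checkpoint could then be fine-tuned on the downstream tasks listed above (spoken language understanding, TV-show classification, speech syntactic parsing) in the usual way.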
