Paper Title
WER we are and WER we think we are
Paper Authors
Paper Abstract
Natural language processing of conversational speech requires the availability of high-quality transcripts. In this paper, we express our skepticism towards the recent reports of very low Word Error Rates (WERs) achieved by modern Automatic Speech Recognition (ASR) systems on benchmark datasets. We outline several problems with popular benchmarks and compare three state-of-the-art commercial ASR systems on an internal dataset of real-life spontaneous human conversations and the HUB'05 public benchmark. We show that WERs are significantly higher than the best reported results. We formulate a set of guidelines that may aid in the creation of real-life, multi-domain datasets with high-quality annotations for training and testing robust ASR systems.
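For reference, WER is conventionally defined as the word-level edit distance between a reference transcript and an ASR hypothesis, normalized by the reference length: (substitutions + deletions + insertions) / number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation pipeline used in the paper; the whitespace tokenization and lack of text normalization (casing, punctuation) are simplifying assumptions.

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()  # naive whitespace tokenization (assumption)
    hyp = hypothesis.split()
    # Dynamic-programming table for word-level Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j - 1] + sub,  # match or substitution
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical example: one substitution over a 7-word reference -> WER ~= 0.143
print(wer("we are who we think we are", "we are who we think wer are"))
```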