Paper Title
WER we are and WER we think we are
Paper Authors
Paper Abstract
Natural language processing of conversational speech requires the availability of high-quality transcripts. In this paper, we express our skepticism towards the recent reports of very low Word Error Rates (WERs) achieved by modern Automatic Speech Recognition (ASR) systems on benchmark datasets. We outline several problems with popular benchmarks and compare three state-of-the-art commercial ASR systems on an internal dataset of real-life spontaneous human conversations and the HUB'05 public benchmark. We show that WERs are significantly higher than the best reported results. We formulate a set of guidelines that may aid in the creation of real-life, multi-domain datasets with high-quality annotations for training and testing robust ASR systems.
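For reference, WER is conventionally defined as the word-level edit distance between a reference transcript and an ASR hypothesis, normalized by the reference length: (substitutions + deletions + insertions) / number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation pipeline used in the paper; the whitespace tokenization and lack of text normalization (casing, punctuation) are simplifying assumptions.

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()  # naive whitespace tokenization (assumption)
    hyp = hypothesis.split()
    # Dynamic-programming table for word-level Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j - 1] + sub,  # match or substitution
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical example: one substitution over a 7-word reference -> WER ~= 0.143
print(wer("we are who we think we are", "we are who we think wer are"))
```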