Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

Paper Title

Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

Authors

Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Lujun Li, Tobias Watzel, Gerhard Rigoll

Abstract

Recent advances in Automatic Speech Recognition (ASR) have demonstrated that end-to-end systems can achieve state-of-the-art performance. There is a trend towards deeper neural networks; however, these ASR models are also more complex and vulnerable to specially crafted noisy data. Such Audio Adversarial Examples (AAEs) were previously demonstrated on ASR systems that use Connectionist Temporal Classification (CTC), as well as on attention-based encoder-decoder architectures. Following the idea of the hybrid CTC/attention ASR system, this work proposes algorithms to generate AAEs that combine both approaches into a joint CTC-attention gradient method. Evaluation is performed using a hybrid CTC/attention end-to-end ASR model on two reference sentences as a case study, as well as on the TEDlium v2 speech recognition task. We then demonstrate the application of this algorithm for adversarial training to obtain a more robust ASR model.
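The abstract names a joint CTC-attention gradient method but gives no formula. As a rough illustration only, the sketch below assumes a fast-gradient-sign-style step in which the gradients of a CTC loss and an attention loss with respect to the input audio are mixed by a weight before taking the sign. The function names, the weight `lam`, the step size `epsilon`, and the sign convention (stepping along the combined gradient) are all assumptions made for this example and are not taken from the paper. Gradients are passed in as plain lists so the example runs without an ASR toolkit.

```python
from typing import List, Sequence


def sign(value: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of value."""
    return float((value > 0) - (value < 0))


def joint_gradient_perturbation(
    grad_ctc: Sequence[float],
    grad_att: Sequence[float],
    epsilon: float = 0.01,
    lam: float = 0.5,
) -> List[float]:
    """Hypothetical joint step: epsilon * sign(lam * g_ctc + (1 - lam) * g_att).

    grad_ctc and grad_att stand for per-sample gradients of the two losses
    with respect to the input audio; computing them is outside this sketch.
    """
    if len(grad_ctc) != len(grad_att):
        raise ValueError("gradient sequences must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    delta = []
    for g_c, g_a in zip(grad_ctc, grad_att):
        combined = lam * g_c + (1.0 - lam) * g_a  # weighted mix (assumed form)
        delta.append(epsilon * sign(combined))
    return delta


def apply_perturbation(
    audio: Sequence[float],
    delta: Sequence[float],
    low: float = -1.0,
    high: float = 1.0,
) -> List[float]:
    """Add the perturbation and clip samples to the valid amplitude range."""
    return [min(high, max(low, x + d)) for x, d in zip(audio, delta)]


if __name__ == "__main__":
    # Toy gradients; real ones would come from backpropagation through the model.
    g_ctc = [0.2, -0.4, 0.0, 1.0]
    g_att = [-0.6, -0.1, 0.0, 0.5]
    step = joint_gradient_perturbation(g_ctc, g_att, epsilon=0.1, lam=0.5)
    print(apply_perturbation([0.0, 0.95, -0.5, 0.95], step))
```

Under this assumed form, setting `lam` to 1.0 or 0.0 reduces the step to a CTC-only or attention-only perturbation, which is one way to compare the joint method against the two single-loss attacks.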
