Article code: 558358
Journal code: 874908
Publication year: 2013
English article: 18 pages, PDF
Full-text version: free download
English title of the ISI article
Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory
Related topics
Engineering and Basic Sciences > Computer Engineering > Signal Processing
Abstract

This article proposes and evaluates various methods to integrate the concept of bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for context-sensitive Tandem feature extraction and show how the Connectionist Temporal Classification approach can be used as a BLSTM-based back-end, as an alternative to Hidden Markov Models (HMM). We combine context-sensitive BLSTM-based feature generation and speech decoding techniques with source separation by convolutive non-negative matrix factorization (NMF). Applying our speaker-adapted multi-stream HMM framework, which processes MFCC features from NMF-enhanced speech as well as word predictions obtained via BLSTM networks and non-negative sparse classification (NSC), we obtain an average accuracy of 91.86% on the PASCAL CHiME Challenge task at signal-to-noise ratios ranging from −6 to 9 dB. To our knowledge, this is the best result ever reported for the CHiME Challenge task.


► We present a framework for robust speech recognition in high levels of non-stationary background noise and reverberation.
► Our system applies NMF for speech enhancement as well as Long Short-Term Memory to exploit contextual information.
► We evaluate three different methods to integrate bidirectional LSTM modeling into speech decoding.
► All three system variants achieve remarkable performance on the CHiME Challenge task.
► Our system outperforms the best technique proposed in the context of the PASCAL CHiME Challenge 2011.

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 27, Issue 3, May 2013, Pages 780–797