Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory

Article code	Journal code	Publication year	English article	Full-text version
558358	874908	2013	18-page PDF	Free download

Keywords

Automatic speech recognition; Non-negative matrix factorization; Long short-term memory

Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing

Abstract

This article proposes and evaluates several methods for integrating bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for context-sensitive Tandem feature extraction and show how the Connectionist Temporal Classification approach can be used as a BLSTM-based back-end, as an alternative to Hidden Markov Models (HMM). We combine context-sensitive BLSTM-based feature generation and speech decoding techniques with source separation by convolutive non-negative matrix factorization (NMF). Applying our speaker-adapted multi-stream HMM framework, which processes MFCC features from NMF-enhanced speech as well as word predictions obtained via BLSTM networks and non-negative sparse classification (NSC), we obtain an average accuracy of 91.86% on the PASCAL CHiME Challenge task at signal-to-noise ratios ranging from −6 to 9 dB. To our knowledge, this is the best result ever reported for the CHiME Challenge task.

► We present a framework for robust speech recognition in high levels of non-stationary background noise and reverberation.
► Our system applies NMF for speech enhancement as well as Long Short-Term Memory to exploit contextual information.
► We evaluate three different methods to integrate bidirectional LSTM modeling into speech decoding.
► All three system variants achieve remarkable performance on the CHiME Challenge task.
► Our system outperforms the best technique proposed in the context of the PASCAL CHiME Challenge 2011.
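
To make the NMF-based enhancement idea in the abstract more concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' implementation: it uses plain (non-convolutive) NMF with the standard multiplicative updates for the generalized Kullback-Leibler divergence, whereas the paper applies convolutive NMF. The function names (`nmf_kl`, `kl_divergence`, `soft_mask`), the synthetic input, and the assumption that the first `n_speech` basis columns belong to speech are all hypothetical choices made for this example.

```python
import numpy as np


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (frequency x time) as W @ H.

    Uses the standard multiplicative updates for the generalized
    KL divergence. Plain NMF, not the convolutive variant.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral bases
    H = rng.random((rank, n_frames)) + eps  # activations over time
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)
    return W, H


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence between V and its approximation WH."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def soft_mask(V, W, H, n_speech, eps=1e-12):
    """Estimate the speech part of V with a Wiener-like soft mask.

    Assumption for this sketch: the first n_speech columns of W
    (and rows of H) model speech, the remaining ones model noise.
    """
    speech = W[:, :n_speech] @ H[:n_speech]
    total = W @ H + eps
    return V * speech / total


# Synthetic "magnitude spectrogram" built from 4 non-negative components.
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 50))

W1, H1 = nmf_kl(V, rank=4, n_iter=1)    # after a single update
W, H = nmf_kl(V, rank=4, n_iter=200)    # after 200 updates
speech_estimate = soft_mask(V, W, H, n_speech=2)
```

In a supervised setup, the speech and noise bases would typically be learned from training material beforehand and only the activations estimated on the noisy input; the sketch skips that step and simply splits the learned bases by index.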

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 27, Issue 3, May 2013, Pages 780–797

Authors

Martin Wöllmer, Felix Weninger, Jürgen Geiger, Björn Schuller, Gerhard Rigoll
