Article code: 558360
Journal code: 874908
Publication year: 2013
English article: 17-page PDF
Full-text version: Free download
English title of the ISI article
A hearing-inspired approach for distant-microphone speech recognition in the presence of multiple sources
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract

This paper addresses the problem of speech recognition in reverberant multisource noise conditions using distant binaural microphones. Our scheme employs a two-stage fragment decoding approach inspired by Bregman's account of auditory scene analysis, in which innate primitive grouping ‘rules’ are balanced by the role of learnt schema-driven processes. First, the acoustic mixture is split into local time-frequency fragments of individual sound sources using signal-level primitive grouping cues. Second, statistical models are employed to select fragments belonging to the sound source of interest, and the hypothesis-driven stage simultaneously searches for the most probable speech/background segmentation and the corresponding acoustic model state sequence. The paper reports recent advances in combining adaptive noise floor modelling and binaural localisation cues within this framework. By integrating signal-level grouping cues with acoustic models of the target sound source in a probabilistic framework, the system is able to simultaneously separate and recognise the sound of interest from the mixture, and derive significant recognition performance benefits from different grouping cue estimates despite their inherent unreliability in noisy conditions. Finally, the paper will show that missing data imputation can be applied via fragment decoding to allow reconstruction of a clean spectrogram that can be further processed and used as input to conventional ASR systems. The best performing system achieves an average keyword recognition accuracy of 85.83% on the PASCAL CHiME Challenge task.
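
The second stage described in the abstract, choosing which fragments belong to the target source, can be illustrated with a deliberately small sketch. It is not the authors' decoder: it assumes a single diagonal Gaussian for speech and another for the noise floor (the paper uses acoustic model state sequences and an adaptive noise floor), and it searches exhaustively over fragment labellings instead of running a joint Viterbi-style search. The function names, the toy spectrogram, and the model parameters are all hypothetical.

```python
import itertools
import numpy as np

def log_gauss(y, mean, var):
    """Element-wise log-density of a univariate Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def best_fragment_labelling(spec, fragments, speech_mean, speech_var,
                            noise_mean, noise_var):
    """Exhaustively search speech/background labels for each fragment.

    spec      : (T, F) log-energy spectrogram (toy values here)
    fragments : (T, F) integer map; 0 marks cells outside any fragment,
                which are always scored with the noise model
    Returns (labels, score) where labels maps fragment id -> True if speech.
    """
    frag_ids = [int(i) for i in np.unique(fragments) if i != 0]
    ll_speech = log_gauss(spec, speech_mean, speech_var)
    ll_noise = log_gauss(spec, noise_mean, noise_var)

    best_score, best_labels = -np.inf, None
    for choice in itertools.product([False, True], repeat=len(frag_ids)):
        # Build the speech mask implied by this labelling.
        speech_mask = np.zeros(spec.shape, dtype=bool)
        for fid, is_speech in zip(frag_ids, choice):
            if is_speech:
                speech_mask |= fragments == fid
        # Speech cells use the speech model, all others the noise model.
        score = float(np.where(speech_mask, ll_speech, ll_noise).sum())
        if score > best_score:
            best_score, best_labels = score, dict(zip(frag_ids, choice))
    return best_labels, best_score

# Toy data: fragment 1 sits near the assumed speech mean, fragment 2 near noise.
spec = np.array([[10.0, 9.5, 0.2],
                 [10.3, 0.1, -0.2],
                 [0.0, 0.3, 0.1]])
fragments = np.array([[1, 1, 2],
                      [1, 2, 2],
                      [0, 0, 0]])
labels, score = best_fragment_labelling(spec, fragments,
                                        speech_mean=10.0, speech_var=1.0,
                                        noise_mean=0.0, noise_var=1.0)
```

With independent per-cell models like these, each fragment could be labelled on its own. The exhaustive loop is kept only to mirror the idea of scoring whole segmentations, which becomes necessary once the labelling is coupled to a model state sequence.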


► A hearing-inspired approach for noise-robust distant-microphone speech recognition.
► Combines aspects of noise modelling and source separation approaches.
► Integration of spatial cues over spectro-temporal regions provides more reliable location estimates.
► The framework allows reconstruction of clean spectrogram from noisy signals.
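
The highlight on integrating spatial cues over spectro-temporal regions can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: per-cell azimuth estimates are taken as given, the pooling statistic is a median (chosen here for robustness to outliers), and the target direction of 0 degrees and the 10 degree tolerance are hypothetical values.

```python
import numpy as np

def fragment_azimuths(cell_azimuth, fragments):
    """Pool noisy per-cell azimuth estimates (degrees) over each fragment.

    cell_azimuth : (T, F) array of per-cell azimuth estimates
    fragments    : (T, F) integer map; 0 marks cells outside any fragment
    Returns a dict mapping fragment id -> median azimuth of its cells.
    """
    return {int(f): float(np.median(cell_azimuth[fragments == f]))
            for f in np.unique(fragments) if f != 0}

def near_target(frag_az, target_deg=0.0, tol_deg=10.0):
    """Flag fragments whose pooled azimuth lies within tol_deg of the target."""
    return {f: abs(az - target_deg) <= tol_deg for f, az in frag_az.items()}

# Toy data: fragment 1 is roughly frontal but contains one wild estimate (55).
cell_azimuth = np.array([[2.0, -3.0, 40.0],
                         [1.0, 35.0, 38.0],
                         [55.0, 0.0, -20.0]])
fragments = np.array([[1, 1, 2],
                      [1, 2, 2],
                      [1, 1, 0]])
frag_az = fragment_azimuths(cell_azimuth, fragments)
is_target = near_target(frag_az)
```

In this toy case the single outlier would pull the mean of fragment 1 outside the tolerance, while the pooled median stays close to the assumed target direction, which is the sense in which region-level estimates are more reliable than individual cells.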

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 27, Issue 3, May 2013, Pages 820–836