کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
568940 876494 2007 18 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
An automatic speech recognition system based on the scene analysis account of auditory perception
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر پردازش سیگنال
پیش نمایش صفحه اول مقاله
An automatic speech recognition system based on the scene analysis account of auditory perception
چکیده انگلیسی

Despite many years of concentrated research, the performance gap between automatic speech recognition (ASR) and human speech recognition (HSR) remains large. The difference between ASR and HSR is particularly evident when considering the response to additive noise. Whereas human performance is remarkably robust, ASR systems are brittle and only operate well within the narrow range of noise conditions for which they were designed. This paper considers how humans may achieve noise robustness. We take the view that robustness is achieved because the human perceptual system treats the problems of speech recognition and sound source separation as being tightly coupled. Taking inspiration from Bregman’s Auditory Scene Analysis account of auditory organisation, we present a speech recognition system which couples these processes by using a combination of primitive and schema-driven processes: first, a set of coherent spectro-temporal fragments is generated by primitive segmentation techniques; then, a decoder based on statistical ASR techniques performs a simultaneous search for the correct background/foreground segmentation and word sequence hypothesis. Mutually supporting solutions to both the source segmentation and speech recognition problems arise as a result. The decoder is tested on a challenging corpus of connected digit strings mixed monaurally at 0 dB and recognition performance is compared with that achieved by listeners using identical data. The results, although preliminary, are encouraging and suggest that techniques which interface ASA and statistical ASR have great potential. The paper concludes with a discussion of future research directions that may further develop this class of perceptually motivated ASR solutions.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Speech Communication - Volume 49, Issue 5, May 2007, Pages 384–401
نویسندگان
, ,