Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
6951454 | 1451675 | 2018 | 22-page PDF | Free download |
English title of the ISI article
Comparing human and automatic speech recognition in simple and complex acoustic scenes
Persian translation of the title
Comparing human and automatic speech recognition in simple and complex acoustic scenes
Keywords
Human-machine comparison, speech reception threshold, deep neural networks, speech intelligibility prediction, spatial scenes
Related subjects
Engineering and basic sciences
Computer engineering
Signal processing
English abstract
Previous comparisons of human speech recognition (HSR) and automatic speech recognition (ASR) have shown that humans outperform ASR systems in nearly all speech recognition tasks. However, recent progress in ASR has led to substantial improvements in recognition accuracy, and it is therefore unclear how large the task-dependent human-machine gap remains. This paper investigates the gap between HSR and ASR based on deep neural networks (DNNs) in different acoustic conditions, with the aim of comparing differences and identifying processing strategies that should be considered in ASR. We find that DNN-based ASR reaches human performance for single-channel, small-vocabulary tasks in the presence of speech-shaped noise and in multi-talker babble noise, which is an important difference to previous human-machine comparisons: the speech reception threshold (SRT), i.e., the signal-to-noise ratio with a 50% word recognition rate, is at about −7 to −8 dB both for HSR and ASR. However, in more complex spatial scenes with diffuse noise and moving talkers, the SRT gap amounts to approximately 12 dB. Based on cross comparisons that use oracle knowledge (e.g., the speakers' true positions), incorrect responses are attributed to localization errors or missing pitch information needed to distinguish between speakers of different genders. In terms of the SRT, localization errors and missing spectral information amount to 2.1 and 3.2 dB, respectively. The comparison hence identifies specific components in ASR that can profit from learning from auditory signal processing.
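The abstract defines the speech reception threshold (SRT) as the signal-to-noise ratio at which the word recognition rate reaches 50%. A minimal sketch of how such a threshold could be read off a measured psychometric curve, using linear interpolation between hypothetical (SNR, recognition rate) data points invented here purely for illustration:

```python
def estimate_srt(points, target=0.5):
    """Estimate the SNR (dB) at which recognition reaches `target`.

    points: list of (snr_db, recognition_rate) pairs, sorted by
    increasing SNR. Interpolates linearly between the two points
    that bracket the target rate.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1:
            # Linear interpolation within the bracketing interval.
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target rate not bracketed by the data")

# Hypothetical psychometric data (not from the paper).
data = [(-12, 0.10), (-10, 0.30), (-8, 0.50), (-6, 0.75), (-4, 0.90)]
print(estimate_srt(data))  # -> -8.0
```

In practice a sigmoid (e.g., logistic) function is usually fitted to the full psychometric curve before the 50% point is extracted; the interpolation above only illustrates the definition.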
Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 52, November 2018, Pages 123-140
Authors
Constantin Spille, Birger Kollmeier, Bernd T. Meyer