Free article download: Single-channel multi-talker speech recognition with permutation invariant training

Article code	Journal code	Publication year	English article	Full-text version
10151550	1666132	2018	11-page PDF	Free download

English title of the ISI article

Single-channel multi-talker speech recognition with permutation invariant training



Keywords

Permutation invariant training, multi-talker mixed speech recognition, feature separation, joint optimization

Related subjects

Engineering and Basic Sciences > Computer Engineering > Signal Processing


Abstract

Although great progress has been made in automatic speech recognition (ASR), significant performance degradation is still observed when recognizing multi-talker mixed speech. In this paper, we propose and evaluate several architectures to address this problem under the assumption that only a single channel of mixed signal is available. Our technique extends permutation invariant training (PIT) by introducing the front-end feature separation module with the minimum mean square error (MSE) criterion and the back-end recognition module with the minimum cross entropy (CE) criterion. More specifically, during training we compute the average MSE or CE over the whole utterance for each possible utterance-level output-target assignment, pick the one with the minimum MSE or CE, and optimize for that assignment. This strategy elegantly solves the label permutation problem observed in the deep learning based multi-talker mixed speech separation and recognition systems. The proposed architectures are evaluated and compared on an artificially mixed AMI dataset with both two- and three-talker mixed speech. The experimental results indicate that against the state-of-the-art single-talker speech recognition system our proposed architectures can cut the word error rate (WER) by relative 45.0% and 25.0% across all speakers when their energies are comparable, for two- and three-talker mixed speech, respectively. To our knowledge, this is the first work on the single-channel multi-talker mixed speech recognition on the challenging speaker-independent spontaneous large vocabulary continuous speech task.
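The utterance-level assignment search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pit_mse`, the array shapes, and the toy data are assumptions made for illustration, and only the MSE criterion is shown (the CE variant would swap the per-assignment loss). For each possible output-target assignment, the sketch averages the MSE over the whole utterance and keeps the assignment with the smallest value; in training, one would then optimize for that assignment only.

```python
from itertools import permutations

import numpy as np


def pit_mse(outputs, targets):
    """Utterance-level permutation invariant MSE (illustrative sketch).

    outputs, targets: arrays of shape (num_streams, num_frames, feat_dim).
    Returns (min_loss, best_perm), where best_perm[i] is the index of the
    target assigned to output stream i.
    """
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    num_streams = outputs.shape[0]

    best_loss, best_perm = np.inf, None
    for perm in permutations(range(num_streams)):
        # Average MSE over the whole utterance for this assignment.
        loss = np.mean((outputs - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return float(best_loss), best_perm


# Toy data (hypothetical): two output streams that match the targets
# in swapped order, plus a small amount of noise.
rng = np.random.default_rng(0)
targets = rng.normal(size=(2, 50, 40))
outputs = targets[::-1] + 0.01 * rng.normal(size=(2, 50, 40))
loss, perm = pit_mse(outputs, targets)
print(perm, loss)
```

The exhaustive search visits every permutation of the streams, which is manageable for the two- and three-talker cases mentioned in the abstract (2 and 6 assignments, respectively) but grows factorially with the number of talkers.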

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 104, November 2018, Pages 1-11

Authors

Yanmin Qian, Xuankai Chang, Dong Yu
