Free article download: A deep architecture for audio-visual voice activity detection in the presence of transients

Article code	Journal code	Publication year	English article	Full text
4977362	1451925	2018	PDF, 6 pages	Free download

English title of the ISI article

A deep architecture for audio-visual voice activity detection in the presence of transients

Persian translation of the title

یک معماری عمیق برای تشخیص فعالیت صوتی و تصویری در حضور گذرا


Keywords

Audio-visual speech processing, voice activity detection, auto-encoder, recurrent neural networks

Related subjects

Engineering and Basic Sciences, Computer Engineering, Signal Processing


Abstract

- A deep architecture for audio-visual voice activity detection is proposed.
- Specifically designed auto-encoders fuse audio and video while reducing interferences.
- Incorporated into an RNN, the deep architecture outperforms recent detectors.

We address the problem of voice activity detection in difficult acoustic environments including high levels of noise and transients, which are common in real life scenarios. We consider a multimodal setting, in which the speech signal is captured by a microphone, and a video camera is pointed at the face of the desired speaker. Accordingly, speech detection translates to the question of how to properly fuse the audio and video signals, which we address within the framework of deep learning. Specifically, we present a neural network architecture based on a variant of auto-encoders, which combines the two modalities, and provides a new representation of the signal, in which the effect of interferences is reduced. To further encode differences between the dynamics of speech and interfering transients, the signal, in this new representation, is fed into a recurrent neural network, which is trained in a supervised manner for speech detection. Experimental results demonstrate improved performance of the proposed deep architecture compared to competing multimodal detectors.
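The pipeline described in the abstract (fuse audio and video through an auto-encoder, then pass the fused representation to a recurrent network that outputs a speech decision per frame) can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the feature dimensions, the single-hidden-layer auto-encoder, the Elman-style recurrence, the class and function names, and the random untrained weights are all hypothetical, and the training procedure from the paper is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
N_FRAMES, AUDIO_DIM, VIDEO_DIM = 50, 13, 20
CODE_DIM, RNN_DIM = 8, 16


def init(shape):
    """Small random weights; a real system would learn these from data."""
    return rng.normal(scale=0.1, size=shape)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class FusionAutoEncoder:
    """Single-hidden-layer auto-encoder over concatenated audio and video frames."""

    def __init__(self, in_dim, code_dim):
        self.w_enc, self.b_enc = init((in_dim, code_dim)), np.zeros(code_dim)
        self.w_dec, self.b_dec = init((code_dim, in_dim)), np.zeros(in_dim)

    def encode(self, x):
        # Joint low-dimensional code shared by both modalities.
        return np.tanh(x @ self.w_enc + self.b_enc)

    def reconstruct(self, x):
        # Decoder output; an unsupervised stage would minimise its error.
        return self.encode(x) @ self.w_dec + self.b_dec


class SpeechRNN:
    """Elman-style recurrent layer with a per-frame sigmoid output."""

    def __init__(self, in_dim, hidden_dim):
        self.w_in = init((in_dim, hidden_dim))
        self.w_rec = init((hidden_dim, hidden_dim))
        self.w_out = init(hidden_dim)

    def forward(self, codes):
        h = np.zeros(self.w_rec.shape[0])
        probs = []
        for c in codes:
            # The hidden state carries temporal context across frames.
            h = np.tanh(c @ self.w_in + h @ self.w_rec)
            probs.append(sigmoid(h @ self.w_out))
        return np.array(probs)


def detect_speech(audio, video, auto_encoder, rnn):
    """Fuse the two modalities, encode them, and score each frame for speech."""
    fused = np.concatenate([audio, video], axis=1)
    codes = auto_encoder.encode(fused)
    return rnn.forward(codes)


# Random stand-ins for per-frame audio and video features.
audio = rng.normal(size=(N_FRAMES, AUDIO_DIM))
video = rng.normal(size=(N_FRAMES, VIDEO_DIM))
auto_encoder = FusionAutoEncoder(AUDIO_DIM + VIDEO_DIM, CODE_DIM)
rnn = SpeechRNN(CODE_DIM, RNN_DIM)
speech_prob = detect_speech(audio, video, auto_encoder, rnn)
print(speech_prob.shape)
```

With untrained weights the per-frame scores carry no meaning; the sketch only shows how the data flows from the two feature streams to one speech probability per frame.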

Publisher

Database: Elsevier - ScienceDirect
Journal: Signal Processing - Volume 142, January 2018, Pages 69-74

Authors

Ido Ariav, David Dov, Israel Cohen
