Article code: 558214
Journal code: 1451691
Publication year: 2016
English article: 17 pages, PDF
Full-text version: Free download
English title of the ISI article
Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories
Keywords
Speech communication; Movement primitives; Phone classification; Motor theory; Information transfer
Related topics
Engineering and Basic Sciences; Computer Engineering; Signal Processing
English abstract


• Proposed a method to extract sparse gesture-like primitives from articulatory data.
• Learnt primitive movements for different phonemes using a weak supervision step.
• Demonstrated that primitives for different phones are linguistically interpretable.
• Proposed and evaluated features on an interval-based phone classification task.
• Showed that purely production-based primitives perform well for phone classification.

How the speech production and perception systems evolved in humans still remains a mystery today. Previous research suggests that human auditory systems are able, and have possibly evolved, to preserve maximal information about the speaker's articulatory gestures. This paper attempts an initial step toward answering the complementary question of whether speakers’ articulatory mechanisms have also evolved to produce sounds that can be optimally discriminated by the listener's auditory system. To this end we explicitly model, using computational methods, the extent to which derived representations of “primitive movements” of speech articulation can be used to discriminate between broad phone categories. We extract interpretable spatio-temporal primitive movements as recurring patterns in a data matrix of human speech articulation, i.e., representing the trajectories of vocal tract articulators over time. To this end, we propose a weakly-supervised learning method that attempts to find a part-based representation of the data in terms of recurring basis trajectory units (or primitives) and their corresponding activations over time. For each phone interval, we then derive a feature representation that captures the co-occurrences between the activations of the various bases over different time-lags. We show that this feature, derived entirely from activations of these primitive movements, is able to achieve a greater discrimination relative to using conventional features on an interval-based phone classification task. We discuss the implications of these findings in furthering our understanding of speech signal representations and the links between speech production and perception systems.
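
The abstract describes a per-interval feature built from co-occurrences between basis activations at different time-lags, but it does not give the formula. The following is a minimal sketch of one plausible reading, not the authors' implementation: for each lag, it sums the product of one basis's activation at frame t with another basis's activation at frame t + lag. The function name `lagged_cooccurrence`, the unnormalized sum of products, and the flat output layout are all assumptions made for illustration.

```python
def lagged_cooccurrence(activations, max_lag):
    """Sketch of a lagged co-occurrence feature (an assumed formulation).

    activations: list of K rows (one per basis/primitive), each a list of
                 T non-negative activation values for one phone interval.
    max_lag:     largest time-lag (in frames) to consider.

    Returns a flat list of K * K * (max_lag + 1) values, ordered by lag,
    then by the first basis index, then by the second basis index.
    """
    n_bases = len(activations)
    n_frames = len(activations[0]) if n_bases else 0
    features = []
    for lag in range(max_lag + 1):
        for i in range(n_bases):
            for j in range(n_bases):
                total = 0.0
                # Pair basis i at frame t with basis j at frame t + lag.
                for t in range(n_frames - lag):
                    total += activations[i][t] * activations[j][t + lag]
                features.append(total)
    return features


# Toy interval: basis 0 fires at frame 0, basis 1 fires one frame later.
toy = [[1, 0, 0],
       [0, 1, 0]]
print(lagged_cooccurrence(toy, max_lag=1))
```

In the toy interval, the only nonzero entry at lag 1 is the pair (basis 0, basis 1), which reflects that basis 1 follows basis 0 by one frame. The fixed output length means intervals of different durations map to vectors of the same size, which is what an interval-based classifier needs.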

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 36, March 2016, Pages 330–346
Authors
, , ,