Article code: 558221
Journal code: 1451691
Publication year: 2016
English article: 21 pages
Full-text version: PDF
ISI article title (English)
Detecting paralinguistic events in audio stream using context in features and probabilistic decisions
Persian translation of the title
تشخیص رویداد های شبه زبانی در جریان صوتی با استفاده از زمینه در ویژگی ها و تصمیمات احتمالی
Keywords
Paralinguistic event; Laughter; Filler; Probability smoothing; Probability masking
Related topics
Engineering and Basic Sciences; Computer Engineering; Signal Processing
English abstract


• We present a sequential algorithm for detecting laughters and fillers in speech.
• The algorithm performs stepwise probability prediction, context inclusion & masking.
• We test several architectures for each of the above steps.
• Our models are more sensitive to changes in features that carry higher predictive power.
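The sequential pipeline in the highlights above (frame-wise probability prediction, context inclusion, masking) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: feature-level context is added by stacking the k neighbouring frames on each side, decision-level context by a centred moving average of the frame probabilities, and the masking step uses a hypothetical rule (zero out above-threshold runs shorter than `min_len` frames) as a stand-in for the paper's heuristic rules. The names `stack_context`, `smooth`, `mask_short_runs`, `detect` and the placeholder `classify` callable, as well as all default window sizes, are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def stack_context(frames: Sequence[Frame], k: int) -> List[Frame]:
    """Feature-level context: concatenate each frame with its k left and k right
    neighbours. Edge frames are padded by repeating the first/last frame."""
    n = len(frames)
    stacked: List[Frame] = []
    for t in range(n):
        row: Frame = []
        for offset in range(-k, k + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index at the edges
            row.extend(frames[idx])
        stacked.append(row)
    return stacked


def smooth(probs: Sequence[float], k: int) -> List[float]:
    """Decision-level context: centred moving average over up to 2k+1 frames
    (the window is truncated at the start and end of the stream)."""
    n = len(probs)
    out: List[float] = []
    for t in range(n):
        window = probs[max(0, t - k):min(n, t + k + 1)]
        out.append(sum(window) / len(window))
    return out


def mask_short_runs(probs: Sequence[float], threshold: float, min_len: int) -> List[float]:
    """Hypothetical masking rule: zero out runs of consecutive frames whose
    probability is >= threshold but which last fewer than min_len frames."""
    out = list(probs)
    n, t = len(out), 0
    while t < n:
        if out[t] < threshold:
            t += 1
            continue
        start = t
        while t < n and out[t] >= threshold:
            t += 1
        if t - start < min_len:  # run too short to be a plausible event
            for i in range(start, t):
                out[i] = 0.0
    return out


def detect(frames: Sequence[Frame], classify: Callable[[Frame], float],
           k_feat: int = 2, k_dec: int = 5,
           threshold: float = 0.5, min_len: int = 10) -> List[float]:
    """Sequential pipeline: feature context -> frame-wise probabilities ->
    decision smoothing -> masking. Default window sizes are placeholders."""
    probs = [classify(x) for x in stack_context(frames, k_feat)]
    return mask_short_runs(smooth(probs, k_dec), threshold, min_len)
```

For example, with one-dimensional frames [0, 0, 1, 1, 1, 0], a `classify` that simply returns the frame value, `k_feat=0`, `k_dec=1` and `min_len=2`, the smoothed output is roughly [0, 0.33, 0.67, 1, 0.67, 0.5] and the four-frame run above 0.5 survives masking. Smoothing and masking are kept as separate steps so that each kind of context can be tuned or swapped independently.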

Non-verbal communication involves the encoding, transmission and decoding of non-lexical cues, and is realized through vocal (e.g., prosody) or visual (e.g., gaze, body language) channels during conversation. These cues help maintain conversational flow, express emotions, and mark personality and interpersonal attitude. In particular, non-verbal cues in speech, such as paralanguage and non-verbal vocal events (e.g., laughters, sighs, cries), are used to nuance meaning and to convey emotions, mood and attitude. For instance, laughters are associated with affective expressions, while fillers (e.g., um, ah) are used to hold the floor during a conversation. In this paper we present an automatic non-verbal vocal event detection system focusing on the detection of laughter and fillers. We extend the frame-wise event detection system we presented at the Interspeech 2013 Social Signals Sub-challenge (the winning entry in that challenge) and test several schemes for incorporating local context during detection. Specifically, we incorporate context at two separate levels in our system: (i) the raw frame-wise features and (ii) the output decisions. Furthermore, our system post-processes the output probabilities using a few heuristic rules in order to reduce erroneous frame-based predictions. Our overall system achieves an area under the receiver operating characteristic (ROC) curve of 95.3% for detecting laughters and 90.4% for fillers on the test set drawn from the data specifications of the Interspeech 2013 Social Signals Sub-challenge. We perform further analysis to understand the interrelation between the features and the obtained results. Specifically, we conduct a feature sensitivity analysis and correlate it with each feature's stand-alone performance. The observations suggest that the trained system is more sensitive to features carrying higher discriminability, with implications for better system design.
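The abstract reports detection quality as the area under the ROC curve and describes a feature sensitivity analysis. The sketch below shows, in plain Python, one standard way to compute frame-level AUC (the probability that a randomly chosen event frame scores higher than a randomly chosen non-event frame, with ties counted as one half) and one hypothetical way to measure sensitivity: the mean absolute change in a model's output when a single feature is shifted by a small `delta`. The function names, the sensitivity definition and the toy scorer are assumptions for illustration, not the procedure used in the paper.

```python
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 0.5. O(P*N) pairwise comparison, fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(model: Callable[[List[float]], float],
                frames: Sequence[List[float]], feature: int, delta: float) -> float:
    """Hypothetical sensitivity measure: mean absolute change in the model
    output when only `feature` is shifted by `delta` in every frame."""
    total = 0.0
    for x in frames:
        perturbed = list(x)          # copy so the input frame is untouched
        perturbed[feature] += delta
        total += abs(model(perturbed) - model(x))
    return total / len(frames)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

    # Toy linear scorer (an assumption) that weights feature 0 far more heavily.
    toy = lambda x: 0.8 * x[0] + 0.1 * x[1]
    frames = [[1.0, 1.0], [0.0, 2.0]]
    for f in (0, 1):
        print(f, sensitivity(toy, frames, f, 0.5))
```

With the toy scorer, feature 0 yields a sensitivity of about 0.4 and feature 1 about 0.05. Correlating such per-feature values with each feature's stand-alone AUC is one way to run the kind of comparison the abstract describes.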

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 36, March 2016, Pages 72–92
Authors