Article code: 4977870
Journal code: 1452016
Publication year: 2016
English article: 25-page PDF
Full-text version: Free download
English title of the ISI article
Effective word count estimation for long duration daily naturalistic audio recordings
Persian translation of the title
برآورد کل تعداد کل موثر برای طولانی مدت ضبط های صوتی طبیعی
Related topics
Engineering and Basic Sciences; Computer Engineering; Signal Processing
English abstract
The ability to count words in extended audio sequences allows researchers to explore characteristics of speakers (e.g., leading, following, task responsibility, personal engagement), as well as the dynamics of two-way or multi-subject conversation scenarios. As such, counting the number of words spoken by a person offers a rich information source for several applications, such as health monitoring (e.g., autism, Parkinson's, Alzheimer's), second language learning, and language development studies. However, developing robust word count systems that achieve high performance at low computational cost is very challenging because of the uncertain and dynamic conditions encountered in audio recordings. In this study, we address the problem for large-scale naturalistic audio recordings, based on a 100-day audio collection entitled Prof-Life-Log. This corpus contains audio recorded continuously from one person using a mobile LENA audio recording device (LENA, 2015). The device captures audio for an entire workday, which can last up to 16 hours. Our proposed word count framework consists of five main components: (i) Speech Activity Detection (SAD) to remove non-speech parts of the signal, (ii) Speech Enhancement to suppress the effects of background noise, (iii) Primary vs. Secondary Speaker Detection to remove secondary speaker segments, (iv) Syllable Rate Estimation to estimate the syllable rate for the primary speaker, and (v) Linear Minimum Mean Square Error Estimation (LMMSE) to find the linear mapping between syllable rate and word rate in spontaneous speech. In spite of its simplicity, the framework proves very effective in real scenarios, with good performance on various datasets. As an indication of performance, the error of the framework for an entire 16-hour daily audio file can be as low as 1% in terms of cumulative Word Count Error.
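The last component, the linear mapping from syllable rate to word rate, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes per-segment syllable and word counts are already available from the earlier stages, fits a scalar LMMSE estimator from sample moments, and scores it with an assumed definition of cumulative Word Count Error (absolute difference of total counts, relative to the reference total). The function names and the example numbers are hypothetical.

```python
def fit_lmmse(syllables, words):
    """Fit the scalar LMMSE estimator w_hat = a * s + b from paired counts.

    a = cov(s, w) / var(s) and b = mean(w) - a * mean(s), using sample moments.
    """
    n = len(syllables)
    if n != len(words) or n < 2:
        raise ValueError("need at least two paired observations")
    mu_s = sum(syllables) / n
    mu_w = sum(words) / n
    var_s = sum((s - mu_s) ** 2 for s in syllables) / n
    if var_s == 0:
        raise ValueError("syllable counts must not all be identical")
    cov_sw = sum((s - mu_s) * (w - mu_w) for s, w in zip(syllables, words)) / n
    a = cov_sw / var_s
    b = mu_w - a * mu_s
    return a, b


def estimate_words(syllables, a, b):
    """Apply the fitted linear mapping to each segment's syllable count."""
    return [a * s + b for s in syllables]


def cumulative_wc_error(estimated, reference):
    """Assumed metric: |sum(estimated) - sum(reference)| / sum(reference)."""
    total_ref = sum(reference)
    if total_ref == 0:
        raise ValueError("reference word count must be positive")
    return abs(sum(estimated) - total_ref) / total_ref


# Hypothetical training segments: syllable counts and transcribed word counts.
train_syllables = [10, 20, 30, 40]
train_words = [7, 14, 21, 28]
a, b = fit_lmmse(train_syllables, train_words)

# Hypothetical held-out segments with reference word counts.
test_syllables = [15, 25]
test_words = [11, 18]
estimates = estimate_words(test_syllables, a, b)
error = cumulative_wc_error(estimates, test_words)
```

Because the error is computed on totals rather than per segment, over- and under-estimates in individual segments can cancel, which is one plausible reason a cumulative figure over a full day can be much lower than segment-level error.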
Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 84, November 2016, Pages 15-23