Collaborative Speech Data Acquisition for Under Resourced Languages through Crowdsourcing

Article code	Journal code	Publication year	English article	Full-text version
485433	703327	2016	8-page PDF	Free download

Keywords

Crowdsourcing

Related subjects

Engineering and Basic Sciences > Computer Engineering > Computer Science (General)

Abstract

Scarcity of resources may leave under-resourced languages behind in the race to develop data-driven NLP systems. Crowdsourcing has emerged as a technique to bridge this gap, as it offers an approach for collecting such resources in a collaborative manner. Although some Indian languages are widely spoken throughout the world, many of them are resource-poor when measured by the availability of transcribed and annotated resources for building reliable data-driven systems. This paper describes an experience of collecting speech data for Hindi through mobile devices using this approach, for building automatic speech recognition and other speech-based retrieval systems. The approach covers considerable variety in microphones, surrounding environments, and similar recording conditions. Besides cost savings and speedy data collection, it offers the advantage that the framework can be adapted, in a language-independent manner, to collect different types of resources for applications such as word sense disambiguation, named entity recognition, and sentiment analysis. Experiences, analysis, and challenges faced in recording more than 100 speakers are reported.

Publisher

Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 81, 2016, Pages 37–44

Authors

Sunita Arora, Karunesh Kumar Arora, Mukund Kumar Roy, S.S. Agrawal, B.K. Murthy
