Article code: 485433
Journal code: 703327
Publication year: 2016
English article: 8 pages, PDF
English title of the ISI article
Collaborative Speech Data Acquisition for Under Resourced Languages through Crowdsourcing
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
Abstract

Scarcity of resources may leave under-resourced languages behind in the race to develop data-driven NLP systems. Crowdsourcing has emerged as a technique to bridge this gap, as it offers an approach for collecting such resources collaboratively. Although some Indian languages are widely spoken throughout the world, many of them are resource-poor when measured by the availability of transcribed and annotated resources for building reliable data-driven systems. This paper describes an experience of collecting Hindi speech data through mobile devices using this approach, for building automatic speech recognition and other speech-based retrieval systems. The approach covers a wide variety of microphones, surrounding environments, and other recording conditions. Besides cost savings and speedy data collection, it offers the advantage that the framework can be adapted, in a language-independent manner, to collect different types of resources for various applications such as word sense disambiguation, named entity recognition, and sentiment analysis. Experiences, analysis, and challenges faced in recording more than 100 speakers are reported.

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 81, 2016, Pages 37–44