کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
6902073 | 1446498 | 2017 | 8 صفحه PDF | دانلود رایگان |
عنوان انگلیسی مقاله ISI
Altruistic Crowdsourcing for Arabic Speech Corpus Annotation
دانلود مقاله + سفارش ترجمه
دانلود مقاله ISI انگلیسی
رایگان برای ایرانیان
کلمات کلیدی
موضوعات مرتبط
مهندسی و علوم پایه
مهندسی کامپیوتر
علوم کامپیوتر (عمومی)
پیش نمایش صفحه اول مقاله

چکیده انگلیسی
Crowdsourcing is an emerging collaborative approach that can be used for effective annotations of linguistic resources. There are many crowdsourcing genres: paid-for, games with a purpose, or altruistic (volunteer-based) approaches. In this paper, we investigate the use of altruistic crowdsourcing for speech corpora annotation by narrating our experience of validating a semi-automatic task for dialect annotation of Kalam'DZ, a corpus dedicated to Arabic Algerian dialectal varieties. We start by describing the whole process of designing altruistic crowdsourcing project. Using the unpaid crowdcrafting platform, we have performed experiments on a sample of 10% of Kalam'DZ corpus, totaling more than 10 h with 1012 speakers. The evaluation of this crowdsourcing job is ensured through a comparison with a gold standard annotation done by experts which affirms a high level of inter-annotation agreements of 81%. Our results confirm that altruistic crowdsourcing is an effective approach for speech dialect annotation. In addition, we present a set of best practices for altruistic crowdsourcing for corpus annotations.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Procedia Computer Science - Volume 117, 2017, Pages 137-144
Journal: Procedia Computer Science - Volume 117, 2017, Pages 137-144
نویسندگان
Soumia Bougrine, Hadda Cherroun, Ahmed Abdelali,