Article ID | Journal | Published Year | Pages
---|---|---|---
6902073 | Procedia Computer Science | 2017 | 8
Abstract
Crowdsourcing is an emerging collaborative approach that can be used for effective annotation of linguistic resources. There are several crowdsourcing genres: paid-for, games with a purpose, and altruistic (volunteer-based) approaches. In this paper, we investigate the use of altruistic crowdsourcing for speech corpus annotation by recounting our experience validating a semi-automatic task for dialect annotation of Kalam'DZ, a corpus dedicated to Algerian Arabic dialectal varieties. We start by describing the whole process of designing an altruistic crowdsourcing project. Using the unpaid Crowdcrafting platform, we performed experiments on a sample of 10% of the Kalam'DZ corpus, totaling more than 10 hours of speech from 1,012 speakers. We evaluate the crowdsourcing job by comparing it with a gold-standard annotation produced by experts, which yields a high inter-annotator agreement of 81%. Our results confirm that altruistic crowdsourcing is an effective approach for speech dialect annotation. In addition, we present a set of best practices for altruistic crowdsourcing of corpus annotations.
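The abstract reports an 81% inter-annotator agreement between crowd annotations and the expert gold standard, but does not specify the metric. As a minimal illustrative sketch (not the authors' evaluation code), the snippet below computes both raw percent agreement and chance-corrected Cohen's kappa between two label sequences; the dialect labels used here are hypothetical.

```python
from collections import Counter

def percent_agreement(crowd, gold):
    """Fraction of items where the crowd label matches the gold label."""
    assert len(crowd) == len(gold)
    return sum(c == g for c, g in zip(crowd, gold)) / len(gold)

def cohens_kappa(crowd, gold):
    """Chance-corrected agreement between two label sequences."""
    n = len(gold)
    observed = percent_agreement(crowd, gold)
    crowd_counts, gold_counts = Counter(crowd), Counter(gold)
    # Expected agreement if both annotators labeled independently,
    # each following their own marginal label distribution.
    labels = set(crowd) | set(gold)
    expected = sum((crowd_counts[l] / n) * (gold_counts[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical dialect labels for a handful of audio segments.
gold  = ["Algiers", "Oran", "Constantine", "Algiers", "Oran"]
crowd = ["Algiers", "Oran", "Algiers",     "Algiers", "Oran"]

print(f"percent agreement: {percent_agreement(crowd, gold):.0%}")  # 80%
print(f"Cohen's kappa:     {cohens_kappa(crowd, gold):.2f}")       # 0.67
```

Percent agreement is easy to interpret but inflated when one label dominates; kappa corrects for agreement expected by chance, which is why it is often preferred for annotation tasks with skewed label distributions.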
Related Topics
Physical Sciences and Engineering › Computer Science › Computer Science (General)
Authors
Soumia Bougrine, Hadda Cherroun, Ahmed Abdelali