Article ID: 4944956
Journal: Information Sciences
Published Year: 2017
Pages: 17
File Type: PDF
Abstract
Annotation by crowd workers serving online has gained attention in recent years across diverse fields because of its distributed problem-solving power. Distributing a labeling task among a large set of workers (who may be experts or non-experts) and obtaining a final consensus is a popular way of performing large-scale annotation in limited time. Collecting multiple annotations can be effective for annotating large-scale datasets in applications such as natural language processing and image processing. However, since crowd workers are not necessarily experts, their opinions may not be accurate enough, which makes it difficult to derive the final aggregated judgment. Moreover, majority voting (MV) is not suitable for such problems because the number of annotators is limited and each question offers multiple options to choose from, which can create substantial conflict among the opinions provided. Additionally, some annotators may answer (provide spam opinions for) too many questions at random to maximize their payment, introducing noise into the final judgment. In this paper, we address the problem of crowd judgment analysis in an unsupervised way and propose a biclustering-based approach to derive the judgments appropriately. The effectiveness of this approach is demonstrated on four publicly available small-scale Amazon Mechanical Turk datasets, along with a large-scale CrowdFlower dataset. We also compare the algorithm with MV and several other existing algorithms. In most cases the proposed approach performs competitively or better than the others; most importantly, it does not use the entire dataset to derive the judgment.
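The abstract contrasts the proposed biclustering approach with majority voting (MV) as the baseline aggregation rule. For reference, below is a minimal sketch of per-question MV over crowd labels; the data structure, item names, and labels are illustrative assumptions, and the paper's biclustering method itself is not shown.

```python
from collections import Counter

def majority_vote(annotations):
    """Aggregate crowd labels by majority voting (MV).

    `annotations` maps each question id to the list of labels supplied by
    the workers who answered it. Ties are broken arbitrarily, which is one
    reason MV struggles when few annotators face many answer options.
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in annotations.items()}

# Hypothetical toy example: 3 workers per question, 5 possible options.
crowd_labels = {
    "q1": ["A", "A", "B"],   # clear majority: "A"
    "q2": ["C", "D", "E"],   # votes scatter; MV picks one arbitrarily
}
print(majority_vote(crowd_labels))
```

With limited annotators and many options per question (as in the "q2" case above), votes can scatter so that no meaningful majority exists, which is the conflict the abstract highlights as a weakness of MV.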
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence