Separate or joint? Estimation of multiple labels from crowdsourced annotations

Article ID	Journal ID	Year	Length
382884	660796	2014	10 pages (PDF)

Keywords

Human computation; Label dependency; Quality control

Subject areas

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence

Abstract

• We propose estimating multiple true labels for multi-labeled instances.
• We flexibly incorporate label dependency into the label-generation process.
• It is effective to simultaneously estimate the states of strongly related labels.
• Reliable results are estimated using the opinions of a few crowdsourcing workers.
• Our models can reduce the cost of multi-label data collection for AI techniques.
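
A rough parameter count shows why strongly related labels are worth estimating together, but not every label at once. The sketch below assumes K binary labels and one full confusion matrix per worker. The function names and the grouping into disjoint pairs are hypothetical illustrations, not the paper's exact D-DS or P-DS parameterization.

```python
def confusion_params(n_states: int) -> int:
    """Free parameters in one worker's confusion matrix: each true-state row
    is a probability distribution over n_states observed states."""
    return n_states * (n_states - 1)


def separate_params(k: int) -> int:
    # k binary labels, each estimated with its own 2x2 confusion matrix.
    return k * confusion_params(2)


def joint_params(k: int) -> int:
    # All k binary labels treated as one variable with 2**k joint states.
    return confusion_params(2 ** k)


def pairwise_params(k: int) -> int:
    # Hypothetical grouping into k // 2 disjoint pairs (4 joint states each).
    return (k // 2) * confusion_params(4)


for k in (2, 4, 8):
    print(k, separate_params(k), pairwise_params(k), joint_params(k))
```

For K = 8 this gives 16 parameters per worker when the labels are estimated separately, 48 with four disjoint pairs, and 65,280 when all labels form one joint variable. A handful of annotations per instance cannot support the fully joint case, which is the data sparsity problem that more compact representations of label dependency are meant to address.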

Artificial intelligence techniques aimed at more naturally simulating human comprehension fit the paradigm of multi-label classification. Generally, an enormous amount of high-quality multi-label data is needed to build a multi-label classifier. Creating such datasets is usually expensive and time-consuming. A lower-cost way to obtain multi-label datasets for such comprehension-simulation techniques is to use noisy crowdsourced annotations. We propose incorporating label dependency into the label-generation process to estimate the multiple true labels of each instance from crowdsourced multi-label annotations. Three statistical quality control models based on the work of Dawid and Skene (DS) are proposed. The label-dependent DS (D-DS) model simply incorporates dependency relationships among all labels. The label pairwise DS (P-DS) model groups labels into pairs to prevent interference from uncorrelated labels. The Bayesian network label-dependent DS (ND-DS) model compactly represents label dependency using conditional independence properties to overcome the data sparsity problem. The results of two experiments, “affect annotation for lines in story” and “intention annotation for tweets”, show that (1) the ND-DS model handles the multi-label estimation problem most effectively, with annotations from only about five workers per instance, and (2) the P-DS model is best when there are pairwise comparison relationships among the labels. In sum, flexibly exploiting label dependency to obtain multi-label datasets is a promising way to reduce the cost of data collection for future applications with minimal degradation in result quality.
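
All three models build on the Dawid and Skene (DS) expectation-maximization approach. As a minimal sketch, assuming a single pair of binary labels encoded as one four-state variable (in the spirit of grouping related labels into pairs), the classic DS algorithm can be run directly on the joint states. The helper names `encode_pair` and `dawid_skene`, the smoothing constant, and the toy data are hypothetical; this is plain DS over joint states, not the authors' D-DS, P-DS, or ND-DS formulation.

```python
import numpy as np


def encode_pair(a: int, b: int) -> int:
    """Map two binary labels (a, b) to one joint state in {0, 1, 2, 3}."""
    return 2 * a + b


def dawid_skene(annotations, n_items, n_workers, n_states, n_iter=50, smoothing=0.01):
    """Classic Dawid-Skene EM over a single categorical variable.

    annotations: iterable of (item, worker, observed_state) triples.
    Returns (posterior over true states per item, per-worker confusion matrices).
    """
    counts = np.zeros((n_items, n_workers, n_states))
    for i, j, l in annotations:
        counts[i, j, l] += 1

    # Initialise posteriors with smoothed majority-vote proportions.
    post = counts.sum(axis=1) + smoothing
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and confusion[worker, true_state, observed_state].
        priors = post.mean(axis=0)
        confusion = np.einsum("ik,ijl->jkl", post, counts) + smoothing
        confusion /= confusion.sum(axis=2, keepdims=True)

        # E-step: log prior plus log-likelihood of each item's annotations.
        log_post = np.log(priors) + np.einsum("ijl,jkl->ik", counts, np.log(confusion))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post, confusion


# Toy data: 4 items, each with a pair of binary labels. Workers 0 and 1 are
# accurate; worker 2 always answers (0, 0).
true_pairs = [(1, 1), (0, 0), (1, 0), (0, 1)]
annotations = []
for item, (a, b) in enumerate(true_pairs):
    for worker in (0, 1):
        annotations.append((item, worker, encode_pair(a, b)))
    annotations.append((item, 2, encode_pair(0, 0)))

post, confusion = dawid_skene(annotations, n_items=4, n_workers=3, n_states=4)
estimated = [divmod(int(s), 2) for s in post.argmax(axis=1)]  # decode back to pairs
print(estimated)
```

Because worker 2's answer never changes, its estimated confusion rows end up nearly identical across true states, so EM effectively discounts that worker and the pairs reported by workers 0 and 1 are recovered. Treating the pair as one variable lets each worker's confusion matrix capture errors that affect both labels together, which separate per-label estimation cannot represent.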

Publisher

Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 41, Issue 13, 1 October 2014, Pages 5723–5732

Authors

Lei Duan, Satoshi Oyama, Haruhiko Sato, Masahito Kurihara