Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
4944419 | 1437993 | 2017 | 36-page PDF | Free download |
English title of the ISI article
Cost-sensitive elimination of mislabeled training data
Persian translation of the title
از بین بردن حساس بودن به اطلاعات آموزشی نامناسب (literally: "Eliminating sensitivity to unsuitable training data")
Keywords
Mislabeled data filtering, cost-sensitive, ensemble learning
Related topics
Engineering and basic sciences
Computer engineering
Artificial intelligence
English abstract
Accurately labeling training data plays a critical role in many supervised learning tasks. Because labels in practical applications can be erroneous for various reasons, a wide range of algorithms has been developed to eliminate mislabeled data. These algorithms can make two types of errors: identifying a noise-free instance as mislabeled, or identifying a mislabeled instance as noise-free. These errors may incur different costs, depending on the training dataset and the application. However, such cost differences are usually ignored, so existing methods are not optimal with respect to cost. This work studies the novel problem of cost-sensitive mislabeled data filtering. By wrapping a cost-minimizing procedure, we propose a prototype cost-sensitive, ensemble-learning-based mislabeled data filtering algorithm named CSENF. Based on CSENF, we further propose two novel algorithms: the cost-sensitive repeated majority filtering algorithm CSRMF and the cost-sensitive repeated consensus filtering algorithm CSRCF. Compared to CSENF, these two algorithms can estimate the mislabeling probability of each training instance more confidently. They therefore incur lower cost than CSENF and cost-blind mislabeling filters. Empirical and theoretical evaluations on a set of benchmark datasets illustrate the superior performance of the proposed methods.
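The abstract does not spell out the cost-minimizing procedure, so the following is only a minimal sketch of the general idea, not the authors' CSENF algorithm. It assumes that each training instance comes with one boolean per ensemble member (True when that member's prediction disagrees with the given label), that the fraction of disagreeing members serves as the mislabeling probability, and that the filter removes an instance when the expected cost of removing it is lower than the expected cost of keeping it. The names `mislabel_probability`, `should_remove`, `cost_sensitive_filter`, `cost_remove_clean`, and `cost_keep_noisy` are hypothetical.

```python
from fractions import Fraction


def mislabel_probability(votes):
    """Estimate the mislabeling probability as the fraction of ensemble
    members whose prediction disagrees with the given label (assumption)."""
    if not votes:
        raise ValueError("need at least one ensemble vote")
    return Fraction(sum(1 for v in votes if v), len(votes))


def should_remove(p_mislabeled, cost_remove_clean, cost_keep_noisy):
    """Minimum-expected-cost decision for a single instance.

    Removing costs cost_remove_clean if the instance was actually clean;
    keeping costs cost_keep_noisy if the instance was actually mislabeled.
    """
    expected_cost_remove = (1 - p_mislabeled) * cost_remove_clean
    expected_cost_keep = p_mislabeled * cost_keep_noisy
    return expected_cost_remove < expected_cost_keep


def cost_sensitive_filter(disagreements, cost_remove_clean, cost_keep_noisy):
    """Return the indices of the instances that are kept."""
    kept = []
    for idx, votes in enumerate(disagreements):
        p = mislabel_probability(votes)
        if not should_remove(p, cost_remove_clean, cost_keep_noisy):
            kept.append(idx)
    return kept


# Four instances, five ensemble members each (illustrative data only).
disagreements = [
    [False, False, False, False, False],  # p = 0
    [True, True, False, False, False],    # p = 2/5
    [True, True, True, False, False],     # p = 3/5
    [True, True, True, True, True],       # p = 1
]

kept_equal = cost_sensitive_filter(disagreements, 1, 1)
kept_noise_expensive = cost_sensitive_filter(disagreements, 1, 3)
kept_removal_expensive = cost_sensitive_filter(disagreements, 4, 1)
```

Under these assumptions, equal costs reduce the rule to a cost-blind majority vote (remove when more than half of the members disagree). Raising the cost of keeping a mislabeled instance lowers the removal threshold, and raising the cost of discarding a clean instance moves the rule toward a stricter, consensus-like filter.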
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 402, September 2017, Pages 170-181
Authors
Donghai Guan, Weiwei Yuan, Tinghuai Ma, Asad Masood Khattak, Francis Chow