Article code: 6861777
Journal code: 1439258
Publication year: 2018
English article: 16-page PDF
Full text: free download
Title of the ISI article
Analysis of training data using clustering to improve semi-supervised self-training
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
Abstract
Applying unlabeled data in semi-supervised self-training can significantly improve the accuracy of a supervised classifier, but in some cases, it may dramatically decrease the classification accuracy. One reason for such degradation is a lack of labeled data for training an initial classifier in the self-training process. In this paper, we propose a method to determine the sufficiency of the labeled data and two methods to improve the labeled dataset in the insufficient portion. To determine the sufficiency of labeled data, we apply a semi-supervised cluster technique to estimate the labeled data distribution over the training set. The results show that the accuracy obtained from the final classifiers in clusters without labeled data is markedly lower than that obtained from clusters with labeled data. The two methods we propose for improving the labeled dataset are active labeling and co-labeling, for ensuring the sufficiency of labeled data. Comparison experiments on UCI and real-world datasets show that the proposed methods are an effective preprocessing step for determining and obtaining a sufficient quantity of labeled data, which is essential for attaining accuracy in a semi-supervised self-training classifier.
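The sufficiency check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cluster assignments are already available from some clustering step, marks unlabeled points with -1 (the convention used by scikit-learn's semi-supervised estimators), and uses "closest to the cluster mean" as a stand-in rule for choosing which points to send for active labeling. The names `label_coverage`, `insufficient_clusters`, `query_candidates`, and `min_labeled` are hypothetical.

```python
from collections import defaultdict

UNLABELED = -1  # assumed marker for a missing label (hypothetical convention)


def label_coverage(cluster_ids, labels):
    """Count how many labeled points fall in each cluster."""
    counts = defaultdict(int)
    for cid, y in zip(cluster_ids, labels):
        counts[cid] += 0 if y == UNLABELED else 1
    return dict(counts)


def insufficient_clusters(cluster_ids, labels, min_labeled=1):
    """Return the sorted ids of clusters with fewer than min_labeled labels."""
    coverage = label_coverage(cluster_ids, labels)
    return sorted(cid for cid, n in coverage.items() if n < min_labeled)


def query_candidates(points, cluster_ids, labels, min_labeled=1):
    """For each insufficient cluster, pick the unlabeled point closest to the
    cluster mean as a candidate for manual labeling (an assumed heuristic)."""
    picks = {}
    for cid in insufficient_clusters(cluster_ids, labels, min_labeled):
        members = [i for i, c in enumerate(cluster_ids) if c == cid]
        unlabeled = [i for i in members if labels[i] == UNLABELED]
        if not unlabeled:
            continue  # nothing left to query in this cluster
        dim = len(points[members[0]])
        mean = [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

        def sq_dist(i):
            return sum((points[i][d] - mean[d]) ** 2 for d in range(dim))

        picks[cid] = min(unlabeled, key=sq_dist)
    return picks


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (10.2, 10.2)]
    cluster_ids = [0, 0, 0, 1, 1, 1, 1]
    labels = [1, UNLABELED, 0, UNLABELED, UNLABELED, UNLABELED, UNLABELED]
    print(insufficient_clusters(cluster_ids, labels))
    print(query_candidates(points, cluster_ids, labels))
```

In this toy input, cluster 1 contains no labeled points, so it is flagged as insufficient and one of its members is proposed for labeling before self-training starts. Raising `min_labeled` makes the check stricter.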
Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 143, 1 March 2018, Pages 65-80