Article code | Journal code | Publication year | English article
392582 | 664991 | 2016 | 25 pages (PDF)
English title of the ISI article
Recognizing input space and target concept drifts in data streams with scarcely labeled and unlabelled instances
Persian translation of the title
Recognizing input space and target concept drift in data streams with scarcely labeled and unlabeled instances
Keywords
Data classification; input space and target concept drift; drift detection; scarcely labeled and unlabeled streams; semi-supervised and unsupervised performance indicators; single-pass active learning filters
Related topics
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
Abstract

In classification-based stream mining, drift detection is essential in order to (i) inform operators when unintended system changes occur and (ii) make classifier updates more flexible when changes are intentional. Current detection approaches usually assume that fully supervised, labeled streams are available for monitoring changes in classifier performance. This is unrealistic in many on-line real-world applications, because the true class labels would have to be known, which usually requires tedious feedback from the operators working with the systems. We propose two techniques to improve the economy and applicability of current drift detection techniques: (i) a semi-supervised approach that employs single-pass active learning filters to select the most interesting samples for supervising classifier performance and (ii) a fully unsupervised approach, based on the degree of overlap between a classifier’s output certainty distributions, that can be applied to any unlabeled classification stream. For both variants, a specific handling of imbalanced class distributions in the streams is proposed, which also allows possible downtrends in classifier behavior for under-represented classes to be observed. The statistical monitoring of classifier behavior relies on a modified version of the Page-Hinkley test, in which a fading factor and an automatic thresholding concept (based on the Hoeffding bound) are introduced to render it more flexible for detecting successive drift occurrences in a stream. We compared our approaches to the fully supervised variant in two real-world on-line applications, including a systematic analysis of the capabilities of our methods. The semi-supervised approach detected real as well as artificially built-in drifts in these streams with a delay similar to that of the supervised variant (about 5–6 min), using only 20% actively selected samples. The unsupervised variant detected input space drifts with reasonable delays as well, but failed to detect target concept drifts; using both approaches in tandem therefore allows us to distinguish between input space and target concept drifts.
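
To make the monitoring step in the abstract more concrete, the following is a minimal Python sketch of a Page-Hinkley test with a fading factor, applied to a stream of performance indicator values (for example, 0/1 misclassification flags). It is not the authors' implementation: the class name `FadingPageHinkley`, the helper `detect_drifts`, the parameter values, and the exact way the fading factor enters the cumulative sum are all assumptions made for illustration. The abstract does not specify the automatic Hoeffding-bound thresholding, so the sketch uses a fixed alarm threshold instead.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FadingPageHinkley:
    """Page-Hinkley test for an upward shift in a monitored indicator.

    Illustrative sketch only; all default values are assumptions.
    """
    delta: float = 0.005     # tolerated magnitude of change (assumed)
    threshold: float = 5.0   # fixed alarm threshold (assumed; the paper
                             # derives its threshold automatically)
    fading: float = 0.999    # fading factor in (0, 1]; 1.0 = classic test
    n: int = 0               # samples seen since the last reset
    mean: float = 0.0        # running mean of the indicator
    cum: float = 0.0         # faded cumulative deviation m_t
    cum_min: float = 0.0     # running minimum of m_t (starting from m_0 = 0)

    def update(self, x: float) -> bool:
        """Consume one indicator value; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # Older deviations are down-weighted by the fading factor.
        self.cum = self.fading * self.cum + (x - self.mean - self.delta)
        self.cum_min = min(self.cum_min, self.cum)
        if self.cum - self.cum_min > self.threshold:
            self.reset()  # restart so that successive drifts can be detected
            return True
        return False

    def reset(self) -> None:
        """Forget all statistics collected so far."""
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.cum_min = 0.0


def detect_drifts(values: Iterable[float],
                  detector: FadingPageHinkley) -> List[int]:
    """Return the 0-based positions at which the detector raises an alarm."""
    return [i for i, x in enumerate(values) if detector.update(x)]
```

As a usage example, calling `detect_drifts([0.0] * 300 + [1.0] * 100, FadingPageHinkley())` on a synthetic error stream that jumps from "always correct" to "always wrong" at position 300 raises an alarm a few samples after the jump, while a stream with a constant error rate raises none. In the semi-supervised setting described above, the indicator values would come only from the actively selected, labeled samples.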

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volumes 355–356, 10 August 2016, Pages 127–151