Article code: 394283
Journal code: 665789
Publication year: 2012
English article: 18-page PDF
Full-text version: Free download
English title of the ISI article
Learning very fast decision tree from uncertain data streams with positive and unlabeled samples
Keywords
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Most data stream classification algorithms require a large amount of precisely labeled data as input. However, in many data stream applications, streaming data contain inherent uncertainty, and labeled samples are difficult to collect, while unlabeled data are abundant. In this paper, we focus on classifying uncertain data streams when only positive and unlabeled samples are available. Based on the concept-adapting very fast decision tree (CVFDT) algorithm, we propose an algorithm named puuCVFDT (CVFDT for positive and unlabeled uncertain data). Experimental results on both synthetic and real-life datasets demonstrate the strong ability and efficiency of puuCVFDT to handle concept drift with uncertainty under the positive and unlabeled learning scenario. Even when 90% of the samples in the stream are unlabeled, the classification performance of the proposed algorithm is still comparable to that of CVFDT, which is learned from fully labeled data without uncertainty.


► We propose an uncertain information gain measure for positive and unlabeled samples (puuIG).
► We give methods to summarize imprecise values into distributions.
► These distributions can be used to calculate puuIG efficiently.
► We propose a probabilistic Hoeffding bound to build very fast decision trees.
► Our algorithm can learn well from positive and unlabeled samples with uncertainty.
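
For orientation, the split test that very fast decision trees rely on can be sketched as follows. This is the standard Hoeffding bound as used in VFDT and CVFDT, not the probabilistic variant proposed in the paper, which this page does not define. The function names, the tie threshold and the example numbers are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability 1 - delta, the observed mean
    of n samples lies within epsilon of the true mean of a variable whose
    values span `value_range`."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Split a leaf when the best attribute beats the runner-up by more than
    epsilon, or when epsilon is so small that the two are effectively tied.
    `tie_threshold` is a hypothetical default, not a value from the paper."""
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon or epsilon < tie_threshold


if __name__ == "__main__":
    # Illustrative numbers only: gains in [0, 1], 1000 samples seen at the leaf.
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(should_split(0.30, 0.10, 1.0, 1e-7, 1000))
```

The bound shrinks as more samples reach a leaf, so a small gap between the two best attributes needs more data before a split is made.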

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 213, 5 December 2012, Pages 50–67