کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
4944366 1437984 2017 35 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
On-line active learning: A new paradigm to improve practical useability of data stream modeling methods
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
On-line active learning: A new paradigm to improve practical useability of data stream modeling methods
چکیده انگلیسی
The central purpose of this survey is to provide readers an insight into the recent advances and challenges in on-line active learning. Active learning has attracted the data mining and machine learning community since around 20 years. This is because it served for important purposes to increase practical applicability of machine learning techniques, such as (i) to reduce annotation and measurement costs for operators and measurement equipments, (ii) to reduce manual labeling effort for experts and (iii) to reduce computation time for model training. Almost all of the current techniques focus on the classical pool-based approach, which is off-line by nature as iterating over a pool of (unlabeled) reference samples a multiple times to choose the most promising ones for improving the performance of the classifiers. This is achieved by (time-intensive) re-training cycles on all labeled samples available so far. For the on-line, stream mining case, the challenge is that the sample selection strategy has to operate in a fast, ideally single-pass manner. Some first approaches have been proposed during the last decade (starting from around 2005) with the usage of machine learning (ML) oriented incremental classifiers, which are able to update their parameters based on selected samples, but not their structures. Since 2012, on-line active learning concepts have been proposed in connection with the paradigm of evolving models, which are able to expand their knowledge into feature space regions so far unexplored. This opened the possibility to address a particular type of uncertainty, namely that one which stems from a significant novelty content in streams, as, e.g., caused by drifts, new operation modes, changing system behaviors or non-stationary environments. We will provide an overview about the concepts and techniques for sample selection and active learning within these two principal major research lines (incremental ML models versus evolving systems), a comparison of their essential characteristics and properties (raising some advantages and disadvantages), and a study on possible evaluation techniques for them. We conclude with an overview of real-world application examples where various on-line AL approaches have been already successfully applied in order to significantly reduce user's interaction efforts and costs for model updates.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Sciences - Volumes 415–416, November 2017, Pages 356-376
نویسندگان
,