Article ID | Journal ID | Publication year | English article | Full-text version |
---|---|---|---|---|
383188 | 660807 | 2016 | 14-page PDF | Free download |
• An approach to online feature selection is proposed.
• The proposed pipeline covers discretization, feature selection and classification.
• Classical algorithms were modified to make them work online.
• K-means discretizer, Chi-Square filter and Artificial Neural Networks were used.
• Results show that classification error decreases as the pipeline adapts to newly arriving data.
With the advent of Big Data, data are being collected at an unprecedented pace and must be processed in a short time. Classical batch learning algorithms cannot be applied to data streams that flow continuously, so online approaches are necessary. Online learning consists of continuously revising and refining a model by incorporating new data as they arrive, and it makes it possible to tackle important problems such as concept drift and the management of extremely high-dimensional datasets. In this paper, we present a unified pipeline for online learning that covers online discretization, feature selection and classification. Three classical methods (the k-means discretizer, the χ² filter and a one-layer artificial neural network) have been reimplemented to handle online data, and they show promising results on both synthetic and real datasets.
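To make the first two stages of such a pipeline concrete, here is a minimal Python sketch of one way to discretize features and score them with χ² while seeing each sample only once. It is an illustration under stated assumptions, not the authors' implementation. The class and function names (`OnlineKMeans1D`, `OnlineChi2`, `rank_features`) are hypothetical, the discretizer uses plain sequential (MacQueen-style) k-means per feature, and the classification stage is left out.

```python
from collections import defaultdict


class OnlineKMeans1D:
    """Sequential k-means on one feature; a value's bin is its nearest centroid."""

    def __init__(self, k):
        self.k = k
        self.centroids = []
        self.counts = []

    def update(self, x):
        # Assumption: seed the centroids with the first k distinct values seen.
        if len(self.centroids) < self.k and x not in self.centroids:
            self.centroids.append(float(x))
            self.counts.append(1)
            return len(self.centroids) - 1
        j = min(range(len(self.centroids)), key=lambda i: abs(x - self.centroids[i]))
        # Move the winning centroid toward x using a running mean.
        self.counts[j] += 1
        self.centroids[j] += (x - self.centroids[j]) / self.counts[j]
        return j


class OnlineChi2:
    """Running bin-by-class contingency table; the chi-square score is computed on demand."""

    def __init__(self):
        self.table = defaultdict(lambda: defaultdict(int))
        self.n = 0

    def update(self, bin_id, label):
        self.table[bin_id][label] += 1
        self.n += 1

    def score(self):
        row = {b: sum(c.values()) for b, c in self.table.items()}
        col = defaultdict(int)
        for c in self.table.values():
            for y, v in c.items():
                col[y] += v
        chi2 = 0.0
        for b in row:
            for y in col:
                expected = row[b] * col[y] / self.n
                observed = self.table[b].get(y, 0)
                chi2 += (observed - expected) ** 2 / expected
        return chi2


def rank_features(stream, n_features, k=3):
    """Process (x, y) pairs one at a time; return features ranked by chi-square score."""
    discretizers = [OnlineKMeans1D(k) for _ in range(n_features)]
    filters = [OnlineChi2() for _ in range(n_features)]
    for x, y in stream:
        for f in range(n_features):
            filters[f].update(discretizers[f].update(x[f]), y)
    scores = [flt.score() for flt in filters]
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return ranked, scores
```

Because both components keep only centroids and counts, memory use does not grow with the length of the stream. One simplification to note: a sample's bin is fixed when it arrives, so counts recorded early are not revised when the centroids later drift.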
Journal: Expert Systems with Applications - Volume 55, 15 August 2016, Pages 532–545