Article ID: 4944646
Journal: Information Sciences
Published: 2017
Pages: 16
File type: PDF
Abstract
In this paper we propose a new approach to designing ensembles for stream data classification. The approach is supported by two theorems that show how to decide whether a new component should be added to the ensemble. The decision rests on the assumption that such an addition should increase the accuracy of the ensemble not only on the current portion of observations but also on the whole (infinite) data stream. The conclusions of these theorems hold with a certain probability (confidence) set by the user. Among other results, our computer simulations show that lowering the confidence that a decision based on a finite portion of the stream is the same as one based on the whole (infinite) stream improves accuracy only slightly, at the cost of significantly higher memory consumption. Moreover, we introduce a novel procedure for weighting ensemble components, i.e. decision trees, by assigning a weight to each leaf of the tree; in previous approaches, a single weight was assigned to the whole ensemble component. The new approach is based on the observation that the probability of a correct tree outcome differs across sections of the tree.
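The abstract does not state the two theorems or the exact leaf-weighting formula, so the following is only a minimal sketch of the two ideas under assumptions of our own. A one-sided Hoeffding bound stands in for the paper's theorems, with `delta` playing the role of one minus the user-set confidence. Each leaf is weighted by its Laplace-smoothed accuracy on a labelled chunk. The tree interface (a function returning `(leaf_id, predicted_label)`) and the names `should_add_component`, `leaf_weights`, and `weighted_vote` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical tree interface: maps an observation to (leaf_id, predicted_label).
Tree = Callable[[Sequence[float]], Tuple[Hashable, int]]


def should_add_component(correct_with: Sequence[int],
                         correct_without: Sequence[int],
                         delta: float) -> bool:
    """Accept the candidate only if the chunk-level accuracy gain exceeds a
    one-sided Hoeffding margin, so the expected gain is positive with
    probability at least 1 - delta (an assumed stand-in, not the paper's theorem)."""
    n = len(correct_with)
    # Per-observation gain: 1 if only the enlarged ensemble is correct,
    # -1 if only the current ensemble is correct, 0 otherwise.
    diffs = [a - b for a, b in zip(correct_with, correct_without)]
    mean_gain = sum(diffs) / n
    value_range = 2.0  # length of the interval [-1, 1]
    margin = value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return mean_gain > margin


def leaf_weights(tree: Tree, xs, ys) -> Dict[Hashable, float]:
    """Assumed weighting: Laplace-smoothed accuracy of each leaf on a labelled chunk."""
    hits: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for x, y in zip(xs, ys):
        leaf, pred = tree(x)
        total[leaf] += 1
        hits[leaf] += int(pred == y)
    return {leaf: (hits[leaf] + 1) / (total[leaf] + 2) for leaf in total}


def weighted_vote(trees: List[Tree], weights: List[Dict[Hashable, float]],
                  x, default_weight: float = 0.5) -> int:
    """Each tree votes for its predicted label with the weight of the leaf x reaches;
    leaves unseen during weighting get default_weight."""
    scores: Dict[int, float] = defaultdict(float)
    for tree, w in zip(trees, weights):
        leaf, pred = tree(x)
        scores[pred] += w.get(leaf, default_weight)
    return max(scores, key=scores.get)


# Two toy decision stumps on 1-D inputs, used only for illustration.
def stump_a(x):
    return ("left", 0) if x[0] < 0.5 else ("right", 1)


def stump_b(x):
    return ("low", 1) if x[0] < 0.2 else ("high", 0)


xs = [[0.1], [0.3], [0.7], [0.9]]
ys = [0, 1, 1, 1]
weights = [leaf_weights(stump_a, xs, ys), leaf_weights(stump_b, xs, ys)]
prediction = weighted_vote([stump_a, stump_b], weights, [0.9])
```

In this sketch, a larger `delta` (lower confidence) shrinks the margin, so candidates are accepted more readily and the ensemble grows. This mirrors, only qualitatively, the accuracy versus memory trade-off reported in the abstract. Weighting per leaf rather than per tree lets a tree that is reliable in one region but poor in another contribute accordingly.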