Article ID Journal Published Year Pages File Type
6857208 Information Sciences 2016 27 Pages PDF
Abstract
We face an increasing need to discover knowledge from data streams in real-time. Real-time stream data mining needs a compact data structure to store transactions in the recent sliding-window by one scan, and an efficient algorithm to discover frequent itemsets from the compact data structure. In this paper, we propose a novel data mining algorithm, called CanTree-GTree, which discovers the complete frequent itemsets from real-time transactions based on sliding-windows. The algorithm uses two data structures: CanTree and GTree. CanTree compactly represents all transactions in a sliding-window by one scan, and serves as a base-tree. The algorithm efficiently maintains the base-tree by adding new transactions and removing old transactions without any reconstruction phases. A novel data structure, called GTree (Group Tree), serves as a projection-tree for each data item. The algorithm traverses each node of the base-tree only once by using a top-down tree traversal method to build the projection-tree, and discovers frequent itemsets by low processing cost. The proposed algorithm is therefore effective for discovering frequent itemsets in real-time stream data. Our performance evaluation experiments with other algorithms based on CPSTree and CanTree-FPTree show that our algorithm outperforms the other algorithms in the synthetic data set by about 35% and 26% of run-time cost, respectively. Also, we confirm that the proposed algorithm shows excellent results on real-world data sets.
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, ,