Article code: 6857208
Journal code: 661905
Publication year: 2016
English article: 27-page PDF
Full-text version: Free download
English title of the ISI article
Real-time stream data mining based on CanTree and Gtree
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
We face an increasing need to discover knowledge from data streams in real-time. Real-time stream data mining needs a compact data structure to store transactions in the recent sliding-window by one scan, and an efficient algorithm to discover frequent itemsets from the compact data structure. In this paper, we propose a novel data mining algorithm, called CanTree-GTree, which discovers the complete frequent itemsets from real-time transactions based on sliding-windows. The algorithm uses two data structures: CanTree and GTree. CanTree compactly represents all transactions in a sliding-window by one scan, and serves as a base-tree. The algorithm efficiently maintains the base-tree by adding new transactions and removing old transactions without any reconstruction phases. A novel data structure, called GTree (Group Tree), serves as a projection-tree for each data item. The algorithm traverses each node of the base-tree only once by using a top-down tree traversal method to build the projection-tree, and discovers frequent itemsets by low processing cost. The proposed algorithm is therefore effective for discovering frequent itemsets in real-time stream data. Our performance evaluation experiments with other algorithms based on CPSTree and CanTree-FPTree show that our algorithm outperforms the other algorithms in the synthetic data set by about 35% and 26% of run-time cost, respectively. Also, we confirm that the proposed algorithm shows excellent results on real-world data sets.
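The base-tree maintenance described in the abstract can be illustrated with a minimal sketch. It assumes a CanTree is a prefix tree whose items are kept in a fixed canonical order (plain sorted order here), so that adding a new transaction or removing the oldest one only adjusts counts along one path and never triggers reconstruction. The names `Node`, `SlidingCanTree`, `add`, and `item_supports` are hypothetical and not taken from the paper, and the GTree projection step is not shown because the abstract does not give enough detail to reproduce it.

```python
from collections import deque


class Node:
    """One tree node: an item, its count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class SlidingCanTree:
    """Illustrative prefix tree over a sliding window of transactions.

    Assumption: items are stored in sorted order, which is independent of
    item frequency, so paths never need reordering as the window moves.
    """

    def __init__(self, window_size):
        self.root = Node()
        self.window_size = window_size
        self.window = deque()  # transactions in the window, oldest first

    def add(self, transaction):
        """Insert a transaction; evict the oldest one if the window is full."""
        items = tuple(sorted(set(transaction)))
        node = self.root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item)
                node.children[item] = child
            child.count += 1
            node = child
        self.window.append(items)
        if len(self.window) > self.window_size:
            self._remove(self.window.popleft())

    def _remove(self, items):
        """Decrement counts along the transaction's path, pruning dead nodes."""
        node = self.root
        for item in items:
            child = node.children[item]
            child.count -= 1
            if child.count == 0:
                # Only the removed transaction used this subtree, so drop it.
                del node.children[item]
                return
            node = child

    def item_supports(self):
        """Visit every node once, top-down, and sum the counts per item."""
        supports = {}
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            supports[node.item] = supports.get(node.item, 0) + node.count
            stack.extend(node.children.values())
        return supports
```

With a window of three transactions, adding a fourth one evicts the first, and `item_supports()` then reflects only the three most recent transactions. Frequent single items could be read off by comparing these supports with a minimum-support threshold; mining longer itemsets would need the projection step that this sketch omits.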
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volumes 367–368, 1 November 2016, Pages 512-528