Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
424855 | 685650 | 2016 | 20-page PDF | Free download |
• We propose a sliding window-based stream pattern mining algorithm that finds weighted erasable patterns.
• We devise pruning strategies that keep the proposed algorithm efficient.
• We suggest performance-improving stream data mining techniques for stream data applications.
• We provide extensive performance evaluation results.
As a variation of frequent pattern mining, erasable pattern mining discovers patterns whose benefits are lower than or equal to a user-specified threshold in a product database. Traditional erasable pattern mining algorithms work in static mining environments, but they are not suited to dynamic data streams. In a data stream, an algorithm has to process incoming data immediately, with only one database scan. Previous tree-based erasable pattern mining methods have difficulty processing dynamic data streams because they need two or more database scans to construct their tree structures. They also ignore item-specific information in the product database, although taking such information into account is needed to find more useful erasable patterns. For this reason, we propose a weighted erasable pattern mining algorithm suited to sliding window-based data stream environments. The algorithm employs tree and list data structures for more efficient mining and addresses the problems of previous erasable pattern mining approaches with a sliding window-based stream processing technique and an item weight-based pattern pruning method. We compare the performance of the proposed algorithm with state-of-the-art tree-based approaches on various real and synthetic datasets. Experimental results show that our method is more efficient and scalable than the competitors in terms of runtime, memory, and pattern generation.
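To make the setting in the abstract concrete, the following is a minimal brute-force sketch, not the paper's tree- and list-based algorithm. It rests on several assumptions that the abstract does not state: the benefit (gain) of a pattern is taken, as is common in the erasable pattern literature, to be the total profit of the products that use at least one item of the pattern; the threshold is a fraction of the window's total profit; and the weighted gain is the gain multiplied by the mean weight of the pattern's items. The class name `SlidingWindowErasableMiner` and the `max_len` parameter are hypothetical, and no pruning is performed.

```python
from collections import deque
from itertools import combinations


class SlidingWindowErasableMiner:
    """Brute-force illustration only; the paper uses tree and list structures."""

    def __init__(self, window_size, threshold, weights):
        # A bounded deque drops the oldest product when a new one arrives.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold  # assumed: fraction of the window's total profit
        self.weights = weights      # hypothetical per-item weights

    def add(self, items, profit):
        """Consume one arriving product; each product is read exactly once."""
        self.window.append((frozenset(items), profit))

    def gain(self, pattern):
        """Total profit of windowed products using at least one item of pattern."""
        return sum(profit for items, profit in self.window if items & pattern)

    def weighted_gain(self, pattern):
        """ASSUMPTION: gain scaled by the mean weight of the pattern's items."""
        mean_weight = sum(self.weights[i] for i in pattern) / len(pattern)
        return self.gain(pattern) * mean_weight

    def mine(self, max_len=2):
        """Return patterns (as sorted tuples) whose weighted gain is within the limit."""
        limit = self.threshold * sum(profit for _, profit in self.window)
        universe = sorted(set().union(*(items for items, _ in self.window)))
        result = []
        for k in range(1, max_len + 1):
            for combo in combinations(universe, k):
                if self.weighted_gain(frozenset(combo)) <= limit:
                    result.append(combo)
        return result


miner = SlidingWindowErasableMiner(
    window_size=3, threshold=0.5, weights={"a": 0.5, "b": 1.0, "c": 1.0}
)
stream = [({"a", "b"}, 100), ({"b", "c"}, 200), ({"c"}, 300), ({"a"}, 400)]
for items, profit in stream:
    miner.add(items, profit)  # the first product slides out on the fourth add

print(miner.mine(max_len=2))
```

Enumerating every candidate like this is exponential in the number of items, which is the cost that pruning strategies and compact tree or list structures are meant to avoid.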
Journal: Future Generation Computer Systems - Volume 59, June 2016, Pages 1–20