Article code: 6854992
Journal code: 1437602
Publication year: 2018
English article: 30-page PDF
Full-text version: free download
English title of the ISI article
A pure array structure and parallel strategy for high-utility sequential pattern mining
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
High-utility sequential pattern mining (HUSPM) is the task of discovering all sequential patterns in a sequence database whose utility values are equal to or greater than a given minimum utility threshold. HUSPM has become increasingly important in many real-world data mining applications, such as market basket analysis, weblog mining, and biomedical gene data analysis, which consider co-occurrence values, quantity, utility (e.g., profit or cost), and time. Current HUSPM approaches in the literature use a utility matrix to store the sequence database in memory. Unfortunately, the utility matrix consumes a large amount of main memory. To address this issue, we introduce a pure array structure that reduces memory consumption compared to the utility matrix. In addition, HUSPM is challenging because utility does not satisfy the downward closure property (DCP) commonly used to prune the search space. Recent HUSPM algorithms therefore use an upper bound on utility values that does satisfy the DCP. However, this upper bound is usually higher than the actual utility of patterns, so these algorithms may generate many candidate patterns. The resulting large search space leads to poor performance due to excessive runtime and memory usage, in part because the number of projected database scans required to calculate actual utilities is proportional to the number of candidate patterns. In this paper, we present a novel pruning strategy that efficiently prunes non-HUSPs and significantly reduces the search space compared to the state-of-the-art HUS-Span algorithm. Moreover, we propose a parallel strategy to speed up the mining process. We propose two algorithms: the pure Array structure for High-utility Sequential pattern mining (AHUS) and AHUS parallel mining (AHUS-P). The AHUS-P algorithm uses shared memory to parallelize the mining process and identifies HUSPs concurrently, taking advantage of multi-core processor architectures.
The experimental results show that AHUS and AHUS-P discover all HUSPs efficiently and effectively. Both proposed algorithms outperform the state-of-the-art HUS-Span algorithm in terms of runtime, memory usage, and scalability on all experimental datasets.
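The task definition and the loose upper bound described in the abstract can be illustrated with a minimal Python sketch. This is not the AHUS algorithm or its array structure. All names (`pattern_utility`, `database_utility`, `swu`, `DB`) are hypothetical, the toy data is invented for illustration, and patterns are restricted to one item per element for brevity. The bound shown is the sequence-weighted utility (SWU) from the wider HUSPM literature, used here as an assumption because the abstract does not say which upper bound it refers to.

```python
from typing import Dict, List, Optional

# A q-sequence is a list of itemsets; each itemset maps item -> utility
# (assumed to be quantity * unit profit, already multiplied out).
QSequence = List[Dict[str, int]]

def pattern_utility(pattern: List[str], seq: QSequence) -> Optional[int]:
    """Maximum utility over all occurrences of `pattern` in `seq`.

    Each pattern element must match a strictly later itemset than the
    previous one. Returns None if the pattern does not occur.
    """
    # best[k] = best utility of matching the first k pattern items so far
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in seq:
        # Descending k ensures one itemset matches at most one element.
        for k in range(len(pattern), 0, -1):
            item = pattern[k - 1]
            prev = best[k - 1]
            if item in itemset and prev is not None:
                cand = prev + itemset[item]
                if best[k] is None or cand > best[k]:
                    best[k] = cand
    return best[-1]

def database_utility(pattern: List[str], db: List[QSequence]) -> int:
    """Actual utility: sum of per-sequence maximum utilities."""
    utils = (pattern_utility(pattern, s) for s in db)
    return sum(u for u in utils if u is not None)

def swu(pattern: List[str], db: List[QSequence]) -> int:
    """Upper bound: total utility of every sequence containing the pattern."""
    return sum(
        sum(sum(itemset.values()) for itemset in s)
        for s in db
        if pattern_utility(pattern, s) is not None
    )

# Hypothetical toy database of three q-sequences.
DB: List[QSequence] = [
    [{"a": 2, "b": 6}, {"c": 3}, {"a": 4, "c": 1}],
    [{"b": 3}, {"a": 1}, {"c": 5}],
    [{"c": 2}, {"b": 4}],
]

MIN_UTIL = 12
for pat in (["a", "c"], ["b", "c"]):
    u, bound = database_utility(pat, DB), swu(pat, DB)
    print(pat, "utility:", u, "SWU:", bound, "HUSP:", u >= MIN_UTIL)
```

With a threshold of 12, the pattern `a, c` has an actual utility of 11 but an SWU of 25, so a bound-based miner would still keep it as a candidate and scan the database to compute its real utility. This is the kind of wasted work that a tighter pruning strategy aims to avoid.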
Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 104, 15 August 2018, Pages 107-120