کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
6854297 1437411 2018 14 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Efficient algorithms for simultaneously mining concise representations of sequential patterns based on extended pruning conditions
ترجمه فارسی عنوان
الگوریتم های کارآمد برای نشان دادن مختصر مختصری از الگوهای پیوسته بر اساس شرایط برش بلند به طور همزمان
کلمات کلیدی
معادله الگوی متوالی، دنباله مکرر، توالی حداکثر مکرر، توالی بسته مکرر، توالی ژنراتور مکرر، فرمت داده عمودی،
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
چکیده انگلیسی
The concise representations of sequential patterns, including maximal sequential patterns, closed sequential patterns and sequential generator patterns, play an important role in data mining since they provide several benefits when compared to sequential patterns. One of the most important benefits is that their cardinalities are generally much less than the cardinality of the set of sequential patterns. Therefore, they can be mined more efficiently, use less storage space, and it is easier for users to analyze the information provided by the concise representations. In addition, the set of all maximal sequential patterns can be utilized to recover the complete set of sequential patterns, while closed sequential patterns and sequential generators can be used together to generate non-redundant sequential rules and to quickly recover all sequential patterns and their frequencies. Several algorithms have been proposed to mine the concise representations separately, i.e., each of them has been designed to discover only a type of the concise representation. However, they remain time-consuming and memory intensive tasks. To address this problem, we propose three novel efficient algorithms named FMaxSM, FGenCloSM and MaxGenCloSM to exploit only maximal sequential patterns, to simultaneously mine both the sets of closed sequential patterns and generators, and to discover all three concise representations during the same process. To our knowledge, MaxGenCloSM is the first algorithm for concurrently mining the three concise representations of sequential patterns. The proposed algorithms are based on two novel local pruning strategies called LPMAX and LPMaxGenClo that are designed to prune non-maximal, non-closed and non-generator patterns earlier and more efficiently at two and three successive levels of the prefix tree without subsequence relation checking. Extensive experiments on real-life and synthetic databases show that FMaxSM, FGenCloSM and MaxGenCloSM are up to two orders of magnitude faster than the state-of-the-art algorithms and that the proposed algorithms consume much less memory, especially for low minimum support thresholds and for dense databases.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Engineering Applications of Artificial Intelligence - Volume 67, January 2018, Pages 197-210
نویسندگان
, , ,