کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
4942717 1437418 2017 14 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Mining of frequent patterns with multiple minimum supports
ترجمه فارسی عنوان
استخراج الگوهای مکرر با حداقل پشتیبانی حداقل
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
چکیده انگلیسی
Frequent pattern mining (FPM) is an important topic in data mining for discovering the implicit but useful information. Many algorithms have been proposed for this task but most of them suffer from an important limitation, which relies on a single uniform minimum support threshold as the sole criterion to identify frequent patterns (FPs). Using a single threshold value to assess the usefulness of all items in a database is inadequate and unfair in real-life applications since each item is different and not all items should be treated as the same. Several algorithms have been developed for mining FPs with multiple minimum supports but most of them suffer from the time-consuming problem and require a large amount of memory. In this paper, we address this issue by introducing the novel approach named Frequent Pattern mining with Multiple minimum supports from the Enumeration-tree (FP-ME). In the developed Set-Enumeration-tree with Multiple minimum supports (ME-tree) structure, a new sorted downward closure (SDC) property of FPs and the least minimum support (LMS) concept with multiple minimum supports are used to effectively prune the search space. The proposed FP-ME algorithm can directly discover FPs from the ME-tree without candidate generation. Moreover, an improved algorithm, named FP-MEDiffSet, is also developed based on the DiffSet concept, to further increase mining performance. Substantial experiments on both real-life and synthetic datasets show that the proposed algorithms can not only avoid the “rare item problem”, but also efficiently and effectively discover the complete set of FPs in transactional databases while considering multiple minimum supports and outperform the state-of-the-art CFP-growth++ algorithm in terms of execution time, memory usage and scalability.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Engineering Applications of Artificial Intelligence - Volume 60, April 2017, Pages 83-96
نویسندگان
, , , , ,