کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
405237 677510 2013 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Parallel frequent itemset mining using systolic arrays
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Parallel frequent itemset mining using systolic arrays
چکیده انگلیسی

Since extraction of frequent itemsets from a transaction database is crucial to several data mining tasks such as association rule generation, so frequent itemset mining is one of the most important concepts in data mining. One of the major problems in frequent itemset mining is the explosion of the number of results which is directly effecting on the execution time of itemset mining algorithms. To address this problem, closed itemsets have been proposed, which provides concise lossless representations of the original collection of frequent itemsets. Henceforth, the frequencies of all itemsets in the original collection can be reconstructed from the reduced collection. However, the reduction provided by this exact method is not sufficient to solve the pattern explosion problem, mainly because of high dimensional datasets which have large number of items in each transaction. Colossal itemset mining is another solution to reduce the output size which will not be useful if the set of all frequent itemsets have been required. Higher level of performance improvement can be obtained from efficient scalable parallel mining methods. In this paper we represent an efficient scalable parallel algorithm using systolic arrays to conduct mining of frequent itemsets in very large, such as high dimensional, datasets. In our algorithm, we use a bit matrix to compress the dataset and mapping the mining algorithm on the systolic arrays architecture. For this purpose, each transaction of dataset represents as a row in the bit matrix. We use this bit matrix structure to model the pattern mining as a systolic array problem. Our experimental results and performance study show that this algorithm outperforms substantially the best previously developed parallel algorithms.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Knowledge-Based Systems - Volume 37, January 2013, Pages 462–471
نویسندگان
, ,