Article code: 4966486
Journal code: 867089
Publication year: 2016
English article: 36 pages, PDF
Full-text version: Free download
English title of the ISI article
Utilising a statistical inequality for efficiently finding term sets
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract
Information Retrieval (IR) systems aim to find sets of terms that discriminate between documents, and they often exploit frequency as evidence that a set of terms is non-random. Frequent Itemset (FI) mining refers to a class of algorithms that can be applied to IR to find non-random sets of terms. Finding FIs is computationally very expensive because the number of itemsets is exponential. To reduce this cost, many approaches to mining FIs rely on the monotonicity property: an itemset is frequent only if all its subsets are frequent. The converse does not hold, however. When all subsets of an itemset are frequent, it remains uncertain whether the itemset itself is frequent, so additional scans are required, which adds computational cost. We introduce a statistical inequality called the Bell-Wigner Inequality (BWI) as a conceptual enhancement of monotonicity to predict with certainty when an itemset is frequent and when it is infrequent. Using both data mining datasets and a large IR test collection, an empirical validation shows that the BWI can significantly reduce computational cost.
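The monotonicity property described in the abstract can be illustrated with a small Apriori-style sketch. This is not the paper's BWI method. It shows only the baseline pruning that the abstract refers to, and why a candidate whose subsets are all frequent still needs a scan. The function names (`support_counts`, `frequent_itemsets`), the toy documents, and the threshold `min_count` are assumptions made for illustration.

```python
from itertools import combinations


def support_counts(transactions, candidates):
    """Scan the data once and count the transactions containing each candidate."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def frequent_itemsets(transactions, min_count):
    """Level-wise (Apriori-style) mining; returns (frequent itemsets, number of scans)."""
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([item]) for t in transactions for item in t}
    frequent, scans, k = {}, 0, 1
    while level:
        counts = support_counts(transactions, level)
        scans += 1
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
        # Monotonicity pruning: keep a k-itemset as a candidate only if
        # every (k-1)-subset was found frequent at the previous level.
        level = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                level.add(union)
    return frequent, scans


# Hypothetical toy collection: each document is a set of terms.
docs = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
frequent, scans = frequent_itemsets(docs, min_count=3)
print(sorted(sorted(s) for s in frequent), scans)
```

In this toy collection every pair of terms occurs in three documents, so the triple `{a, b, c}` passes the pruning step, yet it occurs in only two documents and is rejected after a third scan. That extra scan is the cost the abstract says the BWI is meant to avoid.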
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 52, Issue 6, November 2016, Pages 1086-1121