Utilising a statistical inequality for efficiently finding term sets

Article ID	Journal ID	Publication year	English article	Full-text version
4966486	867089	2016	36-page PDF	Free download

Keywords

Term selection; Efficiency; Clustering; Query expansion

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Science Software

Abstract

Information Retrieval (IR) systems aim to find sets of terms that discriminate documents and often exploit frequency as evidence that signals a non-random set of terms. Frequent Itemset (FI) mining refers to a class of algorithms that can be applied to IR to find non-random sets of terms. Finding FIs is a computationally very expensive task because of the exponential number of itemsets. To reduce this cost, many approaches to mining FIs are based on the monotonicity property that an itemset is frequent only if all its subsets are frequent. However, it is still uncertain whether an itemset is frequent if all its subsets are frequent, thus requiring additional scans and, ultimately, additional computational cost. We introduce a statistical inequality called the Bell-Wigner Inequality (BWI) as a conceptual enhancement of monotonicity to predict with certainty when an itemset is frequent and when it is infrequent. Using both data mining datasets and a large IR test collection, an empirical validation shows that the BWI can significantly reduce computational cost.
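
The monotonicity property described in the abstract can be illustrated with a minimal Apriori-style sketch. This is not the paper's BWI method, whose details are not given on this page; it only shows the baseline pruning that the BWI is said to enhance. The function name `frequent_itemsets`, the `min_support` threshold, and the toy documents are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset occurring in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One scan of the data: count the transactions that contain each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    level = count(items)
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (monotonicity): discard a candidate if any (k-1)-subset is infrequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Survivors are only possibly frequent, so another scan is still required.
        level = count(candidates)
        result.update(level)
        k += 1
    return result


# Hypothetical toy collection: each document is a set of terms.
docs = [
    {"query", "expansion", "term"},
    {"query", "expansion"},
    {"query", "term"},
    {"expansion", "term"},
]
found = frequent_itemsets(docs, min_support=2)
```

In this toy collection every pair of terms is frequent, so the three-term set survives pruning, yet it occurs in only one document and is rejected by the extra scan. That scan is the cost which, according to the abstract, the BWI aims to avoid.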

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 52, Issue 6, November 2016, Pages 1086-1121

Authors

Massimo Melucci
