RSQRT: An heuristic for estimating the number of clusters to report

Article code	Journal code	Publication year	English article	Full-text version
379816	659510	2012	7-page PDF	Free download


Keywords

E-commerce; Data analytics; Clustering; Bayesian information criterion; Heuristic

Related topics

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence


Abstract

Clustering can be a valuable tool for analyzing large data sets, such as in e-commerce applications. Anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter. Elsewhere we introduced a strongly-supported heuristic, RSQRT, which predicts K as a function of the attribute or item count, depending on attribute scales. We conducted a second analysis where we sought confirmation of the heuristic, analyzing data sets from the UCI machine learning benchmark repository. For the 25 studies where sufficient detail was available, we again found strong support. Also, in a side-by-side comparison of 28 studies, RSQRT best-predicted K and the Bayesian information criterion (BIC) predicted K are the same. RSQRT has a lower cost of O(log log n) versus O(n^2) for BIC, and is more widely applicable. Using RSQRT prospectively could be much better than merely guessing.

► We propose a heuristic RSQRT to estimate the number of clusters, K, in a data set.
► RSQRT applies well to numeric and nominal scale data attributes.
► Results correlate with BIC estimations for K, and RSQRT has a much lower computational cost.

Publisher

Database: Elsevier - ScienceDirect
Journal: Electronic Commerce Research and Applications - Volume 11, Issue 2, March–April 2012, Pages 152–158

Authors

John Carlis, Kelsey Bruso
