Extending fuzzy and probabilistic clustering to very large data sets

Article code	Journal code	Publication year	English article	Full-text version
416807	681403	2006	20-page PDF	Free download

Keywords

Goodness-of-fit; Extensibility; Clustering; Data mining

Related topics

Engineering and basic sciences, Computer engineering, Computational theory and mathematics

Abstract

Approximating clusters in very large (VL = unloadable) data sets has been considered from many angles. The proposed approach has three basic steps: (i) progressive sampling of the VL data, terminated when a sample passes a statistical goodness-of-fit test; (ii) clustering the sample with a literal (or exact) algorithm; and (iii) non-iterative extension of the literal clusters to the remainder of the data set. Extension accelerates clustering on all (loadable) data sets. More importantly, extension provides feasibility: a way to find (approximate) clusters for data sets that are too large to be loaded into the primary memory of a single computer. A good generalized sampling and extension scheme should be effective for acceleration and feasibility using any extensible clustering algorithm. A general method for progressive sampling in VL sets of feature vectors is developed, and examples are given that show how to extend the literal fuzzy (c-means) and probabilistic (expectation-maximization) clustering algorithms onto VL data. The fuzzy extension is called the generalized extensible fast fuzzy c-means (geFFCM) algorithm and is illustrated using several experiments with mixtures of five-dimensional normal distributions.
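
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' geFFCM implementation. Step (i) is replaced by a fixed-size simple random sample, because the abstract does not specify the goodness-of-fit test. Step (ii) is a textbook fuzzy c-means, and step (iii) computes memberships for every point in one pass using the fixed sample centers. All function names, the fuzzifier m = 2, the sample size, and the toy two-dimensional data are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def memberships(x, centers, m=2.0):
    """Fuzzy c-means memberships of one point x, given fixed cluster centers."""
    d2 = [sq_dist(x, v) for v in centers]
    if min(d2) == 0.0:  # x coincides with a center: split membership among hits
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]

def fcm(sample, c, m=2.0, max_iter=100, tol=1e-8, seed=0):
    """Step (ii): literal fuzzy c-means on the loadable sample; returns centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(sample, c)]
    dim = len(sample[0])
    for _ in range(max_iter):
        # Membership update with centers fixed, raised to the fuzzifier m.
        weights = [[u ** m for u in memberships(p, centers, m)] for p in sample]
        # Center update: membership-weighted mean of the sample per cluster.
        new_centers = []
        for i in range(c):
            total = sum(w[i] for w in weights)
            new_centers.append([
                sum(w[i] * p[d] for w, p in zip(weights, sample)) / total
                for d in range(dim)
            ])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers

def extend(stream, centers, m=2.0):
    """Step (iii): one non-iterative pass assigning memberships to each point."""
    for x in stream:
        yield memberships(x, centers, m)

# Toy stand-in for a VL data set: two well-separated 2-D normal clusters.
rng = random.Random(42)
data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
data += [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(2000)]

# Step (i) stand-in (assumption): a fixed-size simple random sample,
# with no goodness-of-fit test and no progressive growth.
sample = rng.sample(data, 200)
centers = sorted(fcm(sample, c=2))

# Harden the extended memberships to labels, streaming over the full data.
labels = [u.index(max(u)) for u in extend(iter(data), centers)]
print(centers, sum(labels))
```

Because `extend` consumes an iterator and keeps only the centers in memory, the full data set never has to be loaded at once, which is the feasibility argument made in the abstract. In a closer approximation of the method, the fixed-size sample would be grown progressively until it passes the statistical test.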

Publisher

Database: Elsevier - ScienceDirect
Journal: Computational Statistics & Data Analysis - Volume 51, Issue 1, 1 November 2006, Pages 215–234

Authors

Richard J. Hathaway, James C. Bezdek
