Article code: 392642
Journal code: 665145
Publication year: 2016
English article: 23 pages
Full-text version: PDF
English title (ISI article)
A fast density-based data stream clustering algorithm with cluster centers self-determined for mixed data
Keywords
Data mining, Mixed attributes, Data stream clustering, Peak field intensity, Distance measure metric
Related subjects
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Most data streams encountered in real life consist of data objects with mixed numerical and categorical attributes. Most current data stream clustering algorithms suffer from shortcomings such as low clustering quality, difficulty in determining cluster centers, and poor handling of outliers. This paper proposes a fast density-based data stream clustering algorithm whose cluster centers are determined automatically in the initialization stage. Based on an analysis of the relationships among data attributes, mixed data sets are classified into three types, and a distance measure metric is designed for each type. From the field intensity-distance distribution graph of the data objects, a linear regression model and residual analysis are used to find the outliers of the graph, which allows the cluster centers to be determined automatically. Once the cluster centers are found, every data object is clustered according to its distance from the centers. The data stream clustering algorithm adopts a two-stage online/offline processing framework and a new micro-cluster characteristic vector to maintain arriving data objects dynamically. A micro-cluster decay function and a micro-cluster deletion mechanism maintain the micro-clusters so that they accurately reflect the evolution of the data stream. Finally, the performance of the proposed algorithm is tested through a series of experiments on real-world mixed data sets, comparing it with several leading clustering algorithms in terms of clustering purity, efficiency, and time complexity.
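The abstract does not give the paper's formulas, so the following Python sketch only illustrates the general idea of self-determined cluster centers under stated assumptions. It assumes a Gaussian field intensity `phi` with a hypothetical width `sigma`, takes `delta` as the distance to the nearest object of higher intensity (the quantity used in density-peaks clustering), fits an ordinary least-squares line of `delta` on `phi`, and flags objects whose residual exceeds a hypothetical cutoff of mean plus `k` standard deviations. `mixed_distance` is a generic stand-in (scaled numeric differences plus a 0/1 categorical mismatch), not the three type-specific metrics the authors design; `find_centers`, `assign`, `scales`, `sigma`, and `k` are all illustrative names.

```python
import math
from statistics import mean, pstdev

# A record is (numeric attributes, categorical attributes).
Record = tuple[tuple[float, ...], tuple[str, ...]]


def mixed_distance(a: Record, b: Record, scales: list[float]) -> float:
    """Illustrative mixed metric (not the paper's): scaled squared numeric
    differences plus a 0/1 mismatch penalty per categorical attribute."""
    num = sum(((x - y) / s) ** 2 for x, y, s in zip(a[0], b[0], scales))
    cat = sum(1.0 for u, v in zip(a[1], b[1]) if u != v)
    return math.sqrt(num + cat)


def find_centers(data: list[Record], scales: list[float],
                 sigma: float = 2.0, k: float = 1.0) -> list[int]:
    """Return indices of objects lying far above the regression line of the
    intensity-distance graph (sigma and k are hypothetical parameters)."""
    n = len(data)
    d = [[mixed_distance(data[i], data[j], scales) for j in range(n)]
         for i in range(n)]
    # Field intensity: Gaussian potential received from every other object.
    phi = [sum(math.exp(-(d[i][j] / sigma) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # delta: distance to the nearest object with higher intensity; the
    # strongest object gets its largest distance instead.
    order = sorted(range(n), key=lambda i: -phi[i])
    delta = [0.0] * n
    delta[order[0]] = max(d[order[0]])
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(d[i][j] for j in order[:rank])
    # Ordinary least-squares line delta ~ a + b * phi over all objects.
    mp, md = mean(phi), mean(delta)
    sxx = sum((p - mp) ** 2 for p in phi)
    sxy = sum((p - mp) * (q - md) for p, q in zip(phi, delta))
    b = sxy / sxx if sxx else 0.0
    a = md - b * mp
    resid = [q - (a + b * p) for p, q in zip(phi, delta)]
    # Residual outliers above the line are taken as cluster centers.
    cut = mean(resid) + k * pstdev(resid)
    return [i for i in range(n) if resid[i] > cut]


def assign(data: list[Record], centers: list[int],
           scales: list[float]) -> list[int]:
    """Label every object with the index of its nearest center."""
    return [min(centers, key=lambda c: mixed_distance(x, data[c], scales))
            for x in data]
```

On five objects at x = 0..4 with category "a" and three at x = 100..102 with category "b" (scale 1.0), this sketch selects the middle object of each group (indices 2 and 6) and assigns every object to its nearest center. A fixed `k` is a simplification; a real residual analysis may need a different cutoff rule.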
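The micro-cluster decay and deletion mechanism can be sketched with the exponential fading function 2^(-λΔt) that is common in density-based stream clustering (for example, DenStream). The abstract does not state which decay function the paper uses or what its characteristic vector contains, so this fading function, the two-field `MicroCluster` vector, and the `min_weight` deletion threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    # Hypothetical minimal characteristic vector: a decayed weight and the
    # time of the last update (the paper's vector is not described here).
    weight: float
    last_update: float

    def decay_to(self, now: float, lam: float) -> None:
        # Assumed fading function f(dt) = 2 ** (-lam * dt).
        self.weight *= 2.0 ** (-lam * (now - self.last_update))
        self.last_update = now

    def absorb(self, now: float, lam: float) -> None:
        # Fade the old weight to the current time, then count the new object.
        self.decay_to(now, lam)
        self.weight += 1.0


def prune(clusters: list[MicroCluster], now: float, lam: float,
          min_weight: float) -> list[MicroCluster]:
    """Fade every micro-cluster to `now` (in place) and keep only those whose
    weight is still at least `min_weight` (a hypothetical threshold)."""
    for mc in clusters:
        mc.decay_to(now, lam)
    return [mc for mc in clusters if mc.weight >= min_weight]
```

With `lam = 1`, a micro-cluster that absorbed an object at t = 1 (weight 1.5) fades to 0.375 by t = 3 and survives a `min_weight` of 0.25, while one untouched since t = 0 fades to 0.125 and is deleted.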

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 345, 1 June 2016, Pages 271–293
Authors
, ,