A fast and effective partitional clustering algorithm for large categorical datasets using a k-means based approach

Article ID	Journal ID	Publication year	English article	Full-text version
6883413	1444172	2018	21-page PDF	Free download

Keywords

Pattern recognition; k-Means; Unsupervised learning

Related topics

Engineering and Basic Sciences > Computer Engineering > Computer Networks and Communications

Abstract

Partitional clustering algorithms are of interest in pattern recognition because of their high scalability and efficiency. The k-means algorithm, proposed in 1965, has proved highly efficient for numeric clustering but is inadequate for categorical clustering. In 1998, k-modes was proposed as an extension of k-means for clustering categorical datasets. In this paper, a new partition-based method for categorical data, called Manhattan Frequency k-Means (MFk-M), is detailed. It converts the initial categorical data into numeric values using the relative frequency of each modality within its attribute. The L1 norm (Manhattan distance) is used as the distance measure between the observations and the centroids. Finally, an approximation is defined to evaluate each resulting partition during the execution of the algorithm in order to avoid trivial clusterings such as cluster death. Experimental analysis performed on real-life datasets highlights the reduced computational cost and high efficiency of the proposal when compared to the standard k-means and k-modes algorithms.
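
The abstract describes the approach only at a high level, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It encodes each categorical value by its relative frequency within its attribute and then runs a k-means style loop with the Manhattan distance. The function names (frequency_encode, manhattan, mfkm_sketch), the random initialisation, the mean-based centroid update, and the handling of empty clusters are all assumptions; in particular, the partition-evaluation approximation mentioned in the abstract is not reproduced here.

```python
import random
from collections import Counter


def frequency_encode(rows):
    """Replace each categorical value by its relative frequency in its column."""
    n = len(rows)
    columns = list(zip(*rows))
    counts = [Counter(col) for col in columns]
    return [[counts[j][value] / n for j, value in enumerate(row)] for row in rows]


def manhattan(a, b):
    """L1 (Manhattan) distance between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mfkm_sketch(rows, k, max_iter=100, seed=0):
    """k-means style clustering on frequency-encoded data with L1 distance.

    Assumptions (not taken from the paper): random initial centroids,
    mean-based centroid update, and an empty cluster simply keeps its
    previous centroid as a placeholder for cluster-death handling.
    """
    points = frequency_encode(rows)
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the Manhattan distance.
        new_labels = [
            min(range(k), key=lambda c: manhattan(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: per-attribute mean of the members of each cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    data = [("red", "small"), ("red", "large"), ("blue", "small"), ("red", "small")]
    print(frequency_encode(data))
    print(mfkm_sketch(data, k=2))
```

One consequence of this encoding is that two different modalities with the same relative frequency in an attribute map to the same number, so the encoded data cannot tell them apart.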

Publisher

Database: Elsevier - ScienceDirect
Journal: Computers & Electrical Engineering - Volume 68, May 2018, Pages 463-483

Authors

Semeh Ben Salem, Sami Naouali, Zied Chtourou
