Article code: 494810
Journal code: 862808
Publication year: 2015
English article: 9-page PDF
Full-text version: free download
Title of the ISI article
Mixing numerical and categorical data in a Self-Organizing Map by means of frequency neurons
Keywords
Self-Organizing Map; categorical data; mixed data; big data
Related topics
Engineering and basic sciences; computer engineering; computer science software
Abstract


• Self-Organizing Maps (SOMs) are powerful tools with many applications. Nevertheless, they cannot handle categorical variables directly.
• To present categorical variables to a SOM, they are usually binarized, which dramatically increases the dimensionality of the dataset.
• NCSOM was proposed to cope with categorical or mixed data. However, it has some drawbacks: categorical and numerical variables are not equally balanced, and the method does not converge.
• A novel SOM variant called FMSOM is presented. It handles both numerical and categorical variables, gives them the same weight, and ensures convergence. A scalable implementation of the method is fully described.
• FMSOM is applied to a benchmark of well-known datasets composed of categorical and mixed data. The results show the potential of the method for analyzing this kind of dataset.
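
The dimensionality cost of binarization mentioned in the highlights can be illustrated with a small sketch. The toy dataset and the helper name `one_hot_width` are assumptions made for illustration and do not come from the paper; the function only counts how many input columns a SOM would see if every categorical column were expanded into one binary column per distinct value.

```python
def one_hot_width(rows, categorical_columns):
    """Return (original_width, binarized_width) for a list-of-dicts dataset."""
    columns = list(rows[0].keys())
    width = 0
    for col in columns:
        if col in categorical_columns:
            # Binarization creates one binary column per distinct category value.
            width += len({row[col] for row in rows})
        else:
            # A numerical column stays a single column.
            width += 1
    return len(columns), width


# Hypothetical toy data: one numerical and two categorical features.
rows = [
    {"age": 34, "color": "red", "city": "Madrid"},
    {"age": 51, "color": "blue", "city": "Granada"},
    {"age": 27, "color": "green", "city": "Sevilla"},
    {"age": 45, "color": "red", "city": "Bilbao"},
]

original, binarized = one_hot_width(rows, {"color", "city"})
print(original, binarized)
```

Even in this tiny example the input grows from 3 features to 8, and the growth scales with the number of distinct values in each categorical column.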

Although Self-Organizing Maps (SOMs) are a powerful and essential tool for pattern recognition and data mining, the standard SOM algorithm is not suited to processing categorical data, which is present in many real datasets. For this reason, categorical values are commonly converted into a binary code, a solution that unfortunately distorts both the network training and the subsequent analysis. The present work proposes a SOM architecture that processes categorical values directly, without any prior transformation. This architecture can also properly mix numerical and categorical data, in such a way that all features carry the same weight. The proposed implementation is scalable, and the corresponding learning algorithm is described in detail. Finally, we demonstrate the effectiveness of the presented algorithm by applying it to several well-known datasets.
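
The abstract does not spell out the learning algorithm, so the following is only a minimal sketch of one plausible reading of "frequency neurons", not the FMSOM method itself. It assumes that a neuron stores a mean for each numerical feature and a table of relative frequencies for each categorical feature, that numerical inputs are already scaled to [0, 1], and that every feature contributes a dissimilarity in [0, 1] so that all features carry the same weight. The names `dissimilarity` and `batch_update` are hypothetical, and the neighbourhood function of a real SOM is omitted.

```python
from collections import Counter


def dissimilarity(sample, neuron, numeric_keys, categorical_keys):
    """Mean per-feature dissimilarity; each feature contributes a value in [0, 1]."""
    total = 0.0
    for key in numeric_keys:
        # Assumption: numerical features are pre-scaled to [0, 1].
        total += abs(sample[key] - neuron[key])
    for key in categorical_keys:
        # Assumption: neuron[key] maps each category to its relative frequency.
        # A category the neuron has never seen is maximally dissimilar.
        total += 1.0 - neuron[key].get(sample[key], 0.0)
    return total / (len(numeric_keys) + len(categorical_keys))


def batch_update(samples, numeric_keys, categorical_keys):
    """Rebuild a neuron from the samples assigned to it."""
    n = len(samples)
    neuron = {}
    for key in numeric_keys:
        # Numerical prototype: arithmetic mean of the assigned samples.
        neuron[key] = sum(s[key] for s in samples) / n
    for key in categorical_keys:
        # Categorical prototype: relative frequency of each observed category.
        counts = Counter(s[key] for s in samples)
        neuron[key] = {value: count / n for value, count in counts.items()}
    return neuron


# Hypothetical samples assigned to a single neuron.
assigned = [
    {"age": 0.25, "color": "red"},
    {"age": 0.50, "color": "red"},
    {"age": 0.75, "color": "blue"},
    {"age": 0.50, "color": "red"},
]
neuron = batch_update(assigned, ["age"], ["color"])
print(neuron)
print(dissimilarity({"age": 0.5, "color": "red"}, neuron, ["age"], ["color"]))
```

Under these assumptions no binary expansion is needed: the categorical value is compared against the neuron's frequency table directly, and dividing by the number of features keeps numerical and categorical contributions on the same scale.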


Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 36, November 2015, Pages 246–254