Article code | Journal code | Publication year | English article | Full-text version
4944445 | 1437990 | 2017 | 34-page PDF | Free download
English title of the ISI article
Fuzzy clustering of distributional data with automatic weighting of variable components
Persian translation of the title
خوشه بندی فازی از داده های توزیع شده با وزن گیری خودکار مولفه های متغیر
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract
Distributional data, expressed as realizations of distributional variables, are a new type of data arising from several sources. In this paper, we present some new fuzzy c-means algorithms for data described by distributional variables. The algorithms use the L2 Wasserstein distance between distributions as the dissimilarity measure. Usually, in fuzzy c-means, all the variables are considered equally important in the clustering task. However, some variables could be more or less important, or even irrelevant, for this task. Considering a decomposition of the squared L2 Wasserstein distance, and using the notion of adaptive distance, we propose some algorithms for automatically computing relevance weights associated with variables, as well as with their components. This is done either for the whole dataset or cluster-wise. Relevance weights express the importance of each variable, or of each component, in the clustering process, and thus also act as a variable selection method. Using artificial and real-world data, we observed that algorithms with automatic weighting of variables (or components) are better able to take into account the cluster structure of the data.
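The decomposition of the squared L2 Wasserstein distance mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not stated in this record: each distribution is represented by an equal-size one-dimensional sample, the two components are a position term (difference of means) and a dispersion term (difference of the centered quantile functions), and the weights `w_position` and `w_dispersion` are fixed by hand. The function names are hypothetical, and the sketch does not reproduce the paper's weight-update rules.

```python
def wasserstein2_sq(x, y):
    """Squared L2 Wasserstein distance between two equal-size 1-D samples.

    For empirical distributions with the same number of points, the
    quantile functions are matched by sorting both samples.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(x), sorted(y)
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)


def decompose(x, y):
    """Split the squared distance into (position, dispersion) components."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Position component: squared difference of the means.
    position = (mean_x - mean_y) ** 2
    # Dispersion component: distance between the centered samples.
    centered_x = sorted(v - mean_x for v in x)
    centered_y = sorted(v - mean_y for v in y)
    dispersion = sum((a - b) ** 2 for a, b in zip(centered_x, centered_y)) / n
    return position, dispersion


def weighted_distance(x, y, w_position=1.0, w_dispersion=1.0):
    """Adaptive-style distance with hand-picked (assumed) component weights."""
    position, dispersion = decompose(x, y)
    return w_position * position + w_dispersion * dispersion
```

With both weights equal to 1, `weighted_distance` coincides with `wasserstein2_sq`; setting one weight to 0 ignores that component, which is the sense in which relevance weights can act as a selection mechanism.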
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volumes 406–407, September 2017, Pages 248-268