Article code: 4942927
Journal code: 1437615
Publication year: 2018
English article: 13 pages, PDF
Full-text version: free download
English title of the ISI article
FWCMR: A scalable and robust fuzzy weighted clustering based on MapReduce with application to microarray gene expression
Persian translation of the title (rendered in English)
FWCMR: a scalable and robust weighted fuzzy clustering using MapReduce, with application to microarray gene expression
Keywords
MapReduce; distributed density-based clustering; microarray gene expression; fuzzy weighted clustering; decision making; big data
Related topics
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract


- A fuzzy weighted similarity measure designed specifically for gene expression clustering.
- A density-based soft clustering method built on MapReduce.
- Flexible and robust to intrinsic noise, with reasonable clustering results.
- Parallelism with minimal serial bottleneck, working with Hadoop, EC2, …
- A scalable clustering method suited to very large data volumes.

Data clustering is a widely used data mining technique for finding groups of similar objects in a dataset. The main challenges for any clustering approach are scalability to immense data volumes, robustness to intrinsic outliers, and validity of the clustering results. To address these challenges, this research proposes a fuzzy weighted clustering approach that is parallel and distributed in every phase. Although the proposed method can be used for various clustering purposes, it is applied here to gene expression clustering to reveal functional relationships among genes in a biological process. The method conforms to MapReduce and introduces a novel similarity measure that combines ordered weighted averaging with the Spearman correlation coefficient. Density-reachable genes are joined to form subclusters, and the final clusters are obtained by merging these subclusters. A voting system selects the best weights, and consequently the most valid clusters, among all possible results for each dataset. The whole algorithm is implemented on a distributed processing platform and is designed to scale to data of any size stored in cloud infrastructures. The precision of the resulting clusters was evaluated using several well-known cluster validity indexes from the literature. The scalability and robustness of the proposed method were also compared with recently published similar work. In all of these comparisons, the proposed method outperformed recent works on the same datasets.
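
The abstract names the two ingredients of the similarity measure (ordered weighted averaging and the Spearman correlation coefficient) but does not say how they are combined. The sketch below is only an illustration of the two building blocks in plain Python. The combination in `windowed_owa_spearman` (Spearman computed per window of conditions, then aggregated with OWA) is a hypothetical choice made for this example, not the formula used by FWCMR, and all function names and the `window` parameter are assumptions.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted averaging: weights apply to values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("need one weight per value")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("OWA weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def windowed_owa_spearman(x: Sequence[float], y: Sequence[float],
                          window: int, weights: Sequence[float]) -> float:
    """Hypothetical combination (an assumption, not the paper's formula):
    Spearman per non-overlapping window, aggregated with OWA."""
    if len(x) != len(y) or len(x) != window * len(weights):
        raise ValueError("profile length must equal window * len(weights)")
    scores = [spearman(x[s:s + window], y[s:s + window])
              for s in range(0, len(x), window)]
    return owa(scores, weights)


if __name__ == "__main__":
    gene_a = [1, 2, 3, 4, 5, 6]
    gene_b = [2, 4, 6, 6, 5, 4]
    # Agreement in the first window, disagreement in the second.
    print(windowed_owa_spearman(gene_a, gene_b, window=3, weights=[0.7, 0.3]))
```

Because OWA weights attach to sorted positions rather than to particular windows, a weight vector that favours the first positions rewards the best-agreeing windows, while one that favours the last positions penalises the worst-agreeing ones. The abstract's voting system for picking weights is not modelled here.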

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 91, January 2018, Pages 198-210