Article code: 4944364
Journal code: 1437984
Publication year: 2017
English article: 36-page PDF
Full-text version: free download
English title of the ISI article
A distributed approach to multi-objective evolutionary generation of fuzzy rule-based classifiers from big data
Keywords
Multi-objective evolutionary fuzzy systems, big data, fuzzy rule-based classifiers, Apache Spark
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract
In recent years, multi-objective evolutionary algorithms (MOEAs) have been extensively used to generate sets of fuzzy rule-based classifiers (FRBCs) with different trade-offs between accuracy and interpretability. Since computing the accuracy for each chromosome evaluation requires a scan of the entire training set, these approaches have proved to be very expensive in terms of execution time and memory occupation. For this reason, they have not yet been applied to very large datasets, even though interpretable classifiers would be especially desirable for precisely these datasets. The recent advent of a number of open-source cluster computing frameworks has, however, opened interesting new perspectives. In this paper, we exploit one of these frameworks, namely Apache Spark, and propose the first distributed multi-objective evolutionary approach to concurrently learn the rule and data bases of FRBCs by maximizing accuracy and minimizing complexity. During the evolutionary process, the computation of the fitness is divided among the cluster nodes, thus allowing the designer to distribute both the computational load and the dataset storage. We have performed a number of experiments on ten real-world big datasets, evaluating our distributed approach in terms of both classification rate and scalability, and comparing it with two well-known state-of-the-art distributed classifiers. Finally, we have evaluated the achievable speedup on a small computer cluster. We show that the distributed version can efficiently extract compact rule bases with high accuracy, preserving the interpretability of the rule base, and can manage big datasets even with modest hardware support.
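The idea of dividing the fitness computation among cluster nodes can be illustrated with a minimal sketch. This is not the paper's implementation: it uses plain Python in place of Apache Spark, the partitions are processed sequentially, and every name (`split_into_partitions`, `partial_counts`, `distributed_accuracy`, `toy_classifier`) and the toy dataset are assumptions made for illustration. Each simulated node scans only its own partition and returns a pair of counts, and the pairs are then merged into a single accuracy value.

```python
from functools import reduce
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]          # (features, class label)
Classifier = Callable[[Sequence[float]], int]


def split_into_partitions(data: List[Sample], n_parts: int) -> List[List[Sample]]:
    """Round-robin split standing in for the data partitions held by cluster nodes."""
    return [data[i::n_parts] for i in range(n_parts)]


def partial_counts(partition: List[Sample], classify: Classifier) -> Tuple[int, int]:
    """Map step: one node scans only its own partition and returns (correct, total)."""
    correct = sum(1 for features, label in partition if classify(features) == label)
    return correct, len(partition)


def distributed_accuracy(partitions: List[List[Sample]], classify: Classifier) -> float:
    """Reduce step: merge the per-partition counts into one accuracy value."""
    # Sequential loop here; on a real cluster each call would run on a different node.
    counts = [partial_counts(part, classify) for part in partitions]
    correct, total = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), counts, (0, 0))
    return correct / total if total else 0.0


def toy_classifier(features: Sequence[float]) -> int:
    """Hypothetical stand-in for a classifier decoded from one chromosome."""
    return 1 if features[0] > 0.5 else 0


# Tiny made-up dataset, for illustration only.
data: List[Sample] = [
    ([0.1], 0), ([0.9], 1), ([0.7], 1),
    ([0.3], 1), ([0.2], 0), ([0.8], 0),
]
partitions = split_into_partitions(data, 3)
accuracy = distributed_accuracy(partitions, toy_classifier)
```

Because only small count pairs travel between nodes, the training data can stay where it is stored. The merged result equals the accuracy that a single full scan of the training set would produce.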
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volumes 415–416, November 2017, Pages 319-340