Article code | Journal code | Publication year | English article | Full-text version
4946322 | 1439284 | 2017 | 27 pages, PDF | Free download
English title of the ISI article
Learning distributed discrete Bayesian Network Classifiers under MapReduce with Apache Spark
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
The challenge of scalability has always been a focus of Machine Learning research, where improved algorithms and new techniques are constantly proposed to deal with more complex problems. With the advent of Big Data, this challenge has intensified, as new large-scale datasets overwhelm the majority of available techniques. The community has resorted to Cloud Computing and distributed programming paradigms as the most immediate solution, where Apache Spark has proven to be the most promising framework. In this paper we focus on the problem of supervised classification, exploring the family of so-called Bayesian Network Classifiers by studying their adaptability to the MapReduce and Apache Spark frameworks. We analyse a range of algorithms and propose distributed versions of them. Our approach is based on a general framework for learning these probabilistic models from large-scale and high-dimensional data, the latter being a problem with less support in the literature. We also present an extensive experimental evaluation of our proposal over a wide set of problems and different elastic configurations of a computing cluster to show the full extent of the scalability properties of our framework. Additional material and the software to reproduce our experiments can be found on the supplementary website http://simd.albacete.org/supplements/distributed_bncs.html.
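As a rough illustration of why discrete Bayesian Network Classifiers suit MapReduce, the sketch below learns the simplest member of the family, Naive Bayes, by counting in each data partition (map) and merging the partial counts (reduce). It is not the authors' implementation: it uses plain Python rather than Apache Spark, and the names `map_partition`, `reduce_counts`, `predict`, and the toy data are all hypothetical, chosen only for this example.

```python
from collections import Counter
from functools import reduce
import math


def map_partition(rows):
    """Map step: count class labels and (feature index, value, class) triples
    within a single partition. Each row is (feature_1, ..., feature_n, label)."""
    counts = Counter()
    for *features, label in rows:
        counts[("class", label)] += 1
        for i, value in enumerate(features):
            counts[(i, value, label)] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial count tables by summing matching keys."""
    return a + b


def predict(counts, features, classes, n_values, alpha=1.0):
    """Classify one instance from the merged counts, using Laplace smoothing
    (alpha) and log-probabilities. n_values[i] is the number of distinct
    values of feature i (an assumption the caller must supply)."""
    total = sum(counts[("class", c)] for c in classes)
    best, best_score = None, -math.inf
    for c in classes:
        n_c = counts[("class", c)]
        score = math.log((n_c + alpha) / (total + alpha * len(classes)))
        for i, v in enumerate(features):
            score += math.log((counts[(i, v, c)] + alpha) / (n_c + alpha * n_values[i]))
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data split into two "partitions", standing in for distributed blocks.
partitions = [
    [("sunny", "hot", "no"), ("rainy", "cool", "yes")],
    [("sunny", "cool", "yes"), ("rainy", "hot", "no")],
]
counts = reduce(reduce_counts, map(map_partition, partitions))
label = predict(counts, ("sunny", "cool"), classes=["yes", "no"], n_values=[2, 2])
```

Because the count tables are merged by plain addition, the result does not depend on how the rows are partitioned, which is the property that lets this kind of parameter learning scale out across a cluster.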
Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 117, 1 February 2017, Pages 16-26