Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce

Article ID	Journal	Published Year	Pages	File Type
4969084	Information Fusion	2018	30 Pages	PDF

Abstract

In any MapReduce algorithm, local models are first learned on subsets of the original data within the so-called Map tasks. The Reduce task then fuses the partial outputs generated by each Map. The way this fusion of information/models is designed may strongly affect the quality of the final system. In this work, we enumerate and analyze two alternative methodologies found both in the specialized literature and in standard Machine Learning libraries for Big Data. Our main objective is to introduce the characteristics of these methodologies and to give some guidelines for designing novel algorithms in this field of research. Finally, a short experimental study allows us to contrast the scalability of each type of process fusion in MapReduce for Big Data Analytics.
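
As a rough illustration of why the design of the Reduce-side fusion matters, the sketch below contrasts two fusion styles on a toy "model" (a plain mean). The abstract does not name its two methodologies, so the two styles here, the toy data, and all function names (partition, map_stats, reduce_stats, map_local_model, reduce_models) are assumptions made for illustration only, written in plain Python rather than against any MapReduce or Spark API.

```python
from functools import reduce


def partition(data, n_parts):
    """Split data into n_parts round-robin chunks (stand-in for distributed blocks)."""
    return [data[i::n_parts] for i in range(n_parts)]


# Style A (assumed): each Map emits partial statistics,
# and Reduce merges them into a single global model.
def map_stats(chunk):
    return (sum(chunk), len(chunk))


def reduce_stats(a, b):
    return (a[0] + b[0], a[1] + b[1])


# Style B (assumed): each Map fits a complete local model,
# and Reduce combines the finished models.
def map_local_model(chunk):
    return sum(chunk) / len(chunk)


def reduce_models(models):
    return sum(models) / len(models)


data = [1.0, 2.0, 3.0, 4.0, 10.0]
chunks = partition(data, 2)  # two unequal chunks

# Style A: fuse statistics, then build the model once.
total, count = reduce(reduce_stats, map(map_stats, chunks))
global_mean = total / count

# Style B: build one model per chunk, then fuse the models.
local_means = [map_local_model(c) for c in chunks]
fused_mean = reduce_models(local_means)

print(global_mean, fused_mean)
```

With these toy values the first style recovers the mean of the full data set, while the unweighted average of the local means differs because the chunks have different sizes. It is a small-scale instance of the fusion design changing the final result.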

Keywords

Big data analytics; Spark; Information fusion; MapReduce; Machine learning