Article ID: 489931
Journal: Procedia Computer Science
Published Year: 2015
Pages: 6
File Type: PDF
Abstract

The number of applications running on Hadoop clusters is increasing day by day, because organizations have found in Hadoop a simple and efficient model that works well in a distributed environment. The model is built to run efficiently across thousands of commodity machines and over massive data sets. Together, HDFS and MapReduce provide a scalable and fault-tolerant model that hides the complexities of Big Data analytics. Since Hadoop is becoming increasingly popular, understanding its technical details becomes essential. This fact inspired us to explore Hadoop and its components in depth. Analysing, examining and processing huge amounts of unstructured data to extract the required information has long been a challenge. In this paper we discuss Hadoop and its components in detail, namely MapReduce and the Hadoop Distributed File System (HDFS). The MapReduce engine uses a JobTracker and TaskTrackers that handle the scheduling, monitoring and execution of jobs. HDFS is a distributed file system comprising a NameNode, DataNodes and a Secondary NameNode for efficient handling of distributed storage. The details provided can be used for developing large-scale distributed applications that exploit the computational power of multiple nodes for data- and compute-intensive workloads.
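For context, the following is a minimal sketch of the MapReduce programming model the abstract refers to, not code from the paper itself: the canonical word-count job written against the org.apache.hadoop.mapreduce API. The class names (WordCount, TokenizerMapper, IntSumReducer) are illustrative, and the HDFS input and output directories are assumed to be passed as command-line arguments.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input split read from HDFS.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the per-word counts produced by all mappers after the shuffle.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // assumed HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // assumed HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

In the classic architecture described by the abstract, submitting such a job sends it to the JobTracker, which schedules map and reduce tasks on TaskTrackers co-located with the DataNodes holding the input blocks, while the NameNode tracks where those blocks reside.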

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science (General)