Article ID: 489931
Journal: Procedia Computer Science
Published Year: 2015
Pages: 6
File Type: PDF
Abstract

The number of applications running on Hadoop clusters is increasing day by day, because organizations have found in Hadoop a simple and efficient model that works well in a distributed environment. The model is built to run efficiently across thousands of commodity machines and over massive data sets. Together, HDFS and MapReduce provide a scalable and fault-tolerant model that hides the complexities of Big Data analytics. Since Hadoop is becoming increasingly popular, understanding its technical details becomes essential. This fact inspired us to explore Hadoop and its components in depth. Analysing, examining and processing huge amounts of unstructured data to extract the required information has long been a challenge. In this paper we discuss Hadoop and its components in detail, namely MapReduce and the Hadoop Distributed File System (HDFS). The MapReduce engine uses a JobTracker and TaskTrackers that handle the scheduling, monitoring and execution of jobs. HDFS is a distributed file system comprising a NameNode, DataNodes and a Secondary NameNode for efficient handling of distributed storage. The details provided can be used for developing large-scale distributed applications that exploit the computational power of multiple nodes for data- and compute-intensive workloads.
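For context, the following is a minimal sketch of the MapReduce programming model the abstract refers to, not code from the paper itself: the canonical word-count job written against the org.apache.hadoop.mapreduce API. The class names (WordCount, TokenizerMapper, IntSumReducer) are illustrative, and the HDFS input and output directories are assumed to be passed as command-line arguments.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input split read from HDFS.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the per-word counts produced by all mappers after the shuffle.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // assumed HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // assumed HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

In the classic architecture described by the abstract, submitting such a job sends it to the JobTracker, which schedules map and reduce tasks on TaskTrackers co-located with the DataNodes holding the input blocks, while the NameNode tracks where those blocks reside.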

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science (General)