Article ID | Journal | Published Year | Pages | File Type
---|---|---|---|---
487470 | Procedia Computer Science | 2015 | 9 |
The term big data has come into use in recent years to refer to the ever-increasing volumes of data that organizations store, process and analyze. Big data is characterized by its Volume, Variety and Velocity, which make it difficult to process with a conventional Database Management System. This creates a need for schema-less management systems, although these are not a complete solution for big data analysis, since their processing considers only structural information and ignores semantic information. Content Management Systems such as Wikipedia store and link enormous numbers of documents and files, yet such systems lack semantic linking and analysis even though they use clusters and distributed frameworks for storing big data. As a result, the references retrieved for a particular article are numerous and effectively random. Reducing the number of references for a selected piece of content therefore requires semantic matching. In this paper we propose a framework that uses the distributed parallel processing capability of the Hadoop Distributed File System (HDFS) to perform semantic analysis over a large collection of documents (big data) and find the best-matched source document in the collection for a given virtual document.
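To make the matching step concrete, the sketch below scores candidate source documents against a "virtual" query document and keeps the best match. It is a minimal, self-contained illustration only: plain term-frequency cosine similarity stands in for the paper's semantic matching (the abstract does not specify the exact measure), and all document contents are hypothetical. In the proposed framework, this per-document scoring would run inside Hadoop map tasks over documents stored in HDFS rather than in a single local loop.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BestSourceMatcher {

    // Build a term-frequency vector from lowercased, tokenized text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0
                : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical virtual document and candidate source documents;
        // in the framework these would be read from HDFS.
        String virtualDoc = "distributed semantic analysis of large document collections";
        List<String> sources = List.of(
                "semantic analysis of documents in a distributed cluster",
                "relational database schema design",
                "storing large collections of files");

        Map<String, Integer> query = termFrequencies(virtualDoc);
        int best = -1;
        double bestScore = -1;
        for (int i = 0; i < sources.size(); i++) {
            double score = cosine(query, termFrequencies(sources.get(i)));
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        System.out.printf("best match: doc %d (score %.3f)%n", best, bestScore);
    }
}
```

In a MapReduce formulation of the same idea, each mapper would emit a (score, document ID) pair for its split and a single reducer would retain the maximum, so the selection of the best-matched source document parallelizes naturally over the document collection.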