کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
488506 | 703898 | 2016 | 8 صفحه PDF | دانلود رایگان |
Large amount of data is being generated from sensors, satellites, social media etc. This big data (velocity, variety, veracity, value and veracity) can be processed so as to make timely decisions by the decision makers. This paper presents results of the proposed Hadoop framework that performs entity resolution in Map and reduce phase. MapReduce phase matches two real world objects and generates rules. The similarity score of these rules are used for matching stream data during testing phase. Similarity is calculated using 13 different semantic measures such as token-based similarity, edit-based similarity, hybrid similarity, phonetic similarity as well as domain dependent Natural language processing measures. Semantic measures are implemented using Hive programming. The proposed system is tested using e-catalogues of Amazon and Google.
Journal: Procedia Computer Science - Volume 85, 2016, Pages 550–557