Article ID: 459294
Journal: Journal of Systems and Software
Published Year: 2015
Pages: 17
File Type: PDF
Abstract

• First, we cast entity resolution (ER) as a matching instance and use the EM algorithm to integrate different data models.
• We apply ER to heterogeneous distributed probabilistic data (HDPD) to achieve major gains in response time.
• The matching instance is compared with the BFA, Koosh-TND and Packages-TND algorithms.
• After integration, we evaluate on real data and compare with the FC, SCC and FD algorithms.
• Our method improves response and communication time over existing methods.

Distributed computing has been linked and equated to the industrial revolution. Its transformational nature is, however, associated with significant instances in the form of Internet of Things operations. Entity resolution (ER) is the problem of matching and resolving records that represent the same real-world entity. It is a long-standing challenge in distributed databases, information retrieval, and statistics. A centralized approach to ER does not scale well, because a large amount of data needs to be sent to a central node. In this paper, we present an algorithm that deals with heterogeneous distributed probabilistic data (HDPD) and also reduces processing time in a distributed environment. We propose two different approaches. First, we explore this instance as a matching (identification) problem, integrating different data models with the expectation–maximization (EM) algorithm. Second, we apply the ER methodology to HDPD to substantially improve the response time needed to produce the outcome. We validate the approach through experiments on a 100-node cluster, which show significant performance improvements over naive approaches. This paper is expected to provide insights into database organizations and new technological development for the growth of distributed environments.
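
To make the ER problem concrete, the following is a minimal single-node sketch of naive pairwise matching. It is not the algorithm presented in the paper: the Jaccard token similarity, the 0.5 threshold, the function names, and the sample records are all assumptions chosen for illustration. It compares every pair of records, which is the quadratic cost that makes a naive centralized approach scale poorly.

```python
from itertools import combinations


def tokens(record):
    """Lowercased set of whitespace-separated tokens from all field values."""
    return {tok for value in record.values() for tok in str(value).lower().split()}


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(records, threshold=0.5):
    """Group record indices whose pairwise similarity meets the threshold.

    Matches are closed transitively with a small union-find structure.
    The threshold is a hypothetical value, not one taken from the paper.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    token_sets = [tokens(r) for r in records]
    # Naive all-pairs comparison: O(n^2) similarity checks.
    for i, j in combinations(range(len(records)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical sample records; the first two describe the same person.
records = [
    {"name": "John Smith", "city": "Boston"},
    {"name": "john smith", "city": "boston ma"},
    {"name": "Alice Wong", "city": "Seattle"},
]
print(resolve(records))  # [[0, 1], [2]]
```

The first two records share three of four distinct tokens (similarity 0.75), so they are grouped as one entity, while the third stays on its own. A probabilistic matcher such as one fitted with EM would replace the fixed threshold with learned match probabilities.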

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications