Article code: 459294
Journal code: 696239
Publication year: 2015
English article: 17 pages, PDF
Full-text version: free download
English title of the ISI article
Entity resolution based EM for integrating heterogeneous distributed probabilistic data
Persian translation of the title
Entity resolution based on the EM algorithm for integrating heterogeneous distributed probabilistic data
Keywords
Heterogeneous database; Probabilistic data; Entity resolution
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Networks and Communications
English abstract


• First, we frame ER as a matching problem and use the EM algorithm to integrate different data models.
• We apply ER to HDPD to achieve a major improvement in response time.
• The matching step is compared with the BFA, Koosh-TND and Packages-TND algorithms.
• After integration, we evaluate on real data and compare with the FC, SCC and FD algorithms.
• Our method improves on existing methods in response time and communication time.

Distributed computing has been likened to the industrial revolution. Its transformational nature is, however, tied to significant instances such as Internet of Things operations. Entity resolution (ER) is the problem of matching and resolving records that represent the same real-world entity. This is a long-standing challenge in distributed databases, information retrieval and statistics. A centralized approach to ER does not scale well, because a large amount of data needs to be sent to a central node. In this paper, we present an algorithm that handles heterogeneous distributed probabilistic data (HDPD) and also reduces processing time in a distributed environment. We propose two different approaches. First, we treat this instance as a matching (identification) problem and integrate different data models with the expectation–maximization (EM) algorithm. Second, we apply ER methodology to HDPD to achieve a major improvement in the response time needed to produce the outcome. We validate HDPD through experiments on a 100-node cluster, which show significant performance improvements over naive approaches. This paper is expected to provide insights into database organizations and new technological development for the growth of distributed environments.
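
The abstract does not spell out how EM is applied to the matching problem, so the following is only a minimal sketch of one classical formulation, not the paper's algorithm. It assumes that each candidate record pair is summarized as a binary agreement vector (1 where the two records agree on a field) and that pairs come from a two-class mixture of matches and non-matches, in the style of Fellegi-Sunter record linkage. The names `match_posterior` and `em_match`, the starting values, and the toy data in the usage note are all hypothetical.

```python
def match_posterior(v, p, m, u):
    """Posterior probability that agreement vector v comes from a true match.

    p    : prior probability that a candidate pair is a match
    m[k] : P(records agree on field k | match)
    u[k] : P(records agree on field k | non-match)
    """
    like_match, like_non = p, 1.0 - p
    for k, agree in enumerate(v):
        like_match *= m[k] if agree else 1.0 - m[k]
        like_non *= u[k] if agree else 1.0 - u[k]
    return like_match / (like_match + like_non)


def em_match(vectors, iters=50, p=0.1, m_init=0.9, u_init=0.1, eps=1e-6):
    """Estimate (p, m, u) by EM from unlabeled binary agreement vectors.

    Assumption: fields are conditionally independent given the class.
    The starting values are illustrative guesses, not taken from the paper.
    """
    n, n_fields = len(vectors), len(vectors[0])
    m = [m_init] * n_fields
    u = [u_init] * n_fields

    def clamp(x):
        # Keep probabilities strictly inside (0, 1) so no likelihood hits zero.
        return min(max(x, eps), 1.0 - eps)

    for _ in range(iters):
        # E-step: soft assignment of every pair to the "match" class.
        w = [match_posterior(v, p, m, u) for v in vectors]
        total = sum(w)
        # M-step: re-estimate the parameters from the soft assignments.
        p = clamp(total / n)
        for k in range(n_fields):
            agree_match = sum(wi * v[k] for wi, v in zip(w, vectors))
            agree_non = sum((1.0 - wi) * v[k] for wi, v in zip(w, vectors))
            m[k] = clamp(agree_match / max(total, eps))
            u[k] = clamp(agree_non / max(n - total, eps))
    return p, m, u
```

As a usage illustration on made-up data, calling `em_match` on a list with many all-disagree vectors such as `(0, 0, 0)` and a few all-agree vectors such as `(1, 1, 1)` returns parameters that can be passed to `match_posterior` to score any pair. Pairs whose posterior exceeds a chosen threshold would then be treated as the same entity. In a distributed setting, the sums in the M-step could in principle be computed per node and combined, but that extension is an assumption and is not shown here.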

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Systems and Software - Volume 107, September 2015, Pages 93–109