کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
6937848 1449889 2019 13 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
An incremental graph-partitioning algorithm for entity resolution
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر چشم انداز کامپیوتر و تشخیص الگو
پیش نمایش صفحه اول مقاله
An incremental graph-partitioning algorithm for entity resolution
چکیده انگلیسی
Entity resolution is an important data association task when fusing information from multiple sources. Oftentimes the information arrives continuously and the entity resolution algorithm needs to efficiently update its solution upon receiving new information. In this work, we introduce an incremental entity resolution algorithm based on a graph partitioning formulation. The developed algorithm is able to handle both incrementally arriving entity references, as well as incrementally arriving information which changes the pairwise similarity scores between the references. New information is handled in a way that allows the algorithm to reconsider past decisions when contradicting information arrives. Because the graph partitioning formulation used is NP-Hard, a heuristic algorithm is developed to produce good solutions, which is also compatible with a blocking technique to limit the number of required comparisons. The algorithm is tested on a variety of datasets (randomly generated and real) and it is shown that allowing the algorithm to consider revised scores and revisit prior decisions offers a substantial improvement to accuracy (approximately 30-40% better F-Score on a natural language dataset), compared to other greedy heuristics on the same set of coefficients. It is also shown that, on a test set with 100 references, the incremental algorithm is up to an order of magnitude faster than a batch algorithm approach that re-solves the entire problem.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Fusion - Volume 46, March 2019, Pages 171-183
نویسندگان
, , , ,