Article code: 6855432
Journal code: 1437641
Publication year: 2016
English article: 16-page PDF
Full-text version: Free download
English title of the ISI article
A multiobjective optimization based entity matching technique for bibliographic databases
Persian translation of the title
یک روش منطبق بر سازه مبتنی بر بهینه سازی چندگانه برای پایگاه داده های کتابشناسی
Keywords
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
With the increasing use of online resources, bibliographic databases are growing day by day. The huge amount of available data belongs to various entities, and it is difficult to automatically identify the records that belong to a particular entity. Mapping records to their corresponding entity is termed the entity matching problem. In bibliographic databases, many attributes change over time. For example, an author's affiliation changes frequently, many authors use different email IDs, and the names of co-authors also change with time. All these aspects make the entity matching problem challenging. Generally, an entity matching task is carried out by constructing a feature vector to represent each record and then training a classifier to classify each feature vector. For bibliographic databases, however, generating manually annotated labeled data to train a classifier is very difficult and time consuming. Motivated by this observation, we propose an unsupervised approach to the entity matching problem using the non-dominated sorting genetic algorithm-II (NSGA-II). A new encoding strategy is used to encode the clusters in the form of a chromosome. New mutation and crossover operators suited to bibliographic data clustering are proposed. Different distance measures are used to measure the dissimilarities between records. Finally, solutions are evolved using the search capability of NSGA-II. Experimental evaluations are carried out with 247 different combinations of eight objective functions on eight different bibliographic datasets. A comparative analysis with two existing systems, DBLP and ArnetMiner, shows that the proposed technique can produce better results in many cases.
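The abstract names NSGA-II but does not spell out its ranking step. As a rough illustration only, the sketch below shows the standard non-dominated sorting idea that NSGA-II is built on: candidate solutions (here, candidate clusterings of records) are grouped into Pareto fronts by their objective values. The function names, the sample scores, and the assumption that every objective is minimized are hypothetical; the paper's chromosome encoding, its mutation and crossover operators, and its eight objective functions are not reproduced here.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_sort(scores: List[Sequence[float]]) -> List[List[int]]:
    """Group solution indices into Pareto fronts.
    Front 0 holds the solutions that no other solution dominates."""
    n = len(scores)
    dominated_by = [[] for _ in range(n)]  # indices that solution i dominates
    domination_count = [0] * n             # number of solutions dominating i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(scores[i], scores[j]):
                dominated_by[i].append(j)
            elif dominates(scores[j], scores[i]):
                domination_count[i] += 1

    fronts: List[List[int]] = []
    current = [i for i in range(n) if domination_count[i] == 0]
    while current:
        fronts.append(current)
        next_front = []
        for i in current:
            for j in dominated_by[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        current = sorted(next_front)
    return fronts


# Hypothetical objective values for five candidate clusterings,
# two objectives each, lower is better (assumed, not from the paper).
scores = [(0.2, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.5), (0.95, 0.95)]
fronts = non_dominated_sort(scores)
print(fronts)
```

In a full NSGA-II loop, these fronts would be combined with a crowding-distance measure to select the next generation; that part is omitted from this sketch.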
Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 65, 15 December 2016, Pages 100-115