Article code | Journal code | Publication year | English article | Full text |
---|---|---|---|---|
6855432 | 1437641 | 2016 | 16-page PDF | Free download |
English title of the ISI article
A multiobjective optimization based entity matching technique for bibliographic databases
Persian translation of the title
یک روش منطبق بر سازه مبتنی بر بهینه سازی چندگانه برای پایگاه داده های کتابشناسی
Keywords
Bibliographic database, entity matching, multiobjective optimization, genetic algorithm, elitism, Pareto optimal solutions
Related topics
Engineering and Basic Sciences
Computer Engineering
Artificial Intelligence
English abstract
With the increasing use of online resources, bibliographic databases are growing day by day. The huge amount of available data belongs to many different entities, and it is difficult to automatically identify the records that belong to a particular entity. Mapping records to their corresponding entity is termed the entity matching problem. In a bibliographic database, many attributes change over time. For example, the affiliation of an author changes frequently, many authors use several different email IDs, and the names of co-authors also change with time. All of these aspects make the entity matching problem challenging. Generally, an entity matching task is carried out by constructing a feature vector to represent each record and then training a classifier to classify each feature vector. For a bibliographic database, however, generating manually annotated labeled data to train a classifier is very difficult and time-consuming. Motivated by this observation, we propose an unsupervised approach to the entity matching problem using the non-dominated sorting genetic algorithm-II (NSGA-II). A new encoding strategy is used to encode the clusters in the form of a chromosome. New mutation and crossover operators suited to bibliographic data clustering are proposed. Different distance measures are used to measure the dissimilarities between records. Finally, solutions are evolved using the search capability of NSGA-II. Experimental evaluations are carried out with 247 different combinations of eight objective functions on eight different bibliographic datasets. A comparative analysis with two existing systems, DBLP and ArnetMiner, shows that the proposed technique can produce better results in many cases.
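As a rough illustration of two ideas mentioned in the abstract, the sketch below shows a Pareto (non-dominated) filter, which is the basic notion behind non-dominated sorting, and a count of objective subsets. It is not the authors' encoding, operators, or objective functions. The function names, the sample scores, and the assumption that every objective is minimized are all hypothetical. The abstract does not say how the 247 combinations were formed; the sketch only observes that 247 equals the number of subsets of eight items that contain two or more members.

```python
from itertools import combinations


def objective_subsets(n_objectives, min_size=2):
    """Return every subset of objective indices with at least min_size members."""
    indices = range(n_objectives)
    return [
        subset
        for size in range(min_size, n_objectives + 1)
        for subset in combinations(indices, size)
    ]


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Assumption: every objective is minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores):
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, candidate in enumerate(scores)
        if not any(
            dominates(other, candidate)
            for j, other in enumerate(scores)
            if j != i
        )
    ]


# Hypothetical objective values for four candidate clusterings (lower is better).
scores = [(0.2, 0.9), (0.4, 0.4), (0.5, 0.5), (0.9, 0.1)]
front = pareto_front(scores)

# Subsets of eight objectives that use at least two of them.
n_subsets = len(objective_subsets(8))
```

Here the third candidate is dropped because the second one is better on both objectives, while the remaining three trade one objective against the other and so all stay on the front. NSGA-II repeats this kind of comparison to rank a whole population into successive fronts.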
Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 65, 15 December 2016, Pages 100-115
Authors
Sumit Mishra, Sriparna Saha, Samrat Mondal