Article ID: 515459
Journal: Information Processing & Management
Published Year: 2011
Pages: 14 Pages
File Type: PDF
Abstract

Author disambiguation resolves same-name author occurrences in bibliographic data into distinct namesakes. This enables author-centered search and high-quality social network analysis. To promote research on author disambiguation, KISTI has constructed a new large-scale test set for this field. This article describes the semi-manual procedure used to create the test set and its characteristics, especially in terms of author ambiguity and name diversity. In addition, the baseline performance of author clustering on the test set is provided.

Research highlights
► To overcome the weaknesses of previous test sets and to foster research on author disambiguation, a new large-scale test set was constructed.
► In the 6-stage test set construction procedure, Step 4 reduced construction time by automatically acquiring Web evidence for resolving name occurrences to persons.
► The new test set shows more diversity in author ambiguity, sizes of same-name groups, and non-English names than previous test sets.
► Experiments on the new test set indicate that the complexity of the author resolution problem depends more on author ambiguity than on the number of same-name author instances.

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science Applications