Article ID Journal Published Year Pages File Type
4966428 Information Processing & Management 2016 12 Pages PDF
Abstract
Bibliographic collections in traditional libraries often compile records from distributed sources where variable criteria have been applied to the normalization of the data. Furthermore, the source records often follow classical standards, such as MARC21, where a strict normalization of author names is not enforced. The identification of equivalent records in large catalogues is therefore required, for example, when migrating the data to new repositories which apply modern specifications for cataloguing, such as the FRBR and RDA standards. An open-source tool has been implemented to assist authority control in bibliographic catalogues when external features (such as the citations found in scientific articles) are not available for the disambiguation of creator names. This tool is based on similarity measures between the variants of author names combined with a parser which interprets the dates and periods associated with the creator. An efficient data structure (the unigram frequency vector trie) has been used to accelerate the identification of variants. The algorithms employed and the attribute grammar are described in detail and their implementation is distributed as an open-source resource to allow for an easier uptake.
Related Topics
Physical Sciences and Engineering Computer Science Computer Science Applications
Authors
, , ,