Article ID Journal Published Year Pages File Type
523341 Journal of Informetrics 2007 12 Pages PDF
Abstract

Never before in history has mankind produced and had access to so much data, information, knowledge, and expertise as today. To organize, access, and manage these valuable assets effectively, we use taxonomies, classification hierarchies, ontologies, controlled vocabularies, and other approaches. We create directory structures for our files. We use organizational hierarchies to structure our work environment. However, the design and continuous update of these organizational schemas with potentially thousands of class nodes organizing millions of entities is challenging for any human being.The taxonomy visualization and validation (TV) tool introduced in this paper supports the semi-automatic validation and optimization of organizational schemas such as file directories, classification hierarchies, taxonomies, or other structures imposed on a data set for organization, access, and naming. By showing the “goodness of fit” for a schema and the potentially millions of entities it organizes, the TV tool eases the identification and reclassification of misclassified information entities, the identification of classes that grow too large, the evaluation of the size and homogeneity of existing classes, the examination of the “well-formedness” of an organizational schema, and more. As a demonstration, the TV tool is applied to display and examine the United States Patent and Trademark Office patent classification, which organizes more than three million patents into about 160,000 distinct patent classes. The paper concludes with a discussion and an outlook to future work.

Related Topics
Physical Sciences and Engineering Computer Science Computer Science Applications
Authors
, , , , ,