Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
394696 | 665831 | 2011 | 17-page PDF | Free download |

Concept hierarchies, such as the ACM Computing Classification System and the InterPro Protein Sequence Classification, are widely used in categorization and indexing applications. In the Internet and Web 2.0 era, new concepts and terms emerge on an almost daily basis, so it is essential that such hierarchies maintain up-to-date records of concepts. This paper proposes a mechanism to identify the most suitable position at which to insert a new term into an existing concept hierarchy. The problem is challenging because there are hundreds or even thousands of candidate positions for insertion. Furthermore, there is usually no training instance available for an insertion; nor is it practical to assume the availability of a detailed description of the target concept, except in the hierarchy itself. To resolve the problem, we exploit topology, content, and social information, and apply a learning approach to identify the underlying construction criteria of the concept hierarchy. We evaluate the proposed learning-based approach on the ACM CCS, DOAJ, and InterPro datasets using three metrics: accuracy, taxonomic closeness, and ranking. The results demonstrate that, on all three metrics, our approach outperforms similarity-based approaches, such as the Normalized Google Distance, by a significant margin. Finally, we propose a level-based recommendation scheme as a novel application of our system. The source code, dataset, and other related resources are available at http://www.csie.ntu.edu.tw/~d97944007/refinement/.
Journal: Information Sciences - Volume 181, Issue 12, 15 June 2011, Pages 2512–2528
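
The abstract names the Normalized Google Distance (NGD) as a similarity-based baseline. As a rough illustration of how such a baseline could rank candidate positions, the sketch below computes the standard NGD formula from page-hit counts and sorts candidate parent concepts by their distance to the new term. This is not the paper's method or code: the function names `ngd` and `rank_parents`, the toy hit counts, and the index size `n` are all hypothetical and chosen only for illustration.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance computed from hit counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms that never co-occur are treated as maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def rank_parents(new_term, candidates, counts, pair_counts, n):
    """Order candidate parent concepts by ascending NGD to new_term (hypothetical helper)."""
    scored = []
    for c in candidates:
        fxy = pair_counts.get(frozenset((new_term, c)), 0)
        scored.append((ngd(counts[new_term], counts[c], fxy, n), c))
    return [c for _, c in sorted(scored)]


# Toy, made-up hit counts (illustration only, not data from the paper).
counts = {"deep learning": 1000, "machine learning": 5000, "databases": 8000}
pair_counts = {
    frozenset(("deep learning", "machine learning")): 800,
    frozenset(("deep learning", "databases")): 20,
}
ranking = rank_parents(
    "deep learning", ["databases", "machine learning"], counts, pair_counts, n=10**6
)
print(ranking)
```

A baseline of this kind considers only term co-occurrence; it uses none of the topology, content, or social information that the learning-based approach described in the abstract exploits.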