| Article ID | Journal | Published Year | Pages |
|---|---|---|---|
| 9651063 | Information Sciences | 2005 | 20 |
Abstract
Current approaches to index weighting for information retrieval from texts are based on statistical analysis of the texts' contents. A key shortcoming of these indexing schemes, which consider only the terms in a document, is that they cannot extract indexes that accurately represent the semantic content of a document. To address this issue, we propose a new indexing formalism that considers not only the terms in a document but also its concepts. In the proposed method, concepts are extracted by exploiting clusters of semantically related terms, referred to as concept clusters. Through experiments on the Wall Street Journal documents in the TREC-2 collection, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially for the highest-ranked documents. Moreover, the index term dimension of the proposed method was 53.3% lower than that of the TF-based method, which is expected to substantially reduce document search time in a real-world environment.
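The abstract does not give the weighting formula, so the following is only a minimal Python sketch of the general idea, contrasting a plain TF index with a concept-level index. It assumes the concept clusters are already available as a hypothetical hand-written mapping from a concept label to its member terms. The functions `tf_index` and `concept_index`, the cluster labels, and the toy document are illustrative inventions, not the authors' method. In this sketch every clustered term simply contributes its count to its cluster, while terms outside every cluster remain ordinary term indexes.

```python
from collections import Counter


def tf_index(tokens):
    """Baseline: weight each index term by its raw frequency in the document."""
    return dict(Counter(tokens))


def concept_index(tokens, concept_clusters):
    """Hypothetical concept-level index (illustration only, not the paper's scheme).

    Each term belonging to a concept cluster adds its count to that cluster's
    weight; terms outside every cluster are kept as plain term indexes.
    """
    term_to_concept = {}
    for concept, terms in concept_clusters.items():
        for term in terms:
            term_to_concept[term] = concept
    index = Counter()
    for token in tokens:
        # Map a clustered term to its concept label; otherwise keep the term.
        index[term_to_concept.get(token, token)] += 1
    return dict(index)


# Hypothetical toy document and concept clusters, for illustration only.
doc = "stock shares market bank loan credit bank merger".split()
clusters = {
    "FINANCE_MARKET": {"stock", "shares", "market"},
    "LENDING": {"bank", "loan", "credit"},
}

tf = tf_index(doc)
concepts = concept_index(doc, clusters)
reduction = 1 - len(concepts) / len(tf)

print(tf)
print(concepts)  # {'FINANCE_MARKET': 3, 'LENDING': 4, 'merger': 1}
print(f"index dimension reduced by {reduction:.1%}")  # index dimension reduced by 57.1%
```

On this toy input the concept index has 3 entries instead of 7, which illustrates how grouping related terms can shrink the index dimension; the 57.1% figure is an artifact of the example and is unrelated to the 53.3% reported in the paper.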
Authors
Bo-Yeong Kang, Dae-Won Kim, Sang-Jo Lee
