Article code: 9651063
Journal code: 666434
Publication year: 2005
English article: 20 pages, PDF
Full-text version: free download
English title of the ISI article
Exploiting concept clusters for content-based information retrieval
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract
Current approaches to index weighting for information retrieval from texts are based on statistical analysis of the texts' contents. A key shortcoming of these indexing schemes, which consider only the terms in a document, is that they cannot extract semantically exact indexes that represent the semantic content of a document. To address this issue, we proposed a new indexing formalism that considers not only the terms in a document, but also the concepts. In the proposed method, concepts are extracted by exploiting clusters of terms that are semantically related, referred to as concept clusters. Through experiments on the TREC-2 collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially in regard to the highest-ranked documents. Moreover, the index term dimension was 53.3% lower for the proposed method than for the TF-based method, which is expected to significantly reduce the document search time in a real environment.
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 170, Issues 2–4, 25 February 2005, Pages 443-462