Article code: 396101
Journal code: 666204
Publication year: 2007
English article: 13 pages, PDF
English title of the ISI article
On principal component analysis, cosine and Euclidean measures in information retrieval
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

Clustering groups document objects represented as vectors. An extensive vector space may cause obstacles to applying these methods. Therefore, the vector space was reduced with principal component analysis (PCA). The conventional cosine measure is not the only choice with PCA, which involves the mean-correction of data. Since mean-correction changes the location of the origin, the angles between the document vectors also change. To avoid this, we used a connection between the cosine measure and the Euclidean distance in association with PCA, and grounded searching on the latter. We applied the single and complete linkage and Ward clustering to Finnish documents utilizing their relevance assessment as a new feature. After the normalization of the data, PCA was run and relevant documents were clustered.
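
The abstract does not spell out the connection between the cosine measure and the Euclidean distance. A minimal sketch, assuming it is the standard identity for unit-length vectors, ||x - y||^2 = 2(1 - cos(x, y)), is given below. It also illustrates why mean-correction matters: shifting the origin leaves Euclidean distances unchanged but alters the angles between vectors. The toy vectors and all helper names are hypothetical and do not come from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def euclidean(u, v):
    return norm([a - b for a, b in zip(u, v)])

def unit(u):
    # Scale a vector to unit length (the normalization step assumed here).
    n = norm(u)
    return [a / n for a in u]

def mean_center(rows):
    # Subtract the per-dimension mean, as PCA does before projecting.
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return [[r[j] - mean[j] for j in range(dim)] for r in rows]

# Hypothetical toy term-weight vectors, not data from the paper.
docs = [[3.0, 1.0, 0.0], [2.0, 2.0, 1.0], [0.0, 1.0, 4.0]]
units = [unit(d) for d in docs]
centered = mean_center(units)

x, y = units[0], units[1]
cx, cy = centered[0], centered[1]

# For unit vectors: squared Euclidean distance equals 2 * (1 - cosine).
identity_gap = abs(euclidean(x, y) ** 2 - 2 * (1 - cosine(x, y)))

# Mean-correction is a translation: distances survive, angles do not.
distance_shift = abs(euclidean(x, y) - euclidean(cx, cy))
cosine_shift = abs(cosine(x, y) - cosine(cx, cy))

print(identity_gap, distance_shift, cosine_shift)
```

Under these assumptions, ranking normalized documents by Euclidean distance gives the same order as ranking them by cosine similarity, and that order is preserved after mean-correction, whereas cosine values recomputed on the centered vectors are not.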

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 177, Issue 22, 15 November 2007, Pages 4893–4905