Article code: 488571
Journal code: 703913
Publication year: 2016
English article: 7 pages, PDF
Full-text version: free download
English title of the ISI article
Finding Similar Documents Using Different Clustering Techniques
Persian translation of the title
یافتن اسناد مشابه با استفاده از روشهای مختلف خوشه بندی
Related subjects
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

Text clustering is an important application of data mining. It is concerned with grouping similar text documents together. In this paper, several models are built to cluster capstone project documents using three clustering techniques: k-means, k-means fast, and k-medoids. Our dataset is obtained from the library of the College of Computer and Information Sciences, King Saud University, Riyadh. Three similarity measures are tested: cosine similarity, Jaccard similarity, and the correlation coefficient. The quality of the obtained models is evaluated and compared. The results indicate that the best performance is achieved using k-means and k-medoids combined with cosine similarity. We observe variation in the quality of clustering depending on the evaluation measure used. In addition, as the value of k increases, the quality of the resulting clusters improves. Finally, we reveal the categories of graduation projects offered in the Information Technology department for female students.
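
The three similarity measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the whitespace tokenizer, the raw term-count representation, the function names, and the choice to compute the correlation coefficient over an explicitly supplied vocabulary are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_counts(text):
    """Count terms after lowercasing and splitting on whitespace (naive tokenizer, an assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the shared term set divided by the size of the combined term set."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def correlation_coefficient(a, b, vocab):
    """Pearson correlation of two term-count vectors laid out over `vocab`.

    `vocab` is assumed to be the vocabulary of the whole corpus, so that
    terms absent from both documents still contribute zero entries.
    """
    x = [a[t] for t in vocab]
    y = [b[t] for t in vocab]
    n = len(vocab)
    if n == 0:
        return 0.0
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical project titles, invented for this example only.
docs = [
    "mobile app for student attendance",
    "mobile app for course attendance",
    "image recognition with neural networks",
]
counts = [term_counts(d) for d in docs]
vocab = sorted(set().union(*counts))

print(cosine_similarity(counts[0], counts[1]))
print(jaccard_similarity(counts[0], counts[1]))
print(correlation_coefficient(counts[0], counts[1], vocab))
```

A clustering algorithm such as k-means or k-medoids would use one of these functions (or a distance derived from it) to decide which documents belong together, which is why the choice of measure can change the quality of the resulting clusters.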

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 82, 2016, Pages 28–34