Article code: 10321193
Journal code: 659208
Publication year: 2015
English article: 19-page PDF
Full-text version: free download
English title of the ISI article
A fuzzy document clustering approach based on domain-specified ontology
Persian translation of the title
رویکرد خوشه بندی سند فازی بر اساس هستی شناسی مشخص شده توسط دامنه
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
Document clustering supports applications such as automatic document organization, topic extraction, and fast information retrieval or filtering. Numerous document clustering methods have been developed. Despite these advances, document clustering still presents challenges, such as optimizing feature selection for low-dimensional document representation and incorporating mutual information between documents into the clustering algorithm. This paper focuses on these two problems. First, we construct a domain-specific ontology that provides a controlled vocabulary describing the hazards related to dairy products. Synonyms of the controlled vocabulary that occur in the document set are considered relatively prevalent and fundamentally important for feature selection. Second, in combination with the vector space model (VSM), we perform singular value decomposition (SVD) to map all of the term-document vectors into a concept space. We then obtain the mutual information between documents by calculating the similarity of every pair of document vectors in the orthogonal matrix of right singular vectors. Because the mutual information matrix is also a fuzzy compatibility relation, a fuzzy equivalence relation can be derived by calculating its max-min transitive closure. Finally, based on the fuzzy equivalence relation, all of the data sequences are allocated to clusters under the guidance of a cluster validation index. Our method both reduces the dimensionality of the original data and considers the correlation between terms. The experimental results show that encoding the ontologies in the aggregation process can provide better clustering results. Moreover, the proposed work has been applied to food safety supervision, which is beneficial for government and society.
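The pipeline described in the abstract (SVD concept space, pairwise document similarity, max-min transitive closure, cluster assignment) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the use of cosine similarity clipped to [0, 1], the rank `k`, and the hand-picked threshold `lam` (standing in for the cluster validation index) are all assumptions made for illustration, and the ontology-based feature selection step is omitted.

```python
import numpy as np

def document_similarity(term_doc, k):
    """Similarity of every pair of documents in a rank-k concept space.

    term_doc has one row per term and one column per document.
    Cosine similarity clipped to [0, 1] is an assumption of this sketch.
    """
    _, _, vt = np.linalg.svd(term_doc, full_matrices=False)
    docs = vt[:k].T                      # rows of V_k: one vector per document
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0              # avoid division by zero
    unit = docs / norms
    sim = np.clip(unit @ unit.T, 0.0, 1.0)
    np.fill_diagonal(sim, 1.0)           # reflexive by construction
    return sim

def max_min_compose(r, q):
    """Max-min composition: (r o q)[i, j] = max_k min(r[i, k], q[k, j])."""
    return np.minimum(r[:, :, None], q[None, :, :]).max(axis=1)

def transitive_closure(r):
    """Max-min transitive closure of a reflexive, symmetric fuzzy relation,
    computed by repeated squaring until the relation stops changing."""
    closure = r.copy()
    while True:
        squared = max_min_compose(closure, closure)
        if np.allclose(squared, closure):
            return closure
        closure = squared

def lambda_cut_clusters(equiv, lam):
    """Cluster labels from the lambda-cut of a fuzzy equivalence relation."""
    labels = -np.ones(equiv.shape[0], dtype=int)
    next_label = 0
    for i in range(equiv.shape[0]):
        if labels[i] == -1:
            labels[equiv[i] >= lam] = next_label
            next_label += 1
    return labels

# Hand-built fuzzy compatibility relation over three documents (hypothetical data).
relation = np.array([[1.0, 0.8, 0.0],
                     [0.8, 1.0, 0.4],
                     [0.0, 0.4, 1.0]])
closure = transitive_closure(relation)   # entry [0, 2] rises from 0.0 to 0.4
print(lambda_cut_clusters(closure, 0.5)) # documents 0 and 1 share a cluster
```

Lowering `lam` merges clusters and raising it splits them. In the approach the abstract describes, a cluster validation index guides that choice instead of a fixed value.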
Publisher
Database: Elsevier - ScienceDirect
Journal: Data & Knowledge Engineering - Volume 100, Part A, November 2015, Pages 148-166