Article code: 4970332
Journal code: 1450034
Publication year: 2017
English article: 6-page PDF
Full-text version: free download
English title of the ISI article
Sentence level matrix representation for document spectral clustering
Persian translation of the title
ماتریس سطح حکم برای خوشه بندی طیفی سند
Related topics
Engineering and Basic Sciences - Computer Engineering - Computer Vision and Pattern Recognition
Abstract


- A definition of kernel functions through the Frobenius inner product for spectral clustering.
- Application of a similarity measure between matrix representations of sentences.
- Clustering of sentences by a graph representation of the matrices.
- Vector representation of high-level linguistic elements.

Using a simple vector in R^n is a traditional way of representing documents in vector spaces. However, this representation tends to ignore the discourse and syntactic structure of texts. A matrix representation such as the one offered by the Doc2Vec word embedding method preserves these characteristics. In order to integrate sentence-level matrix representations of documents into a clustering algorithm, we use a Frobenius-based inner product that allows defining kernel functions for spectral clustering. We show that this methodology provides advantages over traditional clustering algorithms and performs better than bag-of-words (BoW) representations used in Information Retrieval (IR).
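
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes every document is a matrix of the same shape (rows are sentence vectors), it uses a Gaussian kernel built from the Frobenius inner product as one possible kernel choice, and it splits the documents into two groups with a basic normalized-Laplacian cut. The function names, the toy matrices, and the value of `sigma` are all hypothetical.

```python
import numpy as np


def frobenius_inner(A, B):
    """Frobenius inner product <A, B>_F = trace(A^T B) = sum_ij A_ij * B_ij."""
    return float(np.sum(A * B))


def gaussian_frobenius_affinity(mats, sigma=1.0):
    """Affinity W_ij = exp(-||A_i - A_j||_F^2 / (2 sigma^2)).

    The squared Frobenius distance is expanded through inner products:
    ||A - B||_F^2 = <A, A>_F - 2 <A, B>_F + <B, B>_F.
    Assumption: all matrices share the same shape.
    """
    n = len(mats)
    gram = np.array([[frobenius_inner(mats[i], mats[j]) for j in range(n)]
                     for i in range(n)])
    sq_norms = np.diag(gram)
    sq_dist = sq_norms[:, None] - 2.0 * gram + sq_norms[None, :]
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative rounding errors
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def spectral_bipartition(W):
    """Two-way split from the sign of the second eigenvector of the
    symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)


# Toy data (hypothetical): four "documents", each a 2 x 3 matrix
# (2 sentences, 3 embedding dimensions), drawn around two patterns.
pattern_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pattern_b = np.array([[0.0, 0.0, 5.0], [5.0, 0.0, 0.0]])
docs = [pattern_a, pattern_a + 0.1, pattern_b, pattern_b + 0.1]

W = gaussian_frobenius_affinity(docs, sigma=3.0)
labels = spectral_bipartition(W)
print(labels)  # cluster ids are arbitrary; only the grouping matters
```

For more than two clusters, the same affinity matrix `W` could be handed to an off-the-shelf routine, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`. Documents with different numbers of sentences would need padding, truncation, or another alignment step before the inner product is defined; this sketch does not address that.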

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 85, 1 January 2017, Pages 29-34
Authors
, , ,