Article ID | Journal | Published Year | Pages | File Type
---|---|---|---|---
4970332 | Pattern Recognition Letters | 2017 | 6 | |
- A definition of kernel functions through the Frobenius inner product for spectral clustering.
- Application of a similarity measure between matrix representations of sentences.
- Clustering of sentences by a graph representation of the matrices.
- Vector representation of high-level linguistic elements.
Representing documents as simple vectors in Rn is the traditional approach in vector-space models. However, this representation ignores the discourse and syntactic structure of texts. A matrix representation, such as the one offered by the Doc2Vec embedding method, preserves these characteristics. To integrate a sentence-level matrix representation of documents into a clustering algorithm, we use a Frobenius-based inner product that allows defining kernel functions for spectral clustering. We show that this methodology offers advantages over traditional clustering algorithms and performs better than the bag-of-words (BoW) representations used in Information Retrieval (IR).
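The abstract describes building a kernel for spectral clustering from a Frobenius inner product between document matrices. A minimal sketch of that idea is below, assuming each document is a fixed-size matrix of sentence embeddings; the Gaussian kernel built on the Frobenius distance and the Ng–Jordan–Weiss-style spectral embedding are plausible instantiations, not the paper's exact formulation (`frobenius_kernel`, `spectral_embedding`, and the toy data are hypothetical names of this sketch).

```python
import numpy as np

def frobenius_kernel(A, B, gamma=0.1):
    """Gaussian kernel based on the Frobenius inner product.

    <A, B>_F = tr(A^T B) induces the Frobenius norm; here we use the
    squared Frobenius distance inside an RBF-style kernel. A and B are
    (n_sentences, dim) matrices; we assume equal shapes for simplicity.
    """
    d2 = np.sum((A - B) ** 2)  # squared Frobenius distance ||A - B||_F^2
    return np.exp(-gamma * d2)

def spectral_embedding(K, n_clusters):
    """Spectral embedding from a kernel (affinity) matrix K.

    Uses the symmetrically normalized affinity D^{-1/2} K D^{-1/2} and
    keeps the top eigenvectors, with row normalization as in the
    Ng-Jordan-Weiss algorithm.
    """
    d = K.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ K @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L_sym)        # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]              # top n_clusters eigenvectors
    return U / np.linalg.norm(U, axis=1, keepdims=True)

# Toy example: 6 documents, each a (4 sentences x 8 dims) matrix,
# drawn around two different means to mimic two topics.
rng = np.random.default_rng(0)
docs = [rng.normal(loc=0.0, scale=0.3, size=(4, 8)) for _ in range(3)] + \
       [rng.normal(loc=2.0, scale=0.3, size=(4, 8)) for _ in range(3)]

n = len(docs)
K = np.array([[frobenius_kernel(docs[i], docs[j]) for j in range(n)]
              for i in range(n)])
Y = spectral_embedding(K, n_clusters=2)    # rows of Y feed a k-means step
```

Any standard k-means implementation can then be run on the rows of `Y` to obtain the final document clusters.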