Article ID Journal Published Year Pages File Type
411944 Neurocomputing 2015 10 Pages PDF
Abstract

In recent years, multi-source clustering has received a significant amount of attention. Several multi-source clustering methods have been developed from different perspectives. In this paper, aiming at addressing the problem of Chinese–Tibetan bilingual document clustering, a novel bilingual clustering scheme is proposed, which can well capture both the intralingua document structures and interlingua document relations. The proposed scheme consists of three major phases. Firstly, to properly combine the feature structures of documents in different languages, a bilingual graph is constructed. In the second phase, two bilingual similarity matrices are computed based on the random walk performed in the bilingual graph. Finally, the similarity based clustering methods are performed on the two bilingual similarity matrices so as to generate cluster structures for documents in each language respectively, which lead to the corresponding bilingual clustering methods. Extensive experiments conducted on two Chinese–Tibetan bilingual document sets have confirmed the effectiveness of the proposed methods.

Keywords
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , ,