Chinese–Tibetan bilingual clustering based on random walk

Article ID	Journal	Published Year	Pages	File Type
411944	Neurocomputing	2015	10 Pages	PDF

Abstract

In recent years, multi-source clustering has received considerable attention, and several multi-source clustering methods have been developed from different perspectives. To address the problem of Chinese–Tibetan bilingual document clustering, this paper proposes a novel bilingual clustering scheme that captures both intralingual document structures and interlingual document relations. The proposed scheme consists of three major phases. First, a bilingual graph is constructed to combine the feature structures of documents in the two languages. Second, two bilingual similarity matrices are computed from a random walk performed on the bilingual graph. Finally, similarity-based clustering methods are applied to the two bilingual similarity matrices to generate cluster structures for the documents in each language, yielding the corresponding bilingual clustering methods. Extensive experiments on two Chinese–Tibetan bilingual document sets confirm the effectiveness of the proposed methods.
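
The three phases can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify how the graph is built, which random walk is used, or which clustering method is applied. The sketch assumes a block adjacency matrix (intralingual similarities on the diagonal blocks, cross-language link weights elsewhere), a random walk with restart with an arbitrary restart probability of 0.15, and that the two per-language similarity matrices are the diagonal blocks of the resulting score matrix. All function names, parameters, and toy weights are hypothetical.

```python
def row_normalize(matrix):
    """Turn a non-negative weight matrix into row-stochastic transition probabilities."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([w / total if total > 0 else 0.0 for w in row])
    return out


def build_bilingual_graph(sim_zh, sim_bo, cross):
    """Phase 1 (assumed form): block adjacency [[sim_zh, cross], [cross^T, sim_bo]]."""
    n = len(sim_zh)
    graph = [list(sim_zh[i]) + list(cross[i]) for i in range(n)]
    for j in range(len(sim_bo)):
        graph.append([cross[i][j] for i in range(n)] + list(sim_bo[j]))
    return graph


def random_walk_similarity(graph, restart=0.15, steps=50):
    """Phase 2 (assumed form): random walk with restart started from every node.

    Row i approximates the visiting distribution of a walker that starts at
    node i and jumps back to i with probability `restart` at each step.
    """
    P = row_normalize(graph)
    size = len(P)
    scores = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(steps):
        updated = []
        for i in range(size):
            row = []
            for j in range(size):
                walked = sum(scores[i][k] * P[k][j] for k in range(size))
                row.append((1 - restart) * walked + (restart if i == j else 0.0))
            updated.append(row)
        scores = updated
    return scores


# Toy data (made up): 3 Chinese documents and 2 Tibetan documents.
# Chinese docs 0 and 1 and Tibetan doc 0 share a topic; the rest share another.
sim_zh = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]]
sim_bo = [[0.0, 0.1], [0.1, 0.0]]
cross = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.9]]  # Chinese-to-Tibetan link weights

graph = build_bilingual_graph(sim_zh, sim_bo, cross)
scores = random_walk_similarity(graph)

# Phase 3 input (assumed form): one similarity matrix per language.
n = len(sim_zh)
zh_block = [row[:n] for row in scores[:n]]
bo_block = [row[n:] for row in scores[n:]]
```

Each block could then be passed to any similarity-based clustering method, such as spectral clustering. Random walk scores are generally not symmetric, so a method that expects a symmetric affinity matrix would need the block symmetrized first, for example by averaging it with its transpose.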

Keywords

Random walk