The method of N-grams in large-scale clustering of DNA texts

Article ID	Journal	Published Year	Pages	File Type
532993	Pattern Recognition	2005	11 Pages	PDF

Abstract

This paper addresses techniques for clustering texts by comparing their N-gram vocabularies. In contrast to the regular N-gram approach, the proposed method counts imperfect occurrences of N-grams in a text, that is, occurrences that match up to a given number of mismatches. We demonstrate that this approach substantially improves the resolving capacity of the N-gram method for DNA texts. Additionally, we discuss a scheme for jointly using different types of clustering techniques to verify the quality of a partition.
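
The idea of counting imperfect occurrences can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how mismatches are measured, so the code below assumes Hamming distance (position-wise substitutions only), and the function names `exact_counts` and `imperfect_counts`, the parameter `m`, and the default alphabet `"ACGT"` are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def exact_counts(text, n):
    """Regular N-gram vocabulary: count every length-n window of the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def imperfect_counts(text, n, m, alphabet="ACGT"):
    """Count, for each possible N-gram, the windows within m mismatches.

    Assumption: a 'mismatch' is a single-character substitution, so an
    occurrence is any window whose Hamming distance to the N-gram is <= m.
    Brute force over all len(alphabet)**n N-grams; fine for small n only.
    """
    windows = exact_counts(text, n)
    counts = {}
    for gram in map("".join, product(alphabet, repeat=n)):
        total = sum(c for w, c in windows.items() if hamming(gram, w) <= m)
        if total:
            counts[gram] = total
    return counts
```

With `m = 0` the imperfect vocabulary coincides with the regular one. With `m > 0`, N-grams that never occur exactly in the text can still receive a nonzero count, which makes the vocabularies of two texts denser and easier to compare.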

Keywords

Clustering

Related Topics

Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition

Authors

Z. Volkovich, V. Kirzhner, A. Bolshoy, E. Nevo, A. Korol