Article ID | Journal | Published Year | Pages |
---|---|---|---|
532993 | Pattern Recognition | 2005 | 11 Pages |
Abstract
This paper addresses techniques for clustering texts by comparing their N-gram vocabularies. In contrast to the regular N-gram approach, the proposed method counts imperfect occurrences of N-grams in a text, that is, occurrences that match an N-gram up to a given number of mismatches. We demonstrate that this approach substantially improves the resolving capacity of the N-gram method for DNA texts. Additionally, we discuss a scheme for using different types of clustering techniques together to verify the quality of a partition.
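To make the idea of imperfect occurrences concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that a mismatch means a single-character substitution (Hamming distance) and that the alphabet is the four DNA letters. The abstract does not specify either detail, and the names `hamming` and `imperfect_ngram_counts` are hypothetical.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def imperfect_ngram_counts(text, n, max_mismatches, alphabet="ACGT"):
    """Count, for every possible N-gram over `alphabet`, the windows of
    `text` that match it with at most `max_mismatches` substitutions.

    Assumption: mismatches are measured by Hamming distance; the paper's
    exact definition may differ.
    """
    # Exact counts of every length-n window that occurs in the text.
    windows = Counter(text[i:i + n] for i in range(len(text) - n + 1))

    counts = {}
    for gram in map("".join, product(alphabet, repeat=n)):
        # Sum the counts of all observed windows close enough to this N-gram.
        counts[gram] = sum(
            c for w, c in windows.items() if hamming(gram, w) <= max_mismatches
        )
    return counts


if __name__ == "__main__":
    exact = imperfect_ngram_counts("ACGACT", 3, 0)
    fuzzy = imperfect_ngram_counts("ACGACT", 3, 1)
    print(exact["ACG"], fuzzy["ACG"])
```

With `max_mismatches=0` this reduces to the regular N-gram count. Allowing mismatches gives nonzero counts to N-grams that never occur exactly, so the resulting vocabulary vectors are denser and can then be compared between texts by a distance of one's choosing. The sketch enumerates all `len(alphabet) ** n` N-grams, so it is practical only for small `n`.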
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Vision and Pattern Recognition
Authors
Z. Volkovich, V. Kirzhner, A. Bolshoy, E. Nevo, A. Korol