Algorithms for extracting motifs from biological weighted sequences

Article ID	Journal	Published Year	Pages	File Type
431149	Journal of Discrete Algorithms	2007	14 Pages	PDF

Abstract

In this paper we present three algorithms for the Motif Identification Problem in Biological Weighted Sequences. The first algorithm extracts repeated motifs from a biological weighted sequence. The motifs correspond to repetitive words that are approximately equal, under the Hamming distance, and occur with probability ⩾ 1/k, where k is a small constant. The second algorithm extracts common motifs from a set of N ⩾ 2 weighted sequences. In this case, the motifs consist of words that must occur with probability ⩾ 1/k in q distinct sequences of the set, where 1 ⩽ q ⩽ N. The third algorithm extracts maximal pairs from a biological weighted sequence. A pair in a sequence is the occurrence of the same word twice. The algorithms presented in this paper also improve on previous work on these problems.
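
To make the terms in the abstract concrete, the following Python sketch illustrates what it means for a word to occur in a weighted sequence with probability ⩾ 1/k. It is a naive illustration, not one of the paper's three algorithms. It assumes the usual model in which each position holds a probability distribution over symbols and the probability of a word is the product of the per-position probabilities; the abstract does not state this definition, and the names occurrence_probability, solid_words and hamming are hypothetical.

```python
from math import prod

# Assumed model: a weighted sequence is a list of positions, each mapping
# symbols to probabilities that sum to 1.
WeightedSeq = list[dict[str, float]]


def occurrence_probability(seq: WeightedSeq, word: str, start: int) -> float:
    """Probability that `word` occurs at `start`, treating positions as independent."""
    if start < 0 or start + len(word) > len(seq):
        return 0.0
    return prod(seq[start + i].get(c, 0.0) for i, c in enumerate(word))


def solid_words(seq: WeightedSeq, length: int, k: int) -> dict[str, list[int]]:
    """Map each word of the given length to the starts where it occurs with probability >= 1/k."""
    threshold = 1.0 / k
    hits: dict[str, list[int]] = {}
    for start in range(len(seq) - length + 1):
        # Extend prefixes one symbol at a time, dropping any that fall below 1/k.
        partial = [("", 1.0)]
        for i in range(length):
            partial = [
                (w + c, p * q)
                for w, p in partial
                for c, q in seq[start + i].items()
                if p * q >= threshold
            ]
        for w, _ in partial:
            hits.setdefault(w, []).append(start)
    return hits


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


# Illustrative data only (not taken from the paper).
example: WeightedSeq = [
    {"A": 1.0},
    {"C": 0.5, "G": 0.5},
    {"A": 1.0},
    {"C": 0.75, "T": 0.25},
]
repeats = solid_words(example, length=2, k=2)
```

In this toy input the word AC occurs with probability ⩾ 1/2 at positions 0 and 2, so it is a pair in the sense used above, while AC and AG at position 0 are within Hamming distance 1 of each other. Pruning prefixes as soon as they drop below 1/k keeps the number of candidate words at each position bounded by k, which is why a small constant k keeps the enumeration cheap.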