Article ID Journal Published Year Pages File Type
380868 Engineering Applications of Artificial Intelligence 2012 11 Pages PDF
Abstract

The Shortest Common Supersequence Problem asks to obtain a shortest string that is a supersequence of every member of a given set of strings. It has applications, among others, in data compression and oligonucleotide microarray production. The problem is NP-hard, and the existing exact solutions are impractical for large instances. In this paper, a new beam search algorithm is proposed for the problem, which employs a probabilistic heuristic and uses the dominance property to further prune the search space. The proposed algorithm is compared with three recent algorithms proposed for the problem on both random and biological sequences, outperforming them all by quickly providing solutions of higher average quality in all the experimental cases. The Java source and binary files of the proposed IBS_SCS algorithm and our implementation of the DR algorithm and all the random and real datasets used in this paper are freely available upon request.

► Microarrays are precious tools successfully used in many biological applications. ► A shorter deposition sequence reduces microarray production cost, time, and error. ► The deposition sequence is a common supersequence of the underlying probes. ► IBS_SCS achieves shorter common supersequences than the other algorithms. ► IBS_SCS is also remarkably fast. The resulting software will be freely available.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , ,