Approximability of constrained LCS

Article ID	Journal	Published Year	Pages	File Type
430722	Journal of Computer and System Sciences	2012	9 Pages	PDF

Abstract

The problem Constrained Longest Common Subsequence is a natural extension to the classical problem Longest Common Subsequence, and has important applications to bioinformatics. Given k input sequences A1,…,AkA1,…,Ak and l constraint sequences B1,…,BlB1,…,Bl, C-LCS(k,l)C-LCS(k,l) is the problem of finding a longest common subsequence of A1,…,AkA1,…,Ak that is also a common supersequence of B1,…,BlB1,…,Bl. Gotthilf et al. gave a polynomial-time algorithm that approximates C-LCS(k,1)C-LCS(k,1) within a factor mˆ|Σ|, where mˆ is the length of the shortest input sequence and |Σ||Σ| is the alphabet size. They asked whether there are better approximation algorithms and whether there exists a lower bound. In this paper, we answer their questions by showing that their approximation factor mˆ|Σ| is in fact already very close to optimal although a small improvement is still possible:1.For any computable function f and any ϵ>0ϵ>0, there is no polynomial-time algorithm that approximates C-LCS(k,1)C-LCS(k,1) within a factor f(|Σ|)⋅mˆ1/2−ϵ unless NP=PNP=P. Moreover, this holds even if the constraint sequence is unary.2.There is a polynomial-time randomized algorithm that approximates C-LCS(k,1)C-LCS(k,1) within a factor |Σ|⋅O(OPT⋅loglogOPT/logOPT) with high probability, where OPT is the length of the optimal solution, OPT⩽mˆ. For the problem over an alphabet of arbitrary size, we show that3.For any ϵ>0ϵ>0, there is no polynomial-time algorithm that approximates C-LCS(k,1)C-LCS(k,1) within a factor mˆ1−ϵ unless NP=PNP=P.4.There is a polynomial-time algorithm that approximates C-LCS(k,1)C-LCS(k,1) within a factor O(mˆ/logmˆ). We also present some complementary results on exact and parameterized algorithms for C-LCS(k,1)C-LCS(k,1).

► We answer an open question of Gotthilf et al. on constrained longest common sequence. ► We show that the ratio of their approximation algorithm is close to optimal. ► We then present a randomized approximation algorithm with an improved ratio. ► Almost tight bounds are also obtained for problem over alphabet of arbitrary size. ► In addition, we present some related results on exact and parameterized algorithms.

Keywords

approximation algorithms Computational biology Longest common subsequence Computational complexity