A fast algorithm for the constrained multiple sequence alignment problem

Given n strings S1, S2, ..., Sn, and a pattern string P, the constrained multiple sequence alignment (CMSA) problem is to find an optimal multiple alignment of S1, S2, ..., Sn such that the alignment contains P, i.e. in the alignment matrix there exists a sequence of columns each entirely composed o...

Teljes leírás

Elmentve itt :
Bibliográfiai részletek
Szerzők: He Dan
Arslan Abdullah N.
Ling Alan C. H.
Testületi szerző: International Conference on Automata and Formal Languages (11.) (2005) (Dobogókő)
Dokumentumtípus: Cikk
Megjelent: 2006
Sorozat:Acta cybernetica 17 No. 4
Kulcsszavak:Számítástechnika, Kibernetika
Tárgyszavak:
Online Access:http://acta.bibl.u-szeged.hu/12792
LEADER 02807nab a2200253 i 4500
001 acta12792
005 20220615135741.0
008 161015s2006 hu o 0|| eng d
022 |a 0324-721X 
040 |a SZTE Egyetemi Kiadványok Repozitórium  |b hun 
041 |a eng 
100 1 |a He Dan 
245 1 2 |a A fast algorithm for the constrained multiple sequence alignment problem  |h [elektronikus dokumentum] /  |c  He Dan 
260 |c 2006 
300 |a 701-717 
490 0 |a Acta cybernetica  |v 17 No. 4 
520 3 |a Given n strings S1, S2, ..., Sn, and a pattern string P, the constrained multiple sequence alignment (CMSA) problem is to find an optimal multiple alignment of S1, S2, ..., Sn such that the alignment contains P, i.e. in the alignment matrix there exists a sequence of columns each entirely composed of symbol P[k] for every k, where P[k] is the kth symbol in P, 1 ≤ k ≤ |P|, and in the sequence, a column containing P[i] appears before the column containing P[j] for all i,j, i < j. The problem is motivated from the problem of comparing multiple sequences that share a common structure, or sequence pattern. There are O(2ns1s2...snr)-time dynamic programming algorithms for the problem, where s1,s2, ...,sn and r are, respectively, the lengths of the input strings and the pattern string. Feasibility of these algorithms in practice is limited when the number of sequences is large, or the sequences are long because of the impractically long time required by these algorithms. We present a new algorithm with worst-case time complexity also O(2ns1s2...snr), but the algorithm avoids redundant computations in existing dynamic programming solutions. Experiments on both randomly generated strings and real data show that this algorithm is much faster than the existing algorithms. We present an analysis that explains the speed-up obtained in our experiments by our algorithm over the naive dynamic programming algorithm for constrained multiple sequence alignment of protein sequences. The speed-up is more significant when pattern is long, or n is large. For example in the case of constrained pairwise sequence alignment (the CMSA problem with n=2) when the pattern is sufficiently long for strings S1 and S2, the asymptotic time complexity is observed to be O(s1s2) instead of O(s1s2r). Main ideas in our algorithm can also be used in other constrained sequence alignment problems. 
650 4 |a Természettudományok 
650 4 |a Számítás- és információtudomány 
695 |a Számítástechnika, Kibernetika 
700 0 1 |a Arslan Abdullah N.  |e aut 
700 0 1 |a Ling Alan C. H.  |e aut 
710 |a International Conference on Automata and Formal Languages (11.) (2005) (Dobogókő) 
856 4 0 |u http://acta.bibl.u-szeged.hu/12792/1/DanHe_2006_ActaCybernetica.pdf  |z Dokumentum-elérés