Tuesday, 15 March 2011

Algorithm for global multiple sequence alignment using only indels




I'm writing a Sublime Text script to align several lines of code. The script takes each line, splits it by a predefined set of delimiters (,;:=), and rejoins it with each segment in a 'column' padded to the same width. This works well when all lines have the same set of delimiters, but some lines may have extra segments, an optional comma at the end, and so forth.
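To make the setup concrete, here is a minimal sketch of the split-and-pad step in plain Python. It is not the actual script: the names `split_segments` and `align_lines` are made up for illustration, and it ignores the Sublime Text API entirely.

```python
import re

DELIMS = ",;:="  # the delimiter set mentioned above


def split_segments(line, delims=DELIMS):
    """Split a line into (text, delimiter) pairs; the last pair may have no delimiter."""
    pattern = "([" + re.escape(delims) + "])"
    # With a capture group, re.split alternates text, delimiter, text, ...
    parts = re.split(pattern, line)
    segments = []
    for i in range(0, len(parts), 2):
        delim = parts[i + 1] if i + 1 < len(parts) else ""
        segments.append((parts[i].strip(), delim))
    return segments


def align_lines(lines, delims=DELIMS):
    """Pad the text of each column to a common width so the delimiters line up."""
    rows = [split_segments(line, delims) for line in lines]
    ncols = max(len(row) for row in rows)
    widths = [
        max((len(row[c][0]) for row in rows if c < len(row)), default=0)
        for c in range(ncols)
    ]
    aligned = []
    for row in rows:
        cells = [text.ljust(widths[c]) + delim for c, (text, delim) in enumerate(row)]
        aligned.append(" ".join(cells).rstrip())
    return aligned


for out in align_lines(["a = 1, b", "long = 22, c"]):
    print(out)
```

This only behaves well when every line has the same delimiters in the same order, which is exactly the limitation described above.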

My thought is to come up with a canonical list of delimiters. Specifically, given several strings of delimiters, I would like to find the shortest string that can be formed from any of the given strings using only insertions, with ties broken in some sensible manner. After some research, I learned that this is the well-known problem of global multiple sequence alignment, except that there are no mismatches, only matches and indels.
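For just two delimiter strings, the "shortest string reachable from either one by insertions" is their shortest common supersequence, and it can be computed with a small dynamic program. The sketch below is an illustration of that two-string case only; the function name and the tie-breaking rule (prefer the character from the first string) are my own assumptions.

```python
def shortest_common_supersequence(a, b):
    """Return one shortest string that contains both a and b as subsequences."""
    n, m = len(a), len(b)
    # length[i][j] = length of the shortest common supersequence of a[i:] and b[j:]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n:
                length[i][j] = m - j
            elif j == m:
                length[i][j] = n - i
            elif a[i] == b[j]:
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = 1 + min(length[i + 1][j], length[i][j + 1])

    # Walk the table from the start, emitting one character per step.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif length[i + 1][j] <= length[i][j + 1]:
            out.append(a[i])  # tie-break: take from the first string
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return "".join(out)


print(shortest_common_supersequence("=,", "=,,"))
print(shortest_common_supersequence(":=", "=,"))
```

Extending the same table to k strings gives it k dimensions, which is where the exponential cost discussed here comes from.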

The dynamic programming approach, unfortunately, is exponential in the number of strings, at least in the general case. Is there any hope of a faster solution when mismatches are disallowed?

I'm a little hesitant to make a blanket statement that there is no such hope, even when mismatches are disallowed, but I'm pretty sure that there isn't. Here's why.

The size of the dynamic programming table generated when doing sequence alignment is approximately (string length)^(number of strings), hence the exponential run-time/space requirement. To give you a sense of where that comes from, here's an illustration with two strings, abc and acb, each of length 3. This gives us a 3x3 table:

      a b c
    a 0 1 2
    c 1 1 1
    b 2 1 2

We initialize this table starting from the upper left and then work our way down to the lower right from there. The total cost to get to any location in the table is given by the number at that location (for simplicity, I'm assuming that insertions, deletions, and substitutions all have a cost of 1). The operation used to get to a given location is given by the direction moved from the previous value. Moving right means you are inserting elements from the top string. Moving down inserts elements from the sideways string. Moving diagonally means you are aligning elements from the top and bottom. If these elements don't match, then this represents a substitution and you increment the cost of getting there.
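As a rough sketch of how such a table gets filled in for two strings, assuming unit costs as described: the function name is hypothetical, and the code keeps an extra row and column for the empty prefix internally, then drops them so the result has one cell per pair of characters.

```python
def edit_distance_table(top, side):
    """Fill the pairwise alignment table with unit-cost insert/delete/substitute."""
    rows, cols = len(side), len(top)
    d = [[0] * (cols + 1) for _ in range(rows + 1)]
    for j in range(cols + 1):
        d[0][j] = j  # only moves to the right: insert from the top string
    for i in range(rows + 1):
        d[i][0] = i  # only moves down: insert from the sideways string
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            mismatch = 0 if side[i - 1] == top[j - 1] else 1
            d[i][j] = min(
                d[i][j - 1] + 1,             # move right
                d[i - 1][j] + 1,             # move down
                d[i - 1][j - 1] + mismatch,  # move diagonally (match or substitution)
            )
    # Drop the empty-prefix row and column.
    return [row[1:] for row in d[1:]]


for row in edit_distance_table("abc", "acb"):
    print(row)
# [0, 1, 2]
# [1, 1, 1]
# [2, 1, 2]
```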

And that's the problem. Saying that mismatches aren't allowed doesn't rule out the operations that are responsible for the length and height of the table (insertions/deletions). Worse, disallowing mismatches doesn't even rule out a potential move. Diagonal movements in the table are still possible sometimes, just not when the two elements don't match. Plus, you still need to check whether the elements match, so you're basically still considering that move. As a result, this shouldn't be able to improve your worst case time, and it seems unlikely to have a substantial effect on your average or best case time either.
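To illustrate the point, here is a sketch of the same kind of table fill for two strings with substitutions disallowed (the function name is made up). The table has exactly the same dimensions, and every cell still compares its two characters to decide whether the diagonal move is available.

```python
def indel_only_table(top, side):
    """Pairwise alignment table where only matches and indels are allowed."""
    rows, cols = len(side), len(top)
    d = [[0] * (cols + 1) for _ in range(rows + 1)]
    for j in range(cols + 1):
        d[0][j] = j
    for i in range(rows + 1):
        d[i][0] = i
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            best = min(d[i][j - 1] + 1, d[i - 1][j] + 1)  # indels are always possible
            if side[i - 1] == top[j - 1]:
                # The diagonal is only allowed on a match, but we still had to check.
                best = min(best, d[i - 1][j - 1])
            d[i][j] = best
    # Drop the empty-prefix row and column.
    return [row[1:] for row in d[1:]]


for row in indel_only_table("abc", "acb"):
    print(row)
# [0, 1, 2]
# [1, 2, 1]
# [2, 1, 2]
```

Some cell values change, but the number of cells and the work per cell do not.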

On the bright side, this is a pretty important problem in bioinformatics, so people have come up with a lot of solutions. They do have their flaws, but they may work well enough for your case (particularly since it seems you'll be less likely to have spurious alignments than with DNA, given that your strings are not composed of a four-letter alphabet). Take a look at star alignment and neighbour joining.
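As a very rough sketch of the first step of a star-style heuristic: pick as the "center" the string with the smallest total pairwise distance to all the others, then align everything else against it. The code below only does the center selection, uses an indel-only distance, and all the function names are my own; it is an illustration under those assumptions, not a full star alignment.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_distance(a, b):
    """Number of insertions plus deletions needed to turn a into b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def pick_center(strings):
    """Return the string with the smallest summed indel distance to all others."""
    return min(strings, key=lambda s: sum(indel_distance(s, t) for t in strings))


delimiter_strings = ["=,", "=,,", ":=,"]
print(pick_center(delimiter_strings))
```

Each pairwise comparison is quadratic in string length, so the selection step is polynomial in the number of strings rather than exponential, at the price of no longer guaranteeing the shortest result.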

algorithm sequence-alignment
