Re: Protein Sequence Alignment efficiency
- To: mathgroup at smc.vnet.net
- Subject: [mg118885] Re: Protein Sequence Alignment efficiency
- From: Armand Tamzarian <mike.honeychurch at gmail.com>
- Date: Sat, 14 May 2011 03:08:01 -0400 (EDT)
- References: <iqj14b$rg8$1@smc.vnet.net>
On May 13, 8:28 pm, Matteo Pendleton <znfin... at gmail.com> wrote:
> I'm trying to do some bioinformatics work in Mathematica and I've run up
> against a bit of a roadblock regarding code efficiency. I'm doing pairwise
> protein sequence alignments and I've written a nice little function that
> takes two sequences and returns the optimal alignment. The trouble is that
> it's slow. The reason it's slow is because changing the scoring table to
> the proper "BLOSUM80" slows the operation down horribly.
>
> Assuming:
>
> seqa =
> "QVQLVQSGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGLEWMGGIIPIFGTANYAQKFQGRVTITADESTSTAYMELSSLRSEDTAVYYCARDLETTVVTIYFDYWGQGTLVTVSS";
> seqb =
> "QVQLVQSGAEVKKPGASVKVSCKASGYTFTGYYMHWVRQAPGQGLEWMGRINPNSGGTNYAQKFQGRVTSTRDTSISTAYMELSRLRSDDTVVYYCARDLRRFGGVPYYFDYWGQGTLVTVSS"
>
> While:
> Timing[SequenceAlignment[seqa , seqb , MergeDifferences -> False];]
>
> Out[1]={0., Null}
>
> Changing the scoring table results in this:
>
> Timing[SequenceAlignment[ seqa , seqb , SimilarityRules -> "BLOSUM80" ,
> MergeDifferences -> False];]
>
> Out[1]={0.171, Null}
>
> ..and my two strings are very similar. Is there any way to optimize the
> SequenceAlignment function so that it doesn't do this or would it be better
> to create a specialized alignment function based on the underlying linear
> programming so that the scoring table is built in? I'd like to be able to
> run millions of sequences through this function and that's not going to be
> practical if I can only do 5/sec.
>
> Thanks in advance!

Out of curiosity what led you to try to use Mathematica for this task?

Mike