Re: Protein Sequence Alignment efficiency
- To: mathgroup at smc.vnet.net
- Subject: [mg118885] Re: Protein Sequence Alignment efficiency
- From: Armand Tamzarian <mike.honeychurch at gmail.com>
- Date: Sat, 14 May 2011 03:08:01 -0400 (EDT)
- References: <iqj14b$rg8$1@smc.vnet.net>
On May 13, 8:28 pm, Matteo Pendleton <znfin... at gmail.com> wrote:
> I'm trying to do some bioinformatics work in Mathematica and I've run up
> against a bit of a roadblock regarding code efficiency. I'm doing pairwise
> protein sequence alignments and I've written a nice little function that
> takes two sequences and returns the optimal alignment. The trouble is that
> it's slow: changing the scoring table to the proper "BLOSUM80" slows the
> operation down horribly.
>
> Assuming:
>
> seqa =
> "QVQLVQSGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGLEWMGGIIPIFGTANYAQKFQGRVTITADESTSTAYMELSSLRSEDTAVYYCARDLETTVVTIYFDYWGQGTLVTVSS";
> seqb =
> "QVQLVQSGAEVKKPGASVKVSCKASGYTFTGYYMHWVRQAPGQGLEWMGRINPNSGGTNYAQKFQGRVTSTRDTSISTAYMELSRLRSDDTVVYYCARDLRRFGGVPYYFDYWGQGTLVTVSS"
>
> While:
> Timing[SequenceAlignment[seqa , seqb , MergeDifferences -> False];]
>
> Out[1]={0., Null}
>
> Changing the scoring table results in this:
>
> Timing[SequenceAlignment[ seqa , seqb , SimilarityRules -> "BLOSUM80" ,
> MergeDifferences -> False];]
>
> Out[1]={0.171, Null}
>
> ...and my two strings are very similar. Is there any way to optimize the
> SequenceAlignment function so that it doesn't do this, or would it be better
> to create a specialized alignment function based on the underlying dynamic
> programming so that the scoring table is built in? I'd like to be able to
> run millions of sequences through this function and that's not going to be
> practical if I can only do 5/sec.
>
> Thanks in advance!
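A note on the timings quoted above: a Timing result of 0. only means the call finished below the clock's resolution, so the slowdown factor can't be read off a single run. A minimal sketch for averaging over many repetitions, assuming seqa and seqb are defined as in the quoted message. The helper name avgTime and the default of 200 repetitions are my own choices for illustration, not built-ins:

```mathematica
(* avgTime is a hypothetical helper, not a built-in function. *)
(* HoldFirst keeps the expression unevaluated so that it is re-run on every pass of Do. *)
SetAttributes[avgTime, HoldFirst];
avgTime[expr_, n_Integer : 200] := First[AbsoluteTiming[Do[expr, {n}]]]/n;

(* Average wall-clock seconds per alignment with the default scoring. *)
tDefault = avgTime[SequenceAlignment[seqa, seqb, MergeDifferences -> False]];

(* Same call with the BLOSUM80 scoring table. *)
tBlosum = avgTime[SequenceAlignment[seqa, seqb,
    SimilarityRules -> "BLOSUM80", MergeDifferences -> False]];

(* Per-call times and the slowdown factor. *)
{tDefault, tBlosum, tBlosum/tDefault}
```

The ratio in the last element gives a steadier picture of the cost of the scoring table than comparing 0. with 0.171 from single runs.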
Out of curiosity, what led you to try to use Mathematica for this task?
Mike