MathGroup Archive 2004

Trying to eliminate a loop

  • To: mathgroup at smc.vnet.net
  • Subject: [mg50595] Trying to eliminate a loop
  • From: János <janos.lobb at yale.edu>
  • Date: Sat, 11 Sep 2004 06:45:09 -0400 (EDT)
  • Sender: owner-wri-mathgroup at wolfram.com

Hi,

I have a four-letter alphabet:
In[3]:=
baseAlphabet={a,t,c,g}

I can create a random list of arbitrary length from it with:

generateRandomStrand[alphabet_List, len_Integer] := \
Table[alphabet[[Random[Integer, {1, Length[alphabet]}] ]], {len}];
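In Mathematica 6 and later (an assumption about the version in use, since the post predates it), the same random strand can be drawn with RandomChoice, which avoids the explicit Table and index arithmetic. A minimal sketch; the name generateRandomStrandV6 is hypothetical:

```mathematica
(* Hypothetical helper; assumes Mathematica 6+, where RandomChoice exists *)
generateRandomStrandV6[alphabet_List, len_Integer] := RandomChoice[alphabet, len]

baseAlphabet = {a, t, c, g};
strand = generateRandomStrandV6[baseAlphabet, 23460];

(* Check the length and that every element comes from the alphabet *)
{Length[strand], Complement[strand, baseAlphabet] === {}}
(* → {23460, True} *)
```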

For example:

ls = generateRandomStrand[baseAlphabet, 23460];

I want to know which sub-strands of length primerLength = 7 occur in it,
so I partition it:

lspr = Partition[ls, primerLength, 1];
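Partition with offset 1 produces overlapping windows, so window k of lspr covers elements k through k + primerLength - 1 of ls, and a strand of length n yields n - primerLength + 1 windows. A small illustration with made-up data and window length 3:

```mathematica
small = {a, t, c, g, a};
windows = Partition[small, 3, 1]
(* → {{a, t, c}, {t, c, g}, {c, g, a}} *)

(* number of overlapping windows = n - windowLength + 1 *)
Length[windows] == Length[small] - 3 + 1
(* → True *)
```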

All the possible sub-strands form a set:

primerSet=Distribute[Table[baseAlphabet, {primerLength}], List];
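The Distribute construction enumerates all 4^7 = 16384 possible primers as an outer product of the alphabet with itself. In versions that provide Tuples (an assumption about the version in use), Tuples should build the same list in the same order; a quick check on length-3 primers:

```mathematica
baseAlphabet = {a, t, c, g};
viaDistribute = Distribute[Table[baseAlphabet, {3}], List];
viaTuples = Tuples[baseAlphabet, 3];

(* 4^3 = 64 primers, identical lists from both constructions *)
{Length[viaTuples], viaDistribute === viaTuples}
(* → {64, True} *)
```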

<< Statistics`DataManipulation`
freq=Frequencies[lspr];
gives me the frequency distribution, that is, which elements of primerSet
occur in lspr and how often.
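Frequencies returns pairs of the form {count, element}, which is why the code below reads counts from freq[[All, 1]] and sub-strands from freq[[All, 2]]. In Mathematica 6 and later (an assumption about the version), the built-in Tally gives the same information without the package, but with each pair in the order {element, count}. A small sketch on made-up data:

```mathematica
windows = Partition[{a, t, a, t, t}, 2, 1];
(* windows = {{a, t}, {t, a}, {a, t}, {t, t}} *)

counts = Tally[windows]
(* → {{{a, t}, 2}, {{t, a}, 1}, {{t, t}, 1}} *)

(* Reverse each pair to get the {count, element} layout used by Frequencies *)
freqLike = Reverse /@ counts
(* → {{2, {a, t}}, {1, {t, a}}, {1, {t, t}}} *)
```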

To find out which sublist occurs most often I do:
In[64]:=
pr = Flatten[First[Extract[freq[[All, 2]],
       Split[Position[freq, Max[freq[[All, 1]]]][[All, 1]]]]]]

Out[64]=
{a,a,c,t,g,c,g}
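The Extract/Split/Position chain can be shortened with Ordering, which returns the position of the largest count directly. A sketch on a made-up frequency list in the {count, element} layout; if several sub-strands tie for the maximum, Ordering picks one of them, which need not be the same one the expression above returns:

```mathematica
(* Made-up data in the {count, element} layout returned by Frequencies *)
freq = {{3, {a, c}}, {5, {t, g}}, {2, {c, c}}};

(* Ordering[list, -1] gives the position of the largest element *)
best = First[Ordering[freq[[All, 1]], -1]];
pr = freq[[best, 2]]
(* → {t, g} *)
```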

and it is at positions

In[65]:=
prpos=Position[lspr,pr]

Out[65]=
{{2860},{4336},{6791},{11387},{12164},{17472},{17833},{17954}}

in lspr and in ls.
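Because lspr was built with offset 1, a hit at position k in lspr means that ls[[k]] through ls[[k + primerLength - 1]] spell out pr, which is why the same positions work for ls. A small self-contained check of that correspondence, using a made-up strand and primer:

```mathematica
primerLength = 2;
ls = {g, a, c, t, a, c};
pr = {a, c};

prpos = Position[Partition[ls, primerLength, 1], pr]
(* → {{2}, {5}} *)

(* Each hit k in lspr starts a copy of pr at element k of ls *)
And @@ (Take[ls, {#, # + primerLength - 1}] === pr & /@ Flatten[prpos])
(* → True *)
```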

At those positions in ls I want to attach the complement of this pr 
sublist, so I create the following rules:

complementRule = {a -> t, c -> g, t -> a, g -> c};
replaceWithComplement = {a -> {a, t}, c -> {c, g}, t -> {t, a}, g -> {g, c}}
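Applied to a whole list, the two rule sets give the complementary strand and the base-paired strand respectively. ReplaceAll applies the rules simultaneously and does not rescan what it has already replaced, so a -> t and t -> a swap cleanly and a -> {a, t} does not recurse. Using the pr found above:

```mathematica
complementRule = {a -> t, c -> g, t -> a, g -> c};
replaceWithComplement = {a -> {a, t}, c -> {c, g}, t -> {t, a}, g -> {g, c}};
pr = {a, a, c, t, g, c, g};

pr /. complementRule
(* → {t, t, g, a, c, g, c} *)

pr /. replaceWithComplement
(* → {{a, t}, {a, t}, {c, g}, {t, a}, {g, c}, {c, g}, {g, c}} *)
```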

I wrote a For loop that, starting at each position in prpos, replaces the
primerLength elements of ls with the base pairs given by the rule above:

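(* at offset i, replace the i-th base of every occurrence of pr in ls
   with the matching {base, complement} pair from replaceWithComplement *)
For[i = 1, i <= Length[pr], i++,
  ls = ReplacePart[ls, replaceWithComplement, prpos + i - 1,
    Flatten[Position[replaceWithComplement,
      {Part[Extract[ls, prpos + i - 1], 1],
       Part[Extract[ls, prpos + i - 1], 1] /. complementRule}]]]]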

Mathematica runs this For loop in about 0.046678 seconds on my machine.
With primerLength=9 it takes 0.050748 seconds; in that case pr occurred
at only three positions on the strand.

I have a feeling that this can be done faster with Map or MapAt, so that
the For loop could go away. I also do not like having to rewrite ls in
every cycle. ReplacePart does not look right to me in this situation,
but I have not yet found a way to apply the replaceWithComplement rule
directly to the primerLength-long intervals of ls at the prpos positions.
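One loop-free approach, sketched here under stated assumptions rather than taken from a reply in the thread: expand prpos into the full list of indices of ls covered by the occurrences of pr, then apply replaceWithComplement to all of them in a single MapAt call. Union removes duplicate indices so that each element is paired only once if occurrences of pr overlap. The helper name complementIntervals and the example data are hypothetical:

```mathematica
(* Hypothetical helper: pair every base inside each occurrence of a primer *)
complementIntervals[strand_List, hits_List, len_Integer, rules_List] :=
  Module[{idx},
    (* all indices covered by the hits; Union drops overlaps *)
    idx = Union[Flatten[Outer[Plus, Flatten[hits], Range[0, len - 1]]]];
    (* MapAt takes the positions as {{i1}, {i2}, ...} *)
    MapAt[# /. rules &, strand, List /@ idx]
  ]

replaceWithComplement = {a -> {a, t}, c -> {c, g}, t -> {t, a}, g -> {g, c}};
ls = {g, a, c, t, a, c};
prpos = {{2}, {5}};  (* positions of {a, c} in Partition[ls, 2, 1] *)

complementIntervals[ls, prpos, 2, replaceWithComplement]
(* → {g, {a, t}, {c, g}, t, {a, t}, {c, g}} *)
```

With the variables from the post this would read ls = complementIntervals[ls, prpos, primerLength, replaceWithComplement]; how its speed compares with the For loop would still need to be timed.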

Any good tips?

Thanks ahead,

János


----------------------------------------------
Trying to argue with a politician is like lifting up the head of a corpse.
(S. Lem: His Master's Voice)

