Re: Occurrence of a substring inside a list of strings

*To*: mathgroup at smc.vnet.net
*Subject*: [mg51234] Re: [mg51203] Occurrence of a substring inside a list of strings
*From*: János <janos.lobb at yale.edu>
*Date*: Sat, 9 Oct 2004 04:19:05 -0400 (EDT)
*References*: <200410080654.CAA24974@smc.vnet.net> <F9D1AF3C-1927-11D9-841E-000A95B4967A@mimuw.edu.pl>
*Sender*: owner-wri-mathgroup at wolfram.com

Andrzej,

Elegant is the winner. On a list of length 294773 with ByteCount 1626134432, on a G4 machine with 2 GB of memory (top showed just 21 MB free, a typical condition), I got the following results. The first is my newbie version:

    In[26]:= Length[Flatten[Map[Part[StringPosition[#, fragment1], All, 1] &, collectedDnaBin]]] // Timing
    Out[26]= {559.01 Second, 12}

    In[30]:= Count[(StringPosition[#1, fragment1, 1] & ) /@ collectedDnaBin, _?(#1 != {} & )] // Timing
    Out[30]= {558.78 Second, 12}

    In[31]:= Length[Join @@ (StringPosition[#1, fragment1, 1] & ) /@ collectedDnaBin] // Timing
    Out[31]= {557.75 Second, 12}

    In[33]:= Count[collectedDnaBin, _?(StringMatchQ[#1, StringJoin["*", fragment1, "*"]] & )] // Timing
    Out[33]= {114.09 Second, 12}

Here fragment1 is a 100-character string.

Thanks a lot,

János

On Oct 8, 2004, at 8:45 AM, Andrzej Kozlowski wrote:

> On 8 Oct 2004, at 15:54, János wrote:
>
>> Hi,
>>
>> I have a simple list of strings like lst={"abc", "abcd",
>> "aabccaddbacdda", "adbacca",....}. Let's say I have a fragment called
>> frag="dba". I would like to know how many strings in lst contain
>> the fragment frag at least once.
>>
>> This is what I did:
>>
>> Length[Flatten[Map[Part[StringPosition[#, frag], All, 1] &, lst]]]
>>
>> Is there a better/faster way to calculate it?
>>
>> Thanks ahead,
>>
>> János
>> ----------------------------------------------
>> Trying to argue with a politician is like lifting up the head of a
>> corpse.
>> (S. Lem: His Master Voice)
>
> I am not sure about "better" or faster (it is quicker to write this
> sort of program than to test how fast it is). One can produce lots of
> versions using ideas similar to yours, e.g.
>
> Count[(StringPosition[#1, frag, 1] & ) /@ lst, _?(#1 != {} & )]
>
> 2
>
> or
>
> Length[Join @@ (StringPosition[#1, frag, 1] & ) /@ lst]
>
> 2
>
> But the method that seems the most elegant to me (I have not tested it
> for speed though) is:
>
> Count[lst, _?(StringMatchQ[#1, StringJoin["*", frag, "*"]] & )]
>
> 2
>
> Andrzej Kozlowski
> Chiba, Japan
> http://www.akikoz.net/~andrzej/
> http://www.mimuw.edu.pl/~akoz/

------------------------------------------
"The shortest route between two points is the middleman"
Ayn Rand
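A possible further variant, offered only as a sketch and not something tested or proposed in this thread: if the installed Mathematica provides StringFreeQ (an assumption; it may not exist in the version used for the timings above), the count can be written without building a "*"-wrapped pattern. This also sidesteps the fact that StringMatchQ treats "*" and "@" inside a string pattern as metacharacters, which does not matter for DNA fragments but could for arbitrary text. The sample list is the one from the original question, with the "...." elision left out.

```mathematica
(* Sample data from the original question (the elided entries are omitted). *)
lst = {"abc", "abcd", "aabccaddbacdda", "adbacca"};
frag = "dba";

(* StringFreeQ threads over a list of strings and gives True for every
   string that does NOT contain frag, so counting False counts the hits. *)
Count[StringFreeQ[lst, frag], False]
(* 2 *)
```

No speed claim is made for this form; it would need to be timed on collectedDnaBin in the same way as the four versions above.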

**References**: **Occurrence of a substring inside a list of strings**, *From:* János <janos.lobb@yale.edu>