Re: Occurrence of a substring inside a list of strings
- To: mathgroup at smc.vnet.net
- Subject: [mg51234] Re: [mg51203] Occurrence of a substring inside a list of strings
- From: János <janos.lobb at yale.edu>
- Date: Sat, 9 Oct 2004 04:19:05 -0400 (EDT)
- References: <200410080654.CAA24974@smc.vnet.net> <F9D1AF3C-1927-11D9-841E-000A95B4967A@mimuw.edu.pl>
- Sender: owner-wri-mathgroup at wolfram.com
Andrzej,
Elegant is the winner. On a list of length 294773 with ByteCount
1626134432 (on a G4 machine with 2 GB of memory, where top showed just
21 MB free, a typical condition) I got the following results; the first
is my newbie version:
In[26]:=
Length[Flatten[Map[Part[StringPosition[#,fragment1],All,1]&,collectedDnaBin]]]//Timing
Out[26]=
{559.01 Second,12}
In[30]:=
Count[(StringPosition[#1, fragment1, 1] & ) /@collectedDnaBin, _?(#1 != {} & )]//Timing
Out[30]=
{558.78 Second,12}
In[31]:=
Length[Join @@ (StringPosition[#1, fragment1, 1] & ) /@collectedDnaBin]//Timing
Out[31]=
{557.75 Second,12}
In[33]:=
Count[collectedDnaBin, _?(StringMatchQ[#1, StringJoin["*", fragment1, "*"]] & )]//Timing
Out[33]=
{114.09 Second,12}
Here fragment1 is a 100-character string.
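
For anyone who wants to try the comparison without the large DNA list,
here is a minimal sketch on the toy list from the original question. It
assumes the "...." in that list is simply dropped, so lst holds only the
four strings shown; byPosition and byMatch are names made up for this
illustration. Note that the first (newbie) version calls StringPosition
without a limit, so it counts every occurrence of the fragment rather
than every string containing it; the two agree only when no string
contains the fragment more than once.

```mathematica
(* Toy data from the original question; the "...." tail is omitted (assumption) *)
lst = {"abc", "abcd", "aabccaddbacdda", "adbacca"};
frag = "dba";

(* Position-based count: the limit 1 stops at the first match in each string *)
byPosition = Count[(StringPosition[#, frag, 1] &) /@ lst, _?(# != {} &)];

(* Wildcard-based count: "*" matches any run of characters, including none *)
byMatch = Count[lst, _?(StringMatchQ[#, StringJoin["*", frag, "*"]] &)];

{byPosition, byMatch}
```

On this four-string list both counts should come out as 2, matching the
outputs quoted below. StringMatchQ treats "*" and "@" in a string
pattern as wildcards, so this form assumes the fragment contains
neither, which holds for DNA letters.
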
Thanks a lot,
János
On Oct 8, 2004, at 8:45 AM, Andrzej Kozlowski wrote:
>
> On 8 Oct 2004, at 15:54, János wrote:
>
>> Hi,
>>
>> I have a simple list of strings like lst={"abc", "abcd",
>> "aabccaddbacdda", "adbacca",....}. Let's say I have a fragment called
>> frag="dba". I would like to know how many strings in lst contain
>> the fragment frag at least once.
>>
>> This is what I did:
>>
>> Length[Flatten[Map[Part[StringPosition[#, frag], All, 1] &,lst] ] ]
>>
>> Is there a better/faster way to calculate it?
>>
>> Thanks ahead,
>>
>> János
>> ----------------------------------------------
>> Trying to argue with a politician is like lifting up the head of a
>> corpse.
>> (S. Lem: His Master's Voice)
>>
>>
>
> I am not sure about "better" or faster (it is quicker to write this
> sort of program than to test how fast it is). One can produce lots of
> versions using ideas similar to yours, e.g.
>
> Count[(StringPosition[#1, frag, 1] & ) /@ lst, _?(#1 != {} & )]
>
> 2
>
> or
>
> Length[Join @@ (StringPosition[#1, frag, 1] & ) /@ lst]
>
> 2
>
> But the method that seems the most elegant to me (I have not tested it
> for speed though) is:
>
>
> Count[lst, _?(StringMatchQ[#1, StringJoin["*", frag, "*"]] & )]
>
> 2
>
>
>
> Andrzej Kozlowski
> Chiba, Japan
> http://www.akikoz.net/~andrzej/
> http://www.mimuw.edu.pl/~akoz/
>
>
------------------------------------------
"The shortest route between two points is the middleman" Ayn Rand