MathGroup Archive 2004

Re: Occurrence of a substring inside a list of strings

  • To: mathgroup at smc.vnet.net
  • Subject: [mg51234] Re: [mg51203] Occurrence of a substring inside a list of strings
  • From: János <janos.lobb at yale.edu>
  • Date: Sat, 9 Oct 2004 04:19:05 -0400 (EDT)
  • References: <200410080654.CAA24974@smc.vnet.net> <F9D1AF3C-1927-11D9-841E-000A95B4967A@mimuw.edu.pl>
  • Sender: owner-wri-mathgroup at wolfram.com

Andrzej,

The elegant one is the winner.  On a list of length 294773 with ByteCount
1626134432, on a G4 machine with 2 GB of memory (top showed just 21 MB
free, a typical condition), I got the following results (the first is
my newbie version):

In[26]:=
Length[Flatten[Map[Part[StringPosition[#, fragment1], All, 1] &, collectedDnaBin]]] // Timing

Out[26]=
{559.01 Second,12}

In[30]:=
Count[(StringPosition[#1, fragment1, 1] &) /@ collectedDnaBin, _?(#1 != {} &)] // Timing

Out[30]=
{558.78 Second,12}

In[31]:=
Length[Join @@ (StringPosition[#1, fragment1, 1] &) /@ collectedDnaBin] // Timing

Out[31]=
{557.75 Second,12}

In[33]:=
Count[collectedDnaBin, _?(StringMatchQ[#1, StringJoin["*", fragment1, "*"]] &)] // Timing

Out[33]=
{114.09 Second,12}


Here fragment1 is a 100-character string.
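
For reference, the four variants can be tried side by side on the small
example from the original question. This is only a sketch: the sample
list is the four strings quoted in the question, without the elided
remainder, and the timings above were not measured on it. Note that the
first variant calls StringPosition without a limit, so it counts every
occurrence of the fragment and would exceed the other three whenever a
string contains the fragment more than once. Also, StringMatchQ treats
* and @ in a string pattern as metacharacters, which is harmless for DNA
strings but would matter for arbitrary text.

```mathematica
(* Sample data: the four strings quoted in the question (assumed complete here) *)
lst = {"abc", "abcd", "aabccaddbacdda", "adbacca"};
frag = "dba";

(* 1. All start positions of frag, flattened: counts occurrences, not strings *)
Length[Flatten[Map[Part[StringPosition[#, frag], All, 1] &, lst]]]

(* 2. At most one match per string, then count the non-empty results *)
Count[(StringPosition[#, frag, 1] &) /@ lst, _?(# != {} &)]

(* 3. At most one match per string, joined into a single list *)
Length[Join @@ (StringPosition[#, frag, 1] &) /@ lst]

(* 4. Wildcard match: True if frag occurs anywhere in the string *)
Count[lst, _?(StringMatchQ[#, "*" <> frag <> "*"] &)]
```

On this sample each expression should evaluate to 2, in agreement with
the results in the quoted reply below.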

Thanks a lot,

János


On Oct 8, 2004, at 8:45 AM, Andrzej Kozlowski wrote:

>
> On 8 Oct 2004, at 15:54, János wrote:
>
>> Hi,
>>
>> I have a simple list of strings like lst={"abc", "abcd",
>> "aabccaddbacdda", "adbacca",....}.  Let's say I have a fragment called
>> frag="dba".  I would like to know how many strings in lst contain
>> the fragment frag at least once.
>>
>> This is what I did:
>>
>> Length[Flatten[Map[Part[StringPosition[#, frag], All, 1] &,lst] ] ]
>>
>> Is there a better/faster way to calculate it ?
>>
>> Thanks ahead,
>>
>> János
>> ----------------------------------------------
>> Trying to argue with a politician is like lifting up the head of a
>> corpse.
>> (S. Lem: His Master Voice)
>>
>>
>
> I am not sure about "better" or faster (it is quicker to write this
> sort of program than to test how fast it is). One can produce lots of
> versions using ideas similar to yours, e.g.
>
> Count[(StringPosition[#1, frag, 1] & ) /@ lst, _?(#1 != {} & )]
>
> 2
>
> or
>
> Length[Join @@ (StringPosition[#1, frag, 1] & ) /@ lst]
>
> 2
>
> But the method that seems the most elegant to me (I have not tested it  
> for speed though) is:
>
>
> Count[lst, _?(StringMatchQ[#1, StringJoin["*", frag, "*"]] & )]
>
> 2
>
>
>
> Andrzej Kozlowski
> Chiba, Japan
> http://www.akikoz.net/~andrzej/
> http://www.mimuw.edu.pl/~akoz/
>
>
------------------------------------------
"The shortest route between two points is the middleman"  Ayn Rand

