MathGroup Archive 2007

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: split again

  • To: mathgroup at smc.vnet.net
  • Subject: [mg73678] Re: [mg73622] split again
  • From: Daniel Lichtblau <danl at wolfram.com>
  • Date: Sat, 24 Feb 2007 02:22:00 -0500 (EST)
  • References: <200702230936.EAA17604@smc.vnet.net>

Arkadiusz.Majka at gmail.com wrote:
> Hi,
> 
> Thank you very much for your help in my provious post. Now, consider
> please a list
> 
> z=Table[Random[],{1000}];
> 
> zSplit=Split[z,#1<=0.7 === #2>=0.7];           (bulit thanks to your
> help)

This is not quite right. See code below.


> I want to pick the first sublist of zSplit that consists of elements
> <= 0.7 and whose length is greater than a certain number (say 5). I
> think that a good candidate would be using Cases and variations of _ ,
> but I don't know how.
> 
> What I want to do (and what my both posts are about) is to find the
> first sublist of z with all elements less than A and the length of
> this sublist must be greater than B. Maybe there exists better
> solution than to Split z in advance since what I need to do in my next
> step is to find ONLY the FIRST sublist of splitted z.
> 
> Thanks again,
> 
> Arek
> 

A straightforward walk of the list, stopping at the first run of however 
many elements, is probably best. At least when I use Compile to do this 
I tend to get good results.

Here is code I wrote based on your description and using Split.

mySplit1[ll_,thresh_,size_] := Module[
   {sll, pos},
   sll = Split[ll, (#1<=thresh=!=#2>=thresh)&];
   pos = Position[sll, a_ /; Length[a]>=size && a[[1]]<=thresh, 1, 1];
   If [Length[pos]==0, {}, Extract[sll,pos]]
   ]

Here is code that uses a walk. I split up the code into two parts. The 
main one uses Compile for speed and returns the start position of the 
first run, or 0 if there is no such.

mySplit2[ll_,thresh_,size_] := With[
   {index=mySplit2First[ll,thresh,size]},
   If [index[[1]]==0, {}, {index[[1]],ll[[Range[index[[1]],index[[2]]]]]}]
   ]

mySplit2First = Compile[{{ll,_Real,1},{thresh,_Real,0},{size,_Integer,0}},
   Module[{n=Length[ll], i=0, k=0, l=0},
     Do [
       If [ll[[j]]<=thresh,
         i++; If [i==size, k=j-size+1]; If [i>=size,l=j];
         , (* else *)
         If [i>=size, Break[], i = 0]];
       , {j,1,n}];
     {k,l}
     ]]

On longish lists there is a very clear speed difference.
In[115]:= n = 10^6;

In[116]:= z = Table[Random[],{n}];

In[117]:= Timing[mySplit1[z, .7, 30]]
Out[117]= {2.88818, {{0.24682, 0.418994, 0.276445, 0.297828, 0.309867, 
    0.0504006, 0.565136, 0.402471, 0.214515, 0.695627, 0.452592, 
0.619066,    0.615646, 0.669529, 0.0768277, 0.63955, 0.254386, 0.294692, 
0.298738,     0.407727, 0.618329, 0.643527, 0.35957, 0.540794, 0.37151, 
0.224532,    0.0831248, 0.242966, 0.0616422, 0.174132, 0.517989}}}

In[131]:= Timing[mySplit2[z, .7, 30]]
Out[131]= {0.008, {22584,
   {0.24682, 0.418994, 0.276445, 0.297828, 0.309867,
  0.0504006, 0.565136, 0.402471, 0.214515, 0.695627, 0.452592, 0.619066,
  0.615646, 0.669529, 0.0768277, 0.63955, 0.254386, 0.294692, 0.298738,
  0.407727, 0.618329, 0.643527, 0.35957, 0.540794, 0.37151, 0.224532,
  0.0831248, 0.242966, 0.0616422, 0.174132, 0.517989}}}

So we found a run beginning around 2.25 % into the list. Suppose we 
actually had to walk the entire list? Would we see much difference then? 
We see below that there is no run of length 40, and moreover the second 
method is still substantially faster.

In[119]:= Timing[mySplit1[z, .7, 40]]
Out[119]= {3.47222, {}}

In[120]:= Timing[mySplit2[z, .7, 40]]
Out[120]= {0.292018, {}}

I take the moral to be that one should not abandon all those procedural 
programming methods that were in vogue 25 years ago. Probably just a 
sign of my age.


Daniel Lichtblau
Wolfram Research




  • References:
  • Prev by Date: Re: split again
  • Next by Date: Re: split again
  • Previous by thread: Re: split again
  • Next by thread: Re: split again