Re: split again

*To*: mathgroup at smc.vnet.net*Subject*: [mg73678] Re: [mg73622] split again*From*: Daniel Lichtblau <danl at wolfram.com>*Date*: Sat, 24 Feb 2007 02:22:00 -0500 (EST)*References*: <200702230936.EAA17604@smc.vnet.net>

Arkadiusz.Majka at gmail.com wrote: > Hi, > > Thank you very much for your help in my provious post. Now, consider > please a list > > z=Table[Random[],{1000}]; > > zSplit=Split[z,#1<=0.7 === #2>=0.7]; (bulit thanks to your > help) This is not quite right. See code below. > I want to pick the first sublist of zSplit that consists of elements > <= 0.7 and whose length is greater than a certain number (say 5). I > think that a good candidate would be using Cases and variations of _ , > but I don't know how. > > What I want to do (and what my both posts are about) is to find the > first sublist of z with all elements less than A and the length of > this sublist must be greater than B. Maybe there exists better > solution than to Split z in advance since what I need to do in my next > step is to find ONLY the FIRST sublist of splitted z. > > Thanks again, > > Arek > A straightforward walk of the list, stopping at the first run of however many elements, is probably best. At least when I use Compile to do this I tend to get good results. Here is code I wrote based on your description and using Split. mySplit1[ll_,thresh_,size_] := Module[ {sll, pos}, sll = Split[ll, (#1<=thresh=!=#2>=thresh)&]; pos = Position[sll, a_ /; Length[a]>=size && a[[1]]<=thresh, 1, 1]; If [Length[pos]==0, {}, Extract[sll,pos]] ] Here is code that uses a walk. I split up the code into two parts. The main one uses Compile for speed and returns the start position of the first run, or 0 if there is no such. mySplit2[ll_,thresh_,size_] := With[ {index=mySplit2First[ll,thresh,size]}, If [index[[1]]==0, {}, {index[[1]],ll[[Range[index[[1]],index[[2]]]]]}] ] mySplit2First = Compile[{{ll,_Real,1},{thresh,_Real,0},{size,_Integer,0}}, Module[{n=Length[ll], i=0, k=0, l=0}, Do [ If [ll[[j]]<=thresh, i++; If [i==size, k=j-size+1]; If [i>=size,l=j]; , (* else *) If [i>=size, Break[], i = 0]]; , {j,1,n}]; {k,l} ]] On longish lists there is a very clear speed difference. In[115]:= n = 10^6; In[116]:= z = Table[Random[],{n}]; In[117]:= Timing[mySplit1[z, .7, 30]] Out[117]= {2.88818, {{0.24682, 0.418994, 0.276445, 0.297828, 0.309867, 0.0504006, 0.565136, 0.402471, 0.214515, 0.695627, 0.452592, 0.619066, 0.615646, 0.669529, 0.0768277, 0.63955, 0.254386, 0.294692, 0.298738, 0.407727, 0.618329, 0.643527, 0.35957, 0.540794, 0.37151, 0.224532, 0.0831248, 0.242966, 0.0616422, 0.174132, 0.517989}}} In[131]:= Timing[mySplit2[z, .7, 30]] Out[131]= {0.008, {22584, {0.24682, 0.418994, 0.276445, 0.297828, 0.309867, 0.0504006, 0.565136, 0.402471, 0.214515, 0.695627, 0.452592, 0.619066, 0.615646, 0.669529, 0.0768277, 0.63955, 0.254386, 0.294692, 0.298738, 0.407727, 0.618329, 0.643527, 0.35957, 0.540794, 0.37151, 0.224532, 0.0831248, 0.242966, 0.0616422, 0.174132, 0.517989}}} So we found a run beginning around 2.25 % into the list. Suppose we actually had to walk the entire list? Would we see much difference then? We see below that there is no run of length 40, and moreover the second method is still substantially faster. In[119]:= Timing[mySplit1[z, .7, 40]] Out[119]= {3.47222, {}} In[120]:= Timing[mySplit2[z, .7, 40]] Out[120]= {0.292018, {}} I take the moral to be that one should not abandon all those procedural programming methods that were in vogue 25 years ago. Probably just a sign of my age. Daniel Lichtblau Wolfram Research

**References**:**split again***From:*Arkadiusz.Majka@gmail.com