Re: position of sequence of numbers in list
- To: mathgroup at smc.vnet.net
- Subject: [mg107154] Re: position of sequence of numbers in list
- From: Norbert Pozar <bertapozar at gmail.com>
- Date: Thu, 4 Feb 2010 06:26:45 -0500 (EST)
- References: <201001301212.HAA25132@smc.vnet.net> <hk3nmu$ahf$1@smc.vnet.net>
Hi Leonid, I just found another way that outperforms both of our solutions on a wide range of integer arrays I tested. It turns out that one can extract the position directly from SparseArray, without using ArrayRules. It took me some time to figure it out: extractPositionFromSparseArray[ HoldPattern[SparseArray[u___]]] := {u}[[4, 2, 2]] positionExtr[x_List, n_] := Flatten@extractPositionFromSparseArray[ SparseArray[1 - Unitize[x - n]]] Again, no unpacking. Best, Norbert On Wed, Feb 3, 2010 at 12:21 PM, Leonid Shifrin <lshifr at gmail.com> wrote: > Hi Norbert, > > Thanks again, nice bits of information! I admit to not having done proper > tests and therefore > missed the slow-down of SparseArray that you mention. I think that your > proposed hybrid solution > is indeed the best option from what we know so far. As for Clip vs Unitize - > I have no doubt that Unitize must be better for what it does, being a more > specialized command in a sense. My old solution (which I attempted to > reconstruct in my previous post) used additional UnitStep or something to > make Clip work for reals, with some extra overhead of course. > > Regards, > Leonid > > > On Wed, Feb 3, 2010 at 11:06 PM, Norbert Pozar <bertapozar at gmail.com> wrote: >> >> Hi Leonid, >> >> that's a nice observation. I was exploring ArrayRules too, but I found >> out that it is too slow when the array is quite dense. I was testing >> it always on RandomInteger[1,...]. That has density 1/2. On the other >> hand, DeleteDuplicates is quite independent of the density. When the >> density drops below ~1/50, ArrayRules start performing better than >> DeleteDuplicates. So I propose the following method, since Total is >> really fast: >> >> positionComb[x_List, n_Integer] := >> If[50 Total[#] < Length[x], Flatten@ArrayRules[#][[;; -2, 1]], >> Rest@DeleteDuplicates@Prepend[# Range[Length[x]], 0]] &[ >> 1 - Unitize[x - n]] >> >> Some timings (by the way, the timing varies a lot, so accuracy no >> better than +-50%): >> >> In[1]:= tst=RandomInteger[1,40000]; >> Timing[Do[positionNP[tst,1],{50}]][[1]]/50 >> Timing[Do[myPositionNew[tst,1],{50}]][[1]]/50 >> Timing[Do[positionComb[tst,1],{50}]][[1]]/50 >> Out[2]= 0.005 >> Out[3]= 0.04064 >> Out[4]= 0.00468 >> >> In[5]:= tst=RandomInteger[500,40000]; >> Timing[Do[positionNP[tst,1],{50}]][[1]]/50 >> Timing[Do[myPositionNew[tst,1],{50}]][[1]]/50 >> Timing[Do[positionComb[tst,1],{50}]][[1]]/50 >> Out[6]= 0.00218 >> Out[7]= 0.00218 >> Out[8]= 0.00188 >> >> >> Two points: >> 1) you don't need ArrayRules@SparseArray@, ArrayRules@ is enough, even >> though there is no performance benefit =) >> 2) Unitize is better than Clip since it also works with Reals, >> Unitize[x]==0 iff x==0 >> >> > Anyways, I often find it amazing how far one can go in speeding up >> > things >> > in >> > Mathematica - sometimes it can be really fast. >> Well, this is true, but I think it'd be much better if Position was >> implemented to work fast with packed arrays of integers, or Pick or >> something ;) >> >> >> Best, >> Norbert >> >> On Wed, Feb 3, 2010 at 10:39 AM, Leonid Shifrin <lshifr at gmail.com> wrote: >> > Hi Norbert, >> > >> > Thanks a lot - this is indeed pretty fast. And the way you use this in >> > Fold >> > is quite amazing, >> > as well as the observation that there is no unpacking - very cool. As >> > far >> > as speeding up of >> > the myPosition function is concerned, I toyed with precisely the same >> > idea >> > before >> > in version 5. I used Clip instead of Unitze (essentially implementing >> > Unitize), but have completely forgotten about it until I saw your >> > solution. >> > I now used Unitize to improve it a little (about 5-10 %). My benchmarks >> > show >> > that both my old and new versions are about twice faster than yours: >> > >> > In[1]:= Clear[myPositionOld, positionNP, myPositionNew]; >> > myPositionOld[x_List, n_Integer] := >> > #[[All, 1]] &@ >> > Most@ArrayRules@SparseArray[1 - Clip[Abs[x - n], {0, 1}]]; >> > >> > positionNP[x_List, n_Integer] := >> > Rest@DeleteDuplicates@Prepend[#, 0] &[Times[Range[Length[x]], (1 - >> > Unitize[x - n])]]; >> > >> > myPositionNew[x_List, n_Integer] := #[[All, 1]] &@ >> > Most@ArrayRules@SparseArray[1 - Unitize[x - n]]; >> > >> > >> > In[4]:= Timing[Do[myPositionOld[tst, 10], {50}]][[1]]/50 >> > >> > Out[4]= 0.10562 >> > >> > In[5]:= Timing[Do[positionNP[tst, 10], {50}]][[1]]/50 >> > >> > Out[5]= 0.1928 >> > >> > In[6]:= Timing[Do[myPositionNew[tst, 10], {50}]][[1]]/50 >> > >> > Out[6]= 0.09564 >> > >> > In[7]:= >> > Flatten@myPositionOld[tst, 10] == positionNP[tst, 10] == >> > Flatten[myPositionNew[tst, 10]] >> > >> > Out[7]= True >> > >> > Anyways, I often find it amazing how far one can go in speeding up >> > things >> > in >> > Mathematica - sometimes it can be really fast. Thanks for the new info- >> > I >> > had no idea >> > that DeleteDuplicates is so fast on packed arrays, and I neither was I >> > aware >> > of Unitize. >> > >> > Regards, >> > Leonid >> > >> > >> > >> > On Wed, Feb 3, 2010 at 12:43 PM, Norbert P. <bertapozar at gmail.com> >> > wrote: >> >> >> >> Hi Leonid, >> >> >> >> I guess JB doesn't care about speed improvement anymore, but this is >> >> an idea that I've been using for a week (since getting Mathematica 7) >> >> that makes finding position in a packed array much faster. This works >> >> only in the case when one wants to find positions of all subsequences >> >> (see my code in In[6] and notice that my old computer is much slower >> >> than yours): >> >> >> >> In[1]:= list=RandomInteger[{1,15},3000000]; >> >> seq={3,4,5,6}; >> >> In[3]:= r1=Flatten@Position[Partition[list,4,1],{3,4,5,6}];//Timing >> >> Out[3]= {4.485,Null} >> >> In[4]:= r2=ReplaceList[list,{u___,3,4,5,6,___}:>Length[{u}]+1];// >> >> Timing >> >> Out[4]= {5.453,Null} >> >> In[5]:= r3=myPosition[myPartition[list,Length[seq]],seq,-1];//Timing >> >> Out[5]= {2.719,Null} >> >> >> >> In[6]:= fdz[v_]:=Rest@DeleteDuplicates@Prepend[v,0] >> >> r4=Fold[fdz[#1 (1-Unitize[list[[#1]]-#2])]+1&,fdz[Range[Length[list]- >> >> Length[seq]+1](1-Unitize[list[[;;-Length[seq]]]-seq[[1]]])] >> >> +1,Rest@seq]-Length[seq];//Timing >> >> Out[7]= {0.422,Null} >> >> >> >> In[8]:= r1==r2==r3==r4 >> >> Out[8]= True >> >> >> >> myPosition and myPartition are the functions from your post. >> >> I'm essentially using DeleteDuplicates together with Unitize to find >> >> positions of all occurrences of a specific number in an array. No >> >> unpacking occurs so it's quite fast. You can use this to possibly >> >> improve myPosition. >> >> >> >> Best, >> >> Norbert >> >> >> >> On Jan 31, 2:57 am, Leonid Shifrin <lsh... at gmail.com> wrote: >> >> > Hi again, >> >> > >> >> > In my first post one of the solutions (the compiled function) >> >> > contained >> >> > a >> >> > bug: >> >> > >> >> > In[1]:= posf[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}] >> >> > >> >> > Out[1]= -1 >> >> > >> >> > which was because it ignores all but the first candidate sequences in >> >> > the >> >> > case when they overlap. Here is a modified one which is (hopefully) >> >> > correct, >> >> > if not as fast: >> >> > >> >> > posf2= >> >> > Compile[{{lst,_Integer,1},{target,_Integer,1}}, >> >> > Module[{i=1,len =Length[target],lstlen = Length[lst]}, >> >> > While[i<=lstlen-len+1&&Take[lst,{i,len+i-1}]!=target,i++]; >> >> > If[i>lstlen-len+1,-1,i]]] >> >> > >> >> > In[3]:= posf2[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}] >> >> > >> >> > Out[3]= 5 >> >> > >> >> > This one will still be much superior to Partition-based >> >> > implementation >> >> > for >> >> > cases when you can expect the sequence of interest to appear rather >> >> > early >> >> > in the list. Anyway, sorry for the confusion with the buggy version . >> >> > >> >> > By the way, should you wish to stick to Partition-based >> >> > implementation, >> >> > I >> >> > think it is fair to mention that for small sequence sizes and >> >> > partitioning >> >> > with a shift 1, *and* when you have a list already in the packed >> >> > array >> >> > representation (which is possible when your numbers are say all >> >> > integers or >> >> > all reals, but not a mix), one can implement a more efficient version >> >> > than >> >> > the built-in Partition: >> >> > >> >> > Clear[myPartition]; >> >> > myPartition[x_List, size_Integer] := >> >> > With[{len = Length[x]}, >> >> > Transpose@Table[x[[i ;; len - size + i]], {i, 1, size}]]; >> >> > >> >> > In[4]:= largeTestList = RandomInteger[{1, 15}, 3000000]; >> >> > >> >> > In[5]:= (pt = Partition[largeTestList, 2, 1]); // Timing >> >> > >> >> > Out[5]= {0.521, Null} >> >> > >> >> > In[6]:= (mpt = myPartition[largeTestList , 2]); // Timing >> >> > >> >> > Out[6]= {0.17, Null} >> >> > >> >> > In[7]:= pt == mpt >> >> > >> >> > Out[7]= True >> >> > >> >> > The built-in Partition will start winning when the partitioning size >> >> > will be >> >> > around 30-50, so for long sequences using a built-in is better. If >> >> > your >> >> > list >> >> > is not packed, built-in Partition will be a few times faster even >> >> > for >> >> > small >> >> > partitioning sizes, so this will then be a de-optimization. You can >> >> > attempt >> >> > to convert your list to packed array with Developer`ToPackedArray >> >> > (note >> >> > that >> >> > it does not issue any messages in case it is unable to do this), and >> >> > check >> >> > that your list is packed with Developer`PackedArrayQ. >> >> > >> >> > Likewise, you can implement your own Position-like function aimed at >> >> > exactly >> >> > this problem, which, under similar requirements of your list being a >> >> > packed >> >> > array of numbers, will be better than the built-in Position in most >> >> > cases: >> >> > >> >> > In[8]:= >> >> > Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1, >> >> > 1] // Timing >> >> > >> >> > Out[8]= {0.27, {{41940}}} >> >> > >> >> > In[9]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6}, >> >> > 1] // Timing >> >> > >> >> > Out[9]= {0.09, {41940}} >> >> > >> >> > In[10]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1, >> >> > 2] // Timing >> >> > >> >> > Out[10]= {0.301, {{41940}, {228293}}} >> >> > >> >> > In[11]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6}, >> >> > 2] // Timing >> >> > >> >> > Out[11]= {0.311, {41940, 228293}} >> >> > >> >> > In[12]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}] // >> >> > Timing >> >> > >> >> > Out[12]= {0.55, {{41940}, {228293}, {269300}}} >> >> > >> >> > In[13]:= myPosition[ >> >> > myPartition[largeTest , 4], {3, 4, 5, 6}, -1] // Timing >> >> > >> >> > Out[13]= {0.411, {41940, 228293, 269300}} >> >> > >> >> > where the last argument -1 is a convention to return all results. >> >> > >> >> > Regards, >> >> > Leonid >> >> > >> >> > On Sat, Jan 30, 2010 at 4:12 AM, JB <jke... at gmail.com> wrote: >> >> > > Hi, >> >> > >> >> > > What is the most efficient way to find the position of the >> >> > > beginning >> >> > > of a sequence of numbers from a list? >> >> > >> >> > > I found a couple of ways: >> >> > >> >> > > find 3,4 in list={1,2,3,4,5}; >> >> > >> >> > > 1. pos=Intersection[Position[list,3],(Position[list,4])+1] >> >> > >> >> > > 2. pos=Position[Partition[list,2,1],{3,4}] >> >> > >> >> > > Are there other ways to do this? >> >> > > What is the best way when dealing with large lists? >> >> > >> >> > > Thanks, >> >> > > JB >> >> > >> >> > >> >> >> > >> > > >