MathGroup Archive 2010

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: position of sequence of numbers in list

  • To: mathgroup at smc.vnet.net
  • Subject: [mg107158] Re: position of sequence of numbers in list
  • From: Leonid Shifrin <lshifr at gmail.com>
  • Date: Thu, 4 Feb 2010 06:27:29 -0500 (EST)
  • References: <201001301212.HAA25132@smc.vnet.net> <hk3nmu$ahf$1@smc.vnet.net>

Norbert,

what can I say - this is truly impressive. This is order of magnitude faster
than the built-in position, for this problem. This  implies that the main
slow-down for the previous SparseArray-based solution was due to ArrayRules
being slow for dense arrays. In retrospect, this seems natural since
somewhere along the way something similar to unpacking must happen, since
the output is a list of rules. Probably internally SparseArray uses a packed
form for positions, and the way you extract them (directly with Part)
probably avoids the unpacking stage - this is the only plausible explanation
which comes to my mind.

I checked with Developer`PackedArrayQ on the resulting position list for
your solution, and it gives True, which supports my conjecture. Your method
seems to be applicable to many other problems where SparseArray-based tricks
are employed. This is something truly useful, thanks a lot for sharing!

Best,
Leonid

P.S.

By the way, this is irrelevant for performance, but the position-extraction
function can be rewritten without HoldPattern:

extractPositionFromSparseArrayAlt[(h : SparseArray)[u___]] := {u}[[4, 2, 2]]

Of course, in any case the key is your observation that Part can not be
directly used (since it has been internally overloaded for SparseArray
objects).




On Thu, Feb 4, 2010 at 4:59 AM, Norbert Pozar <bertapozar at gmail.com> wrote:

> I'm sorry, I just realized that one doesn't need 1-Unitize..., this will
> do:
>
> positionExtr[x_List, n_] :=
>  Flatten@extractPositionFromSparseArray[
>    SparseArray[Unitize[x - n], Automatic, 1]]
>
> Norbert
>
> On Wed, Feb 3, 2010 at 5:52 PM, Norbert Pozar <bertapozar at gmail.com>
> wrote:
> > Hi Leonid,
> >
> > I just found another way that outperforms both of our solutions on a
> > wide range of integer arrays I tested. It turns out that one can
> > extract the position directly from SparseArray, without using
> > ArrayRules. It took me some time to figure it out:
> >
> > extractPositionFromSparseArray[
> >  HoldPattern[SparseArray[u___]]] := {u}[[4, 2, 2]]
> >
> > positionExtr[x_List, n_] :=
> >  Flatten@extractPositionFromSparseArray[
> >   SparseArray[1 - Unitize[x - n]]]
> >
> > Again, no unpacking.
> >
> > Best,
> > Norbert
> >
> > On Wed, Feb 3, 2010 at 12:21 PM, Leonid Shifrin <lshifr at gmail.com>
> wrote:
> >> Hi Norbert,
> >>
> >> Thanks again,  nice bits of information! I admit to not having done
> proper
> >> tests and therefore
> >> missed the slow-down of SparseArray that you mention. I think  that your
> >> proposed hybrid solution
> >> is indeed the best option from what we know so far. As for Clip vs
> Unitize -
> >> I have no doubt that Unitize must be better for what it does, being a
> more
> >> specialized command in a sense.  My old solution (which I attempted to
> >> reconstruct in my previous post) used additional UnitStep or something
> to
> >> make Clip work for reals, with some extra overhead of course.
> >>
> >> Regards,
> >> Leonid
> >>
> >>
> >> On Wed, Feb 3, 2010 at 11:06 PM, Norbert Pozar <bertapozar at gmail.com>
> wrote:
> >>>
> >>> Hi Leonid,
> >>>
> >>> that's a nice observation. I was exploring ArrayRules too, but I found
> >>> out that it is too slow when the array is quite dense. I was testing
> >>> it always on RandomInteger[1,...]. That has density 1/2. On the other
> >>> hand, DeleteDuplicates is quite independent of the density. When the
> >>> density drops below ~1/50, ArrayRules start performing better than
> >>> DeleteDuplicates. So I propose the following method, since Total is
> >>> really fast:
> >>>
> >>> positionComb[x_List, n_Integer] :=
> >>>  If[50 Total[#] < Length[x], Flatten@ArrayRules[#][[;; -2, 1]],
> >>>    Rest@DeleteDuplicates@Prepend[# Range[Length[x]], 0]] &[
> >>>  1 - Unitize[x - n]]
> >>>
> >>> Some timings (by the way, the timing varies a lot, so accuracy no
> >>> better than +-50%):
> >>>
> >>> In[1]:= tst=RandomInteger[1,40000];
> >>> Timing[Do[positionNP[tst,1],{50}]][[1]]/50
> >>> Timing[Do[myPositionNew[tst,1],{50}]][[1]]/50
> >>> Timing[Do[positionComb[tst,1],{50}]][[1]]/50
> >>> Out[2]= 0.005
> >>> Out[3]= 0.04064
> >>> Out[4]= 0.00468
> >>>
> >>> In[5]:= tst=RandomInteger[500,40000];
> >>> Timing[Do[positionNP[tst,1],{50}]][[1]]/50
> >>> Timing[Do[myPositionNew[tst,1],{50}]][[1]]/50
> >>> Timing[Do[positionComb[tst,1],{50}]][[1]]/50
> >>> Out[6]= 0.00218
> >>> Out[7]= 0.00218
> >>> Out[8]= 0.00188
> >>>
> >>>
> >>> Two points:
> >>> 1) you don't need ArrayRules@SparseArray@, ArrayRules@ is enough, even
> >>> though there is no performance benefit =)
> >>> 2) Unitize is better than Clip since it also works with Reals,
> >>> Unitize[x]==0 iff  x==0
> >>>
> >>> > Anyways, I often find it  amazing how far one can go in speeding up
> >>> > things
> >>> > in
> >>> > Mathematica - sometimes it can be really fast.
> >>> Well, this is true, but I think it'd be much better if Position was
> >>> implemented to work fast with packed arrays of integers, or Pick or
> >>> something ;)
> >>>
> >>>
> >>> Best,
> >>> Norbert
> >>>
> >>> On Wed, Feb 3, 2010 at 10:39 AM, Leonid Shifrin <lshifr at gmail.com>
> wrote:
> >>> > Hi Norbert,
> >>> >
> >>> > Thanks a lot - this is indeed pretty fast. And the way you use this
> in
> >>> > Fold
> >>> > is quite amazing,
> >>> > as well as the observation that there is no unpacking - very cool.
> As
> >>> > far
> >>> > as speeding up of
> >>> > the myPosition function is concerned,  I toyed with precisely the
> same
> >>> > idea
> >>> > before
> >>> > in version 5. I used Clip instead of Unitze (essentially implementing
> >>> > Unitize), but have completely forgotten about it until I saw your
> >>> > solution.
> >>> > I now used Unitize to improve it a little (about 5-10 %). My
> benchmarks
> >>> > show
> >>> > that both my old and new versions are about twice faster than yours:
> >>> >
> >>> > In[1]:= Clear[myPositionOld, positionNP, myPositionNew];
> >>> > myPositionOld[x_List, n_Integer] :=
> >>> >   #[[All, 1]] &@
> >>> >    Most@ArrayRules@SparseArray[1 - Clip[Abs[x - n], {0, 1}]];
> >>> >
> >>> > positionNP[x_List, n_Integer] :=
> >>> >   Rest@DeleteDuplicates@Prepend[#, 0] &[Times[Range[Length[x]], (1 -
> >>> > Unitize[x - n])]];
> >>> >
> >>> > myPositionNew[x_List, n_Integer] := #[[All, 1]] &@
> >>> >    Most@ArrayRules@SparseArray[1 - Unitize[x - n]];
> >>> >
> >>> >
> >>> > In[4]:= Timing[Do[myPositionOld[tst, 10], {50}]][[1]]/50
> >>> >
> >>> > Out[4]= 0.10562
> >>> >
> >>> > In[5]:= Timing[Do[positionNP[tst, 10], {50}]][[1]]/50
> >>> >
> >>> > Out[5]= 0.1928
> >>> >
> >>> > In[6]:= Timing[Do[myPositionNew[tst, 10], {50}]][[1]]/50
> >>> >
> >>> > Out[6]= 0.09564
> >>> >
> >>> > In[7]:=
> >>> > Flatten@myPositionOld[tst, 10] == positionNP[tst, 10] ==
> >>> >  Flatten[myPositionNew[tst, 10]]
> >>> >
> >>> > Out[7]= True
> >>> >
> >>> > Anyways, I often find it  amazing how far one can go in speeding up
> >>> > things
> >>> > in
> >>> > Mathematica - sometimes it can be really fast. Thanks for the new
> info -
> >>> > I
> >>> > had no idea
> >>> > that DeleteDuplicates is so fast on packed arrays, and I neither was
> I
> >>> > aware
> >>> > of Unitize.
> >>> >
> >>> > Regards,
> >>> > Leonid
> >>> >
> >>> >
> >>> >
> >>> > On Wed, Feb 3, 2010 at 12:43 PM, Norbert P. <bertapozar at gmail.com>
> >>> > wrote:
> >>> >>
> >>> >> Hi Leonid,
> >>> >>
> >>> >> I guess JB doesn't care about speed improvement anymore, but this is
> >>> >> an idea that I've been using for a week (since getting Mathematica
> 7)
> >>> >> that makes finding position in a packed array much faster. This
> works
> >>> >> only in the case when one wants to find positions of all
> subsequences
> >>> >> (see my code in In[6] and notice that my old computer is much slower
> >>> >> than yours):
> >>> >>
> >>> >> In[1]:= list=RandomInteger[{1,15},3000000];
> >>> >> seq={3,4,5,6};
> >>> >> In[3]:= r1=Flatten@Position[Partition[list,4,1],{3,4,5,6}];//Timing
> >>> >> Out[3]= {4.485,Null}
> >>> >> In[4]:= r2=ReplaceList[list,{u___,3,4,5,6,___}:>Length[{u}]+1];//
> >>> >> Timing
> >>> >> Out[4]= {5.453,Null}
> >>> >> In[5]:= r3=myPosition[myPartition[list,Length[seq]],seq,-1];//Timing
> >>> >> Out[5]= {2.719,Null}
> >>> >>
> >>> >> In[6]:= fdz[v_]:=Rest@DeleteDuplicates@Prepend[v,0]
> >>> >> r4=Fold[fdz[#1
> (1-Unitize[list[[#1]]-#2])]+1&,fdz[Range[Length[list]-
> >>> >> Length[seq]+1](1-Unitize[list[[;;-Length[seq]]]-seq[[1]]])]
> >>> >> +1,Rest@seq]-Length[seq];//Timing
> >>> >> Out[7]= {0.422,Null}
> >>> >>
> >>> >> In[8]:= r1==r2==r3==r4
> >>> >> Out[8]= True
> >>> >>
> >>> >> myPosition and myPartition are the functions from your post.
> >>> >> I'm essentially using DeleteDuplicates together with Unitize to find
> >>> >> positions of all occurrences of a specific number in an array. No
> >>> >> unpacking occurs so it's quite fast. You can use this to possibly
> >>> >> improve myPosition.
> >>> >>
> >>> >> Best,
> >>> >> Norbert
> >>> >>
> >>> >> On Jan 31, 2:57 am, Leonid Shifrin <lsh... at gmail.com> wrote:
> >>> >> > Hi again,
> >>> >> >
> >>> >> > In my first post one of the solutions (the compiled function)
> >>> >> > contained
> >>> >> > a
> >>> >> > bug:
> >>> >> >
> >>> >> > In[1]:= posf[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
> >>> >> >
> >>> >> > Out[1]= -1
> >>> >> >
> >>> >> > which was because it ignores all but the first candidate sequences
> in
> >>> >> > the
> >>> >> > case when they overlap. Here is a modified one which is
> (hopefully)
> >>> >> > correct,
> >>> >> > if not as fast:
> >>> >> >
> >>> >> > posf2=
> >>> >> > Compile[{{lst,_Integer,1},{target,_Integer,1}},
> >>> >> >     Module[{i=1,len =Length[target],lstlen = Length[lst]},
> >>> >> >       While[i<=lstlen-len+1&&Take[lst,{i,len+i-1}]!=target,i++];
> >>> >> >        If[i>lstlen-len+1,-1,i]]]
> >>> >> >
> >>> >> > In[3]:= posf2[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
> >>> >> >
> >>> >> > Out[3]= 5
> >>> >> >
> >>> >> > This one will still be much superior to Partition-based
> >>> >> > implementation
> >>> >> > for
> >>> >> > cases when  you can expect the sequence of interest  to appear
> rather
> >>> >> > early
> >>> >> > in the list. Anyway, sorry for the confusion with the buggy
> version.
> >>> >> >
> >>> >> > By the way, should you wish to stick to Partition-based
> >>> >> > implementation,
> >>> >> > I
> >>> >> > think it is fair to mention that for small sequence sizes and
> >>> >> > partitioning
> >>> >> > with a shift 1, *and*  when you have a list already in the packed
> >>> >> > array
> >>> >> > representation (which is possible when  your numbers are say all
> >>> >> > integers or
> >>> >> > all reals, but not a mix), one can implement a more efficient
> version
> >>> >> > than
> >>> >> > the built-in Partition:
> >>> >> >
> >>> >> > Clear[myPartition];
> >>> >> > myPartition[x_List, size_Integer] :=
> >>> >> >   With[{len = Length[x]},
> >>> >> >    Transpose@Table[x[[i ;; len - size + i]], {i, 1, size}]];
> >>> >> >
> >>> >> > In[4]:= largeTestList = RandomInteger[{1, 15}, 3000000];
> >>> >> >
> >>> >> > In[5]:= (pt = Partition[largeTestList, 2, 1]); // Timing
> >>> >> >
> >>> >> > Out[5]= {0.521, Null}
> >>> >> >
> >>> >> > In[6]:= (mpt = myPartition[largeTestList , 2]); // Timing
> >>> >> >
> >>> >> > Out[6]= {0.17, Null}
> >>> >> >
> >>> >> > In[7]:= pt == mpt
> >>> >> >
> >>> >> > Out[7]= True
> >>> >> >
> >>> >> > The built-in Partition will start winning when the partitioning
> size
> >>> >> > will be
> >>> >> > around 30-50, so for long sequences using a built-in is better. If
> >>> >> > your
> >>> >> > list
> >>> >> > is not packed, built-in Partition  will be a few times faster even
> >>> >> > for
> >>> >> > small
> >>> >> > partitioning sizes, so this will then be a de-optimization. You
> can
> >>> >> > attempt
> >>> >> > to convert your list to packed array with Developer`ToPackedArray
> >>> >> > (note
> >>> >> > that
> >>> >> > it does not issue any messages in case it is unable to do this),
> and
> >>> >> > check
> >>> >> > that your list is packed with Developer`PackedArrayQ.
> >>> >> >
> >>> >> > Likewise, you can implement your own Position-like function aimed
> at
> >>> >> > exactly
> >>> >> > this problem, which, under similar requirements of your list being
> a
> >>> >> > packed
> >>> >> > array of numbers, will be better than the built-in Position in
> most
> >>> >> > cases:
> >>> >> >
> >>> >> > In[8]:=
> >>> >> > Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
> >>> >> >   1] // Timing
> >>> >> >
> >>> >> > Out[8]= {0.27, {{41940}}}
> >>> >> >
> >>> >> > In[9]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
> >>> >> >   1] // Timing
> >>> >> >
> >>> >> > Out[9]= {0.09, {41940}}
> >>> >> >
> >>> >> > In[10]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
> >>> >> >   2] // Timing
> >>> >> >
> >>> >> > Out[10]= {0.301, {{41940}, {228293}}}
> >>> >> >
> >>> >> > In[11]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
> >>> >> >   2] // Timing
> >>> >> >
> >>> >> > Out[11]= {0.311, {41940, 228293}}
> >>> >> >
> >>> >> > In[12]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}] //
> >>> >> > Timing
> >>> >> >
> >>> >> > Out[12]= {0.55, {{41940}, {228293}, {269300}}}
> >>> >> >
> >>> >> > In[13]:= myPosition[
> >>> >> >   myPartition[largeTest , 4], {3, 4, 5, 6}, -1] // Timing
> >>> >> >
> >>> >> > Out[13]= {0.411, {41940, 228293, 269300}}
> >>> >> >
> >>> >> > where the last argument -1 is a convention to return all results.
> >>> >> >
> >>> >> > Regards,
> >>> >> > Leonid
> >>> >> >
> >>> >> > On Sat, Jan 30, 2010 at 4:12 AM, JB <jke... at gmail.com> wrote:
> >>> >> > > Hi,
> >>> >> >
> >>> >> > > What is the most efficient way to find the position of the
> >>> >> > > beginning
> >>> >> > > of a sequence of numbers from a list?
> >>> >> >
> >>> >> > > I found a couple of ways:
> >>> >> >
> >>> >> > > find 3,4 in list={1,2,3,4,5};
> >>> >> >
> >>> >> > >  1.   pos=Intersection[Position[list,3],(Position[list,4])+1]
> >>> >> >
> >>> >> > >  2.   pos=Position[Partition[list,2,1],{3,4}]
> >>> >> >
> >>> >> > > Are there other ways to do this?
> >>> >> > > What is the best way when dealing with large lists?
> >>> >> >
> >>> >> > > Thanks,
> >>> >> > > JB
> >>> >> >
> >>> >> >
> >>> >>
> >>> >
> >>> >
> >>
> >>
> >
>



  • Prev by Date: Re: position of sequence of numbers in list
  • Next by Date: Re: Mathematica 6.01 does not know one can not divide by 0??
  • Previous by thread: Re: position of sequence of numbers in list
  • Next by thread: Re: position of sequence of numbers in list