MathGroup Archive 2010

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: position of sequence of numbers in list

  • To: mathgroup at smc.vnet.net
  • Subject: [mg107154] Re: position of sequence of numbers in list
  • From: Norbert Pozar <bertapozar at gmail.com>
  • Date: Thu, 4 Feb 2010 06:26:45 -0500 (EST)
  • References: <201001301212.HAA25132@smc.vnet.net> <hk3nmu$ahf$1@smc.vnet.net>

Hi Leonid,

I just found another way that outperforms both of our solutions on a
wide range of integer arrays I tested. It turns out that one can
extract the position directly from SparseArray, without using
ArrayRules. It took me some time to figure it out:

extractPositionFromSparseArray[
  HoldPattern[SparseArray[u___]]] := {u}[[4, 2, 2]]

positionExtr[x_List, n_] :=
 Flatten@extractPositionFromSparseArray[
   SparseArray[1 - Unitize[x - n]]]

Again, no unpacking.

Best,
Norbert

On Wed, Feb 3, 2010 at 12:21 PM, Leonid Shifrin <lshifr at gmail.com> wrote:
> Hi Norbert,
>
> Thanks again,  nice bits of information! I admit to not having done proper
> tests and therefore
> missed the slow-down of SparseArray that you mention. I think  that your
> proposed hybrid solution
> is indeed the best option from what we know so far. As for Clip vs Unitize -
> I have no doubt that Unitize must be better for what it does, being a more
> specialized command in a sense.  My old solution (which I attempted to
> reconstruct in my previous post) used additional UnitStep or something to
> make Clip work for reals, with some extra overhead of course.
>
> Regards,
> Leonid
>
>
> On Wed, Feb 3, 2010 at 11:06 PM, Norbert Pozar <bertapozar at gmail.com> wrote:
>>
>> Hi Leonid,
>>
>> that's a nice observation. I was exploring ArrayRules too, but I found
>> out that it is too slow when the array is quite dense. I was testing
>> it always on RandomInteger[1,...]. That has density 1/2. On the other
>> hand, DeleteDuplicates is quite independent of the density. When the
>> density drops below ~1/50, ArrayRules start performing better than
>> DeleteDuplicates. So I propose the following method, since Total is
>> really fast:
>>
>> positionComb[x_List, n_Integer] :=
>>  If[50 Total[#] < Length[x], Flatten@ArrayRules[#][[;; -2, 1]],
>>    Rest@DeleteDuplicates@Prepend[# Range[Length[x]], 0]] &[
>>  1 - Unitize[x - n]]
>>
>> Some timings (by the way, the timing varies a lot, so accuracy no
>> better than +-50%):
>>
>> In[1]:= tst=RandomInteger[1,40000];
>> Timing[Do[positionNP[tst,1],{50}]][[1]]/50
>> Timing[Do[myPositionNew[tst,1],{50}]][[1]]/50
>> Timing[Do[positionComb[tst,1],{50}]][[1]]/50
>> Out[2]= 0.005
>> Out[3]= 0.04064
>> Out[4]= 0.00468
>>
>> In[5]:= tst=RandomInteger[500,40000];
>> Timing[Do[positionNP[tst,1],{50}]][[1]]/50
>> Timing[Do[myPositionNew[tst,1],{50}]][[1]]/50
>> Timing[Do[positionComb[tst,1],{50}]][[1]]/50
>> Out[6]= 0.00218
>> Out[7]= 0.00218
>> Out[8]= 0.00188
>>
>>
>> Two points:
>> 1) you don't need ArrayRules@SparseArray@, ArrayRules@ is enough, even
>> though there is no performance benefit =)
>> 2) Unitize is better than Clip since it also works with Reals,
>> Unitize[x]==0 iff  x==0
>>
>> > Anyways, I often find it  amazing how far one can go in speeding up
>> > things
>> > in
>> > Mathematica - sometimes it can be really fast.
>> Well, this is true, but I think it'd be much better if Position was
>> implemented to work fast with packed arrays of integers, or Pick or
>> something ;)
>>
>>
>> Best,
>> Norbert
>>
>> On Wed, Feb 3, 2010 at 10:39 AM, Leonid Shifrin <lshifr at gmail.com> wrote:
>> > Hi Norbert,
>> >
>> > Thanks a lot - this is indeed pretty fast. And the way you use this in
>> > Fold
>> > is quite amazing,
>> > as well as the observation that there is no unpacking - very cool.  As
>> > far
>> > as speeding up of
>> > the myPosition function is concerned,  I toyed with precisely the same
>> > idea
>> > before
>> > in version 5. I used Clip instead of Unitze (essentially implementing
>> > Unitize), but have completely forgotten about it until I saw your
>> > solution.
>> > I now used Unitize to improve it a little (about 5-10 %). My benchmarks
>> > show
>> > that both my old and new versions are about twice faster than yours:
>> >
>> > In[1]:= Clear[myPositionOld, positionNP, myPositionNew];
>> > myPositionOld[x_List, n_Integer] :=
>> >   #[[All, 1]] &@
>> >    Most@ArrayRules@SparseArray[1 - Clip[Abs[x - n], {0, 1}]];
>> >
>> > positionNP[x_List, n_Integer] :=
>> >   Rest@DeleteDuplicates@Prepend[#, 0] &[Times[Range[Length[x]], (1 -
>> > Unitize[x - n])]];
>> >
>> > myPositionNew[x_List, n_Integer] := #[[All, 1]] &@
>> >    Most@ArrayRules@SparseArray[1 - Unitize[x - n]];
>> >
>> >
>> > In[4]:= Timing[Do[myPositionOld[tst, 10], {50}]][[1]]/50
>> >
>> > Out[4]= 0.10562
>> >
>> > In[5]:= Timing[Do[positionNP[tst, 10], {50}]][[1]]/50
>> >
>> > Out[5]= 0.1928
>> >
>> > In[6]:= Timing[Do[myPositionNew[tst, 10], {50}]][[1]]/50
>> >
>> > Out[6]= 0.09564
>> >
>> > In[7]:=
>> > Flatten@myPositionOld[tst, 10] == positionNP[tst, 10] ==
>> >  Flatten[myPositionNew[tst, 10]]
>> >
>> > Out[7]= True
>> >
>> > Anyways, I often find it  amazing how far one can go in speeding up
>> > things
>> > in
>> > Mathematica - sometimes it can be really fast. Thanks for the new info-
>> > I
>> > had no idea
>> > that DeleteDuplicates is so fast on packed arrays, and I neither was I
>> > aware
>> > of Unitize.
>> >
>> > Regards,
>> > Leonid
>> >
>> >
>> >
>> > On Wed, Feb 3, 2010 at 12:43 PM, Norbert P. <bertapozar at gmail.com>
>> > wrote:
>> >>
>> >> Hi Leonid,
>> >>
>> >> I guess JB doesn't care about speed improvement anymore, but this is
>> >> an idea that I've been using for a week (since getting Mathematica 7)
>> >> that makes finding position in a packed array much faster. This works
>> >> only in the case when one wants to find positions of all subsequences
>> >> (see my code in In[6] and notice that my old computer is much slower
>> >> than yours):
>> >>
>> >> In[1]:= list=RandomInteger[{1,15},3000000];
>> >> seq={3,4,5,6};
>> >> In[3]:= r1=Flatten@Position[Partition[list,4,1],{3,4,5,6}];//Timing
>> >> Out[3]= {4.485,Null}
>> >> In[4]:= r2=ReplaceList[list,{u___,3,4,5,6,___}:>Length[{u}]+1];//
>> >> Timing
>> >> Out[4]= {5.453,Null}
>> >> In[5]:= r3=myPosition[myPartition[list,Length[seq]],seq,-1];//Timing
>> >> Out[5]= {2.719,Null}
>> >>
>> >> In[6]:= fdz[v_]:=Rest@DeleteDuplicates@Prepend[v,0]
>> >> r4=Fold[fdz[#1 (1-Unitize[list[[#1]]-#2])]+1&,fdz[Range[Length[list]-
>> >> Length[seq]+1](1-Unitize[list[[;;-Length[seq]]]-seq[[1]]])]
>> >> +1,Rest@seq]-Length[seq];//Timing
>> >> Out[7]= {0.422,Null}
>> >>
>> >> In[8]:= r1==r2==r3==r4
>> >> Out[8]= True
>> >>
>> >> myPosition and myPartition are the functions from your post.
>> >> I'm essentially using DeleteDuplicates together with Unitize to find
>> >> positions of all occurrences of a specific number in an array. No
>> >> unpacking occurs so it's quite fast. You can use this to possibly
>> >> improve myPosition.
>> >>
>> >> Best,
>> >> Norbert
>> >>
>> >> On Jan 31, 2:57 am, Leonid Shifrin <lsh... at gmail.com> wrote:
>> >> > Hi again,
>> >> >
>> >> > In my first post one of the solutions (the compiled function)
>> >> > contained
>> >> > a
>> >> > bug:
>> >> >
>> >> > In[1]:= posf[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
>> >> >
>> >> > Out[1]= -1
>> >> >
>> >> > which was because it ignores all but the first candidate sequences in
>> >> > the
>> >> > case when they overlap. Here is a modified one which is (hopefully)
>> >> > correct,
>> >> > if not as fast:
>> >> >
>> >> > posf2=
>> >> > Compile[{{lst,_Integer,1},{target,_Integer,1}},
>> >> >     Module[{i=1,len =Length[target],lstlen = Length[lst]},
>> >> >       While[i<=lstlen-len+1&&Take[lst,{i,len+i-1}]!=target,i++];
>> >> >        If[i>lstlen-len+1,-1,i]]]
>> >> >
>> >> > In[3]:= posf2[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
>> >> >
>> >> > Out[3]= 5
>> >> >
>> >> > This one will still be much superior to Partition-based
>> >> > implementation
>> >> > for
>> >> > cases when  you can expect the sequence of interest  to appear rather
>> >> > early
>> >> > in the list. Anyway, sorry for the confusion with the buggy version .
>> >> >
>> >> > By the way, should you wish to stick to Partition-based
>> >> > implementation,
>> >> > I
>> >> > think it is fair to mention that for small sequence sizes and
>> >> > partitioning
>> >> > with a shift 1, *and*  when you have a list already in the packed
>> >> > array
>> >> > representation (which is possible when  your numbers are say all
>> >> > integers or
>> >> > all reals, but not a mix), one can implement a more efficient version
>> >> > than
>> >> > the built-in Partition:
>> >> >
>> >> > Clear[myPartition];
>> >> > myPartition[x_List, size_Integer] :=
>> >> >   With[{len = Length[x]},
>> >> >    Transpose@Table[x[[i ;; len - size + i]], {i, 1, size}]];
>> >> >
>> >> > In[4]:= largeTestList = RandomInteger[{1, 15}, 3000000];
>> >> >
>> >> > In[5]:= (pt = Partition[largeTestList, 2, 1]); // Timing
>> >> >
>> >> > Out[5]= {0.521, Null}
>> >> >
>> >> > In[6]:= (mpt = myPartition[largeTestList , 2]); // Timing
>> >> >
>> >> > Out[6]= {0.17, Null}
>> >> >
>> >> > In[7]:= pt == mpt
>> >> >
>> >> > Out[7]= True
>> >> >
>> >> > The built-in Partition will start winning when the partitioning size
>> >> > will be
>> >> > around 30-50, so for long sequences using a built-in is better. If
>> >> > your
>> >> > list
>> >> > is not packed, built-in Partition  will be a few times faster even
>> >> > for
>> >> > small
>> >> > partitioning sizes, so this will then be a de-optimization. You can
>> >> > attempt
>> >> > to convert your list to packed array with Developer`ToPackedArray
>> >> > (note
>> >> > that
>> >> > it does not issue any messages in case it is unable to do this), and
>> >> > check
>> >> > that your list is packed with Developer`PackedArrayQ.
>> >> >
>> >> > Likewise, you can implement your own Position-like function aimed at
>> >> > exactly
>> >> > this problem, which, under similar requirements of your list being a
>> >> > packed
>> >> > array of numbers, will be better than the built-in Position in most
>> >> > cases:
>> >> >
>> >> > In[8]:=
>> >> > Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
>> >> >   1] // Timing
>> >> >
>> >> > Out[8]= {0.27, {{41940}}}
>> >> >
>> >> > In[9]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
>> >> >   1] // Timing
>> >> >
>> >> > Out[9]= {0.09, {41940}}
>> >> >
>> >> > In[10]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
>> >> >   2] // Timing
>> >> >
>> >> > Out[10]= {0.301, {{41940}, {228293}}}
>> >> >
>> >> > In[11]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
>> >> >   2] // Timing
>> >> >
>> >> > Out[11]= {0.311, {41940, 228293}}
>> >> >
>> >> > In[12]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}] //
>> >> > Timing
>> >> >
>> >> > Out[12]= {0.55, {{41940}, {228293}, {269300}}}
>> >> >
>> >> > In[13]:= myPosition[
>> >> >   myPartition[largeTest , 4], {3, 4, 5, 6}, -1] // Timing
>> >> >
>> >> > Out[13]= {0.411, {41940, 228293, 269300}}
>> >> >
>> >> > where the last argument -1 is a convention to return all results.
>> >> >
>> >> > Regards,
>> >> > Leonid
>> >> >
>> >> > On Sat, Jan 30, 2010 at 4:12 AM, JB <jke... at gmail.com> wrote:
>> >> > > Hi,
>> >> >
>> >> > > What is the most efficient way to find the position of the
>> >> > > beginning
>> >> > > of a sequence of numbers from a list?
>> >> >
>> >> > > I found a couple of ways:
>> >> >
>> >> > > find 3,4 in list={1,2,3,4,5};
>> >> >
>> >> > >  1.   pos=Intersection[Position[list,3],(Position[list,4])+1]
>> >> >
>> >> > >  2.   pos=Position[Partition[list,2,1],{3,4}]
>> >> >
>> >> > > Are there other ways to do this?
>> >> > > What is the best way when dealing with large lists?
>> >> >
>> >> > > Thanks,
>> >> > > JB
>> >> >
>> >> >
>> >>
>> >
>> >
>
>


  • Prev by Date: Re: position of sequence of numbers in list
  • Next by Date: Re: position of sequence of numbers in list
  • Previous by thread: Re: position of sequence of numbers in list
  • Next by thread: Re: position of sequence of numbers in list