MathGroup Archive: February 2010 [00149]

[Date Index] [Thread Index] [Author Index]
Re: position of sequence of numbers in list
To: mathgroup at smc.vnet.net
Subject: [mg107149] Re: position of sequence of numbers in list
From: Leonid Shifrin <lshifr at gmail.com>
Date: Thu, 4 Feb 2010 06:25:51 -0500 (EST)
References: <201001301212.HAA25132@smc.vnet.net> <hk3nmu$ahf$1@smc.vnet.net>
Hi Norbert,

Thanks again,  nice bits of information! I admit to not having done proper
tests and therefore
missed the slow-down of SparseArray that you mention. I think  that your
proposed hybrid solution
is indeed the best option from what we know so far. As for Clip vs Unitize -
I have no doubt that Unitize must be better for what it does, being a more
specialized command in a sense.  My old solution (which I attempted to
reconstruct in my previous post) used additional UnitStep or something to
make Clip work for reals, with some extra overhead of course.

Regards,
Leonid


On Wed, Feb 3, 2010 at 11:06 PM, Norbert Pozar <bertapozar at gmail.com> wrote:

> Hi Leonid,
>
> that's a nice observation. I was exploring ArrayRules too, but I found
> out that it is too slow when the array is quite dense. I was testing
> it always on RandomInteger[1,...]. That has density 1/2. On the other
> hand, DeleteDuplicates is quite independent of the density. When the
> density drops below ~1/50, ArrayRules start performing better than
> DeleteDuplicates. So I propose the following method, since Total is
> really fast:
>
> positionComb[x_List, n_Integer] :=
>  If[50 Total[#] < Length[x], Flatten@ArrayRules[#][[;; -2, 1]],
>    Rest@DeleteDuplicates@Prepend[# Range[Length[x]], 0]] &[
>  1 - Unitize[x - n]]
>
> Some timings (by the way, the timing varies a lot, so accuracy no
> better than +-50%):
>
> In[1]:= tst=RandomInteger[1,40000];
> Timing[Do[positionNP[tst,1],{50}]][[1]]/50
> Timing[Do[myPositionNew[tst,1],{50}]][[1]]/50
> Timing[Do[positionComb[tst,1],{50}]][[1]]/50
> Out[2]= 0.005
> Out[3]= 0.04064
> Out[4]= 0.00468
>
> In[5]:= tst=RandomInteger[500,40000];
> Timing[Do[positionNP[tst,1],{50}]][[1]]/50
> Timing[Do[myPositionNew[tst,1],{50}]][[1]]/50
> Timing[Do[positionComb[tst,1],{50}]][[1]]/50
> Out[6]= 0.00218
> Out[7]= 0.00218
> Out[8]= 0.00188
>
>
> Two points:
> 1) you don't need ArrayRules@SparseArray@, ArrayRules@ is enough, even
> though there is no performance benefit =)
> 2) Unitize is better than Clip since it also works with Reals,
> Unitize[x]==0 iff  x==0
>
> > Anyways, I often find it  amazing how far one can go in speeding up
> things
> > in
> > Mathematica - sometimes it can be really fast.
> Well, this is true, but I think it'd be much better if Position was
> implemented to work fast with packed arrays of integers, or Pick or
> something ;)
>
>
> Best,
> Norbert
>
> On Wed, Feb 3, 2010 at 10:39 AM, Leonid Shifrin <lshifr at gmail.com> wrote:
> > Hi Norbert,
> >
> > Thanks a lot - this is indeed pretty fast. And the way you use this in
> Fold
> > is quite amazing,
> > as well as the observation that there is no unpacking - very cool.  As
> far
> > as speeding up of
> > the myPosition function is concerned,  I toyed with precisely the same
> idea
> > before
> > in version 5. I used Clip instead of Unitze (essentially implementing
> > Unitize), but have completely forgotten about it until I saw your
> solution.
> > I now used Unitize to improve it a little (about 5-10 %). My benchmarks
> show
> > that both my old and new versions are about twice faster than yours:
> >
> > In[1]:= Clear[myPositionOld, positionNP, myPositionNew];
> > myPositionOld[x_List, n_Integer] :=
> >   #[[All, 1]] &@
> >    Most@ArrayRules@SparseArray[1 - Clip[Abs[x - n], {0, 1}]];
> >
> > positionNP[x_List, n_Integer] :=
> >   Rest@DeleteDuplicates@Prepend[#, 0] &[Times[Range[Length[x]], (1 -
> > Unitize[x - n])]];
> >
> > myPositionNew[x_List, n_Integer] := #[[All, 1]] &@
> >    Most@ArrayRules@SparseArray[1 - Unitize[x - n]];
> >
> >
> > In[4]:= Timing[Do[myPositionOld[tst, 10], {50}]][[1]]/50
> >
> > Out[4]= 0.10562
> >
> > In[5]:= Timing[Do[positionNP[tst, 10], {50}]][[1]]/50
> >
> > Out[5]= 0.1928
> >
> > In[6]:= Timing[Do[myPositionNew[tst, 10], {50}]][[1]]/50
> >
> > Out[6]= 0.09564
> >
> > In[7]:=
> > Flatten@myPositionOld[tst, 10] == positionNP[tst, 10] ==
> >  Flatten[myPositionNew[tst, 10]]
> >
> > Out[7]= True
> >
> > Anyways, I often find it  amazing how far one can go in speeding up
> things
> > in
> > Mathematica - sometimes it can be really fast. Thanks for the new info -
> I
> > had no idea
> > that DeleteDuplicates is so fast on packed arrays, and I neither was I
> aware
> > of Unitize.
> >
> > Regards,
> > Leonid
> >
> >
> >
> > On Wed, Feb 3, 2010 at 12:43 PM, Norbert P. <bertapozar at gmail.com>
> wrote:
> >>
> >> Hi Leonid,
> >>
> >> I guess JB doesn't care about speed improvement anymore, but this is
> >> an idea that I've been using for a week (since getting Mathematica 7)
> >> that makes finding position in a packed array much faster. This works
> >> only in the case when one wants to find positions of all subsequences
> >> (see my code in In[6] and notice that my old computer is much slower
> >> than yours):
> >>
> >> In[1]:= list=RandomInteger[{1,15},3000000];
> >> seq={3,4,5,6};
> >> In[3]:= r1=Flatten@Position[Partition[list,4,1],{3,4,5,6}];//Timing
> >> Out[3]= {4.485,Null}
> >> In[4]:= r2=ReplaceList[list,{u___,3,4,5,6,___}:>Length[{u}]+1];//
> >> Timing
> >> Out[4]= {5.453,Null}
> >> In[5]:= r3=myPosition[myPartition[list,Length[seq]],seq,-1];//Timing
> >> Out[5]= {2.719,Null}
> >>
> >> In[6]:= fdz[v_]:=Rest@DeleteDuplicates@Prepend[v,0]
> >> r4=Fold[fdz[#1 (1-Unitize[list[[#1]]-#2])]+1&,fdz[Range[Length[list]-
> >> Length[seq]+1](1-Unitize[list[[;;-Length[seq]]]-seq[[1]]])]
> >> +1,Rest@seq]-Length[seq];//Timing
> >> Out[7]= {0.422,Null}
> >>
> >> In[8]:= r1==r2==r3==r4
> >> Out[8]= True
> >>
> >> myPosition and myPartition are the functions from your post.
> >> I'm essentially using DeleteDuplicates together with Unitize to find
> >> positions of all occurrences of a specific number in an array. No
> >> unpacking occurs so it's quite fast. You can use this to possibly
> >> improve myPosition.
> >>
> >> Best,
> >> Norbert
> >>
> >> On Jan 31, 2:57 am, Leonid Shifrin <lsh... at gmail.com> wrote:
> >> > Hi again,
> >> >
> >> > In my first post one of the solutions (the compiled function)
> contained
> >> > a
> >> > bug:
> >> >
> >> > In[1]:= posf[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
> >> >
> >> > Out[1]= -1
> >> >
> >> > which was because it ignores all but the first candidate sequences in
> >> > the
> >> > case when they overlap. Here is a modified one which is (hopefully)
> >> > correct,
> >> > if not as fast:
> >> >
> >> > posf2=
> >> > Compile[{{lst,_Integer,1},{target,_Integer,1}},
> >> >     Module[{i=1,len =Length[target],lstlen = Length[lst]},
> >> >       While[i<=lstlen-len+1&&Take[lst,{i,len+i-1}]!=target,i++];
> >> >        If[i>lstlen-len+1,-1,i]]]
> >> >
> >> > In[3]:= posf2[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
> >> >
> >> > Out[3]= 5
> >> >
> >> > This one will still be much superior to Partition-based implementation
> >> > for
> >> > cases when  you can expect the sequence of interest  to appear rather
> >> > early
> >> > in the list. Anyway, sorry for the confusion with the buggy version.
> >> >
> >> > By the way, should you wish to stick to Partition-based
> implementation,
> >> > I
> >> > think it is fair to mention that for small sequence sizes and
> >> > partitioning
> >> > with a shift 1, *and*  when you have a list already in the packed
> array
> >> > representation (which is possible when  your numbers are say all
> >> > integers or
> >> > all reals, but not a mix), one can implement a more efficient version
> >> > than
> >> > the built-in Partition:
> >> >
> >> > Clear[myPartition];
> >> > myPartition[x_List, size_Integer] :=
> >> >   With[{len = Length[x]},
> >> >    Transpose@Table[x[[i ;; len - size + i]], {i, 1, size}]];
> >> >
> >> > In[4]:= largeTestList = RandomInteger[{1, 15}, 3000000];
> >> >
> >> > In[5]:= (pt = Partition[largeTestList, 2, 1]); // Timing
> >> >
> >> > Out[5]= {0.521, Null}
> >> >
> >> > In[6]:= (mpt = myPartition[largeTestList , 2]); // Timing
> >> >
> >> > Out[6]= {0.17, Null}
> >> >
> >> > In[7]:= pt == mpt
> >> >
> >> > Out[7]= True
> >> >
> >> > The built-in Partition will start winning when the partitioning size
> >> > will be
> >> > around 30-50, so for long sequences using a built-in is better. If
> your
> >> > list
> >> > is not packed, built-in Partition  will be a few times faster even for
> >> > small
> >> > partitioning sizes, so this will then be a de-optimization. You can
> >> > attempt
> >> > to convert your list to packed array with Developer`ToPackedArray
> (note
> >> > that
> >> > it does not issue any messages in case it is unable to do this), and
> >> > check
> >> > that your list is packed with Developer`PackedArrayQ.
> >> >
> >> > Likewise, you can implement your own Position-like function aimed at
> >> > exactly
> >> > this problem, which, under similar requirements of your list being a
> >> > packed
> >> > array of numbers, will be better than the built-in Position in most
> >> > cases:
> >> >
> >> > In[8]:=
> >> > Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
> >> >   1] // Timing
> >> >
> >> > Out[8]= {0.27, {{41940}}}
> >> >
> >> > In[9]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
> >> >   1] // Timing
> >> >
> >> > Out[9]= {0.09, {41940}}
> >> >
> >> > In[10]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
> >> >   2] // Timing
> >> >
> >> > Out[10]= {0.301, {{41940}, {228293}}}
> >> >
> >> > In[11]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
> >> >   2] // Timing
> >> >
> >> > Out[11]= {0.311, {41940, 228293}}
> >> >
> >> > In[12]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}] // Timing
> >> >
> >> > Out[12]= {0.55, {{41940}, {228293}, {269300}}}
> >> >
> >> > In[13]:= myPosition[
> >> >   myPartition[largeTest , 4], {3, 4, 5, 6}, -1] // Timing
> >> >
> >> > Out[13]= {0.411, {41940, 228293, 269300}}
> >> >
> >> > where the last argument -1 is a convention to return all results.
> >> >
> >> > Regards,
> >> > Leonid
> >> >
> >> > On Sat, Jan 30, 2010 at 4:12 AM, JB <jke... at gmail.com> wrote:
> >> > > Hi,
> >> >
> >> > > What is the most efficient way to find the position of the beginning
> >> > > of a sequence of numbers from a list?
> >> >
> >> > > I found a couple of ways:
> >> >
> >> > > find 3,4 in list={1,2,3,4,5};
> >> >
> >> > >  1.   pos=Intersection[Position[list,3],(Position[list,4])+1]
> >> >
> >> > >  2.   pos=Position[Partition[list,2,1],{3,4}]
> >> >
> >> > > Are there other ways to do this?
> >> > > What is the best way when dealing with large lists?
> >> >
> >> > > Thanks,
> >> > > JB
> >> >
> >> >
> >>
> >
> >
>
Prev by Date: Re: position of sequence of numbers in list
Next by Date: Re: position of sequence of numbers in list
Previous by thread: Re: position of sequence of numbers in list
Next by thread: Re: position of sequence of numbers in list