MathGroup Archive: February 2002 [00360]

[Date Index] [Thread Index] [Author Index]

RE: Re: irritating little problem

To: mathgroup at smc.vnet.net
Subject: [mg32968] RE: [mg32926] Re: irritating little problem
From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
Date: Fri, 22 Feb 2002 01:48:51 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com

please see below:

> -----Original Message-----
> From:	Allan Hayes [SMTP:hay at haystack.demon.co.uk]
To: mathgroup at smc.vnet.net
> Sent:	Wednesday, February 20, 2002 7:26 AM
> To:	mathgroup at smc.vnet.net
> Subject:	[mg32926] Re: irritating little problem
> 
> Peter,
> One immediately thinks of using Position as in the attempt 1 below, but
> this
> involves a search of the list for every one of its values and is clearly
> inefficent.
> Much better is to attach the positions to the elements of the list and
> then
> manipulate the result - attempt 2.
> 
> To get timings I use
> 
>     n=500
>     vec = Table[Random[Integer, {0, n}], {10n}];
> 
> 
> attempt 1:
> 
>     pn1=(#->Position[vec,#])&/@Union[vec];//Timing
> 
>         {17.63 Second,Null}
> 
>     0/.pn1
> 
> {{342},{581},{2162},{2281},{2315},{3051},{3071},{3239},{3351},{4425}}
> 
> attempt2:
> 
>     pn4=(#[[1,1]]->#[[All,2]]&/@
>           Split[Sort[
>               Transpose[{vec,
>                   Range[Length[vec]]}]],#1[[1]] == #2[[1]]&]);//Timing
> 
>         {0.94 Second,Null}
> 
>     0/.pn4
> 
>         {342,581,2162,2281,2315,3051,3071,3239,3351,4425}
> 
> - The inner lists in attempt 1 can be removed by flattening the answer.
> - We could have attached the positions to the elements in attempt 1 by
> using
> MapIndexed, but I think that this is slower.
> - The advantage of attempt 1 increases with the size of n.
> 
> --
> Allan
> 
> ---------------------
> Allan Hayes
> Mathematica Training and Consulting
> Leicester UK
> www.haystack.demon.co.uk
> hay at haystack.demon.co.uk
> Voice: +44 (0)116 271 4198
> Fax: +44 (0)870 164 0565
> 
> 
> "KIMIC Weijnitz Peter" <micweij at eka.ericsson.se> wrote in message
> news:a4sv9c$ido$1 at smc.vnet.net...
> > I have a simple vector
> > and I want to find the position of elements that are equal.
> >
> > I.e I want to test the vector and find all cases of similar elements.
> >
> > Brute force is not what I want, it can be a long vector.
> > Best regards
> > Petr W
> >
> 
[Hartmut Wolf]  

Dear Allan,

your solution "attempt 2" is of universal importance. It reveals two
important observations: 

(1) if there are repetions in a list, then mapping over the sorted and split
list can be much faster than mapping over the original list (since it is
shorter by a factor). 
The extra operations Range, Transpose, Sort and Split are extremly fast
compared to Map, since the are deeply buried into the kernel and do not
reenter expression evaluation, as Map has to do when applying the function
to the argument.
 
(2)  although it introduces an (n log n) component (Sort), this is of no
importance even for large lists within the confines of real memory of an
average workstation.

I compared your solution with a "MapIndexed" variant you mentioned (but did
not show). Furthermore , I tried and succeeded in avoiding the performance
penalty on Split when the test function is applied. This gives another
performance boost, which adds to the value of your "design pattern".

 
In[1]:= SeedRandom[1]
In[2]:= n = 500;m = 10;
        vec = Table[Random[Integer, {0, n}], {m n}];


Your attempt 2 (repeated here for convenience):

In[4]:= pn4 = (#[[1, 1]] -> #[[All, 2]] &) /@ 
        Split[Sort[
            Transpose[{vec, 
                Range[Length[vec]]}]], #1[[1]] == #2[[1]] &]; // Timing
Out[4]=
{0.621 Second, Null}

In[5]:= 0 /. pn4
Out[5]= {1015, 1449, 1520, 2487, 4072}


a "MapIndexed" method

In[6]:= Remove[p]
In[7]:= p[_] := {}

In[8]:= MapIndexed[((p[#1] = {p[#1], #2}) &), vec]; // Timing
Out[8]=
{1.092 Second, Null}

In[9]:= pn5 = n_ :> Flatten[p[n]];

In[10]:= 0 /. pn5
Out[10]= {1015, 1449, 1520, 2487, 4072}


"attempt 6", improves pn4:

In[11]:=
pn6 = Module[{sortvec, posns, splitvec, starts},
        {sortvec, posns} = Transpose[
            Sort[Transpose[{vec, Range[Length[vec]]}]]];
        starts = FoldList[#1 + Length[#2] &, 1, splitvec = Split[sortvec]]; 
        MapThread[(First[#1] -> Take[posns, #2] &), {splitvec, 
            Drop[Transpose[{starts, RotateLeft[starts - 1]}], -1]}
          ]]; // Timing
Out[11]=
{0.18 Second, Null}

In[12]:= 0 /. pn6
Out[12]= {1015, 1449, 1520, 2487, 4072}


Of course the Sort/Split methods will loose there advantage when there are
only a few repetions. To test that, I compared timings on different probes
(for n and m respectively):

 
In[1]:= s = 5^Range[0, 6];

In[2]:= probes = Transpose[{s, Reverse[s]}]
Out[2]=
{{1, 15625}, {5, 3125}, {25, 625}, {125, 125}, 
 {625, 25}, {3125, 5}, {15625, 1}}


I'll just show my timing results:


t4 = {1.492, 1.442, 1.432, 1.452, 1.593, 1.903, 2.784}
t5 = {2.904, 3.786, 3.115, 4.406, 3.495, 5.368, 4.968}
t6 = {0.191, 0.23, 0.23, 0.251, 0.391, 0.902, 2.323}


It's quite illuminating to compare them in a graph:

<< Graphics`MultipleListPlot`

MultipleListPlot[t4, t5, t6, PlotJoined -> True]

--
Hartmut

Prev by Date: Re: complexity of AppendTo

Next by Date: How build a ?TestFuntion that applied to a list gives only the elements that are function of x

Previous by thread: Re: Irritating Little Problem

Next by thread: Manipulating Pattern Matching ?