MathGroup Archive 2003

[Date Index] [Thread Index] [Author Index]

Search the Archive

RE: RE: speed-up of a function

  • To: mathgroup at smc.vnet.net
  • Subject: [mg45165] RE: [mg45117] RE: [mg45072] speed-up of a function
  • From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
  • Date: Fri, 19 Dec 2003 06:57:31 -0500 (EST)
  • Sender: owner-wri-mathgroup at wolfram.com

>-----Original Message-----
>From: Wolf, Hartmut 
To: mathgroup at smc.vnet.net
>Sent: Wednesday, December 17, 2003 1:55 PM
>To: mathgroup at smc.vnet.net
>Subject: [mg45165] [mg45117] RE: [mg45072] speed-up of a function
>
>
>
>>-----Original Message-----
>>From: Kastens at Hamburg.BAW.DE [mailto:Kastens at Hamburg.BAW.DE]
To: mathgroup at smc.vnet.net
>To: mathgroup at smc.vnet.net
>>Sent: Tuesday, December 16, 2003 12:21 PM
>>To: mathgroup at smc.vnet.net
>>Subject: [mg45165] [mg45117] [mg45072] speed-up of a function
>>
>>
>>Hi!
>>
>>I have something like this: (two lists(sublistTNW1 and 
>>sublistTNW2)), containing both time (Integer) and a value (Real)).
>>The following function pick-up the next dataset(time,value) 
>>out of sublistTNW2 after the given time (sublistTNW1[[i,1]]).
>>
>>NextEvent[t_, list_] := Select[list, (#[[1]] >= t) &, 1]
>>For[i = 1, i < 500, 
>>{
>>      NextEvent[sublistTNW1[[i, 1]], sublistTNW2]
>>}; i++]
>>
>>Calling the function in the for-loop very often (f.ex 500 
>>times or more) is indeed very slowly. 
>>
>>Trying to compile the function
>>testfunc = 
>>  Compile[{{t, _Integer}, {list, _Real, 2}}, Select[list, 
>>(#[[1]] >= t) &, 1]]
>>
>>takes no effect in speed-up. Perhabs I've used the 
>>compile-Function not correctly?
>>
>>How can I use lists with different data types in compile? 
>>{list, _Real, 2} (s.a.) is not exactly right, because the list 
>>is in the format {_Integer,_Real}.
>>
>>Thanks for any suggestions,
>>marko
>>
>
>
>Marko,
>
>I make a model of your computation, hope it applies. 
>
>I don't thing your computation make stoo much sense unless 
>sublistTNW2 is sorted. I also suppose sublistTNW1 is sorted.
>If not, try to reformulate your problem.
>
>Now the model:
> 
>In[1]:= n0 = 100000;
>In[2]:=
>sublistTNW2 = Sort[Table[{Random[Integer, {1, n0}], Random[]}, {n0}]];
>In[3]:=
>sublistTNW1 = Sort[Table[Random[Integer, {1, n0}], {500}]];
>
>In[4]:= NextEvent[t_, list_] := Select[list, (#[[1]] >= t) &, 1]
>
>
>In[6]:= r = {}; For[i = 1, i <= 15, 
>  AppendTo[r, NextEvent[sublistTNW1[[i]], sublistTNW2]]; i++]; r = 
>  Flatten[r, 1]
>Out[6]=
>{{210, 0.0739654}, {290, 0.571214}, {294, 0.192693},
> {304, 0.718815}, {701, 0.202688}, {1216, 0.140055},
> {1276, 0.335399}, {1454, 0.593851}, {1472, 0.6723},
> {1604, 0.271997}, {1905, 0.962603}, {1922, 0.74373},
> {2009, 0.217664}, {2125, 0.584308}, {2584, 0.476056}}
>
>In[7]:= Length[r]
>Out[7]= 15
>
>This is just to check for correctness for an alternative algorithm.
>
>
>Now we test your's:
>
>In[8]:= For[i = 1, i <= 50, NextEvent[sublistTNW1[[i]], 
>sublistTNW2]; i++]
>// Timing
>Out[8]= {4.587 Second, Null}
>
>In[9]:= For[i = 1, i <= 100, NextEvent[sublistTNW1[[i]], 
>sublistTNW2]; i++]
>// Timing
>Out[9]= {20.5 Second, Null}
>
>In[10]:= For[i = 1, i <= 150, NextEvent[sublistTNW1[[i]], 
>sublistTNW2]; i++]
>// Timing
>Out[10]= {47.608 Second, Null}
>
>You see: you algorithm is O[n^2], which is prohibitive in this case.
>
>
>The reason is clear: for (sorted sublists) we start always at 
>the beginning,
>such we get longer and longer scans.
>
>
>
>This simple trick continues scanning at the last hit:
> 
>In[11]:= i = 1; Select[sublistTNW2, 
>  If[#[[1]] >= sublistTNW1[[i]], ++i; True, False] &, 15]
>Out[11]=
>{{210, 0.0739654}, {290, 0.571214}, {294, 0.192693},
> {304, 0.718815}, {701, 0.202688}, {1216, 0.140055},
> {1276, 0.335399}, {1454, 0.593851}, {1472, 0.6723},
> {1604, 0.271997}, {1905, 0.962603}, {1922, 0.74373},
> {2009, 0.217664}, {2125, 0.584308}, {2584, 0.476056}}
>
>In[12]:= % == r
>Out[12]= True
>
>Same result as for the reference algorithm.
>
>Now test:
> 
>In[17]:=
>i = 1; Select[sublistTNW2, If[#[[1]] >= sublistTNW1[[i]], ++i; 
>          True, False] &, 50]; // Timing
>Out[17]= {0.38 Second, Null}
>
>In[18]:=
>i = 1; Select[sublistTNW2, If[#[[1]] >= sublistTNW1[[i]], ++i; 
>          True, False] &, 100]; // Timing
>Out[18]= {0.811 Second, Null}
>
>In[19]:= 
>i = 1; Select[sublistTNW2, If[#[[1]] >= sublistTNW1[[i]], ++i;
>          True, False] &, 150]; // Timing
>Out[19]= {1.142 Second, Null}
>
>In[20]:= 
>i = 1; Select[sublistTNW2, If[#[[1]] >= sublistTNW1[[i]], ++i;
>          True, False] &, 200]; // Timing
>Out[20]= {1.512 Second, Null}
>
>
>So the algorithm is linear (and scans sublistTNW2 only once, instead of
>repeating at beginning of sublistTNW2 over and over).
>
>
>Whether this algorithms is really appropriate depends on more 
>properties of your problem, you didn't report. Just try it!
>
>
>Other strategies were to to binay search (sorted) sublistTNW2, 
>or to splice in sublistTNW1 as markers, Sort and Split. Perhaps others...
>
>
>--
>Hartmut Wolf
>
>


Marko, 

here now my promised explanations and comparisons:


data:

In[1]:= n1 = 7;

In[2]:= sublistTNW1 = 
  Sort[Table[{Random[Integer, {0, n0}], Random[]}, {n1}]]
Out[2]= {{2, 0.285281}, {4, 0.228988}, {5, 0.0569614}, {8, 0.387688},
  {16, 0.982228}, {18, 0.253338}, {18, 0.40502}}

In[3]:= n2 = 20;

In[4]:= sublistTNW2 = 
  Sort[Table[{Random[Integer, {0, n0}], Random[]}, {n2}]]
Out[4]= {{1, 0.539148}, {2, 0.726261}, {3, 0.827974}, {5, 0.343456},
  {5, 0.422791}, {6, 0.0319436}, {6, 0.453189}, {8, 0.681286},
  {10, 0.467884}, {11, 0.578041}, {13, 0.704192}, {13, 0.92887},
  {14, 0.21612}, {16, 0.0822718}, {16, 0.719935}, {17, 0.231015},
  {17, 0.322158}, {18, 0.472525}, {19, 0.754313}, {20, 0.831389}}
 


In[11]:= i = 1; s = 
  Select[sublistTNW2, 
    If[First[#] >= First[sublistTNW1[[i]]], ++i; True, False] &, 
    Length[sublistTNW1]]
Out[11]=
{{2, 0.726261}, {5, 0.343456}, {5, 0.422791}, {8, 0.681286}, {16, 
    0.0822718}, {18, 0.472525}, {19, 0.754313}}


Compare that result with 

In[8]:= sublistTNW1
Out[8]= {{2, 0.285281}, {4, 0.228988}, {5, 0.0569614}, {8, 0.387688},
  {16, 0.982228}, {18, 0.253338}, {18, 0.40502}}

you see the weekness of the idea, perhaps the third hit should be identical
to the second (and sixth to seventh). The answer, whther ok or not, however,
must come from your application. 


Your algorithm gives a different result here:
In[9]:= s = Function[comp, 
    Select[sublistTNW2, First[#] >= First[comp] &, 1]] /@ sublistTNW1
Out[9]=
{{{2, 0.726261}}, {{5, 0.343456}}, {{5, 0.343456}}, {{8, 0.681286}},
 {{16, 0.0822718}}, {{18, 0.472525}}, {{18, 0.472525}}}



Now it would not be too easy to fix my (not quite so brilliant) idea.
On the other side, perhaps, dubletts should not show up in the result. Such
if you wanted

In[10]:= s0 = Union[Flatten[s, 1]]
Out[10]= 
{{2, 0.726261}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18,
0.472525}}

a fix would be much easier.
 


In[12]:= i = 1; len = Length[sublistTNW1]; s2 = 
  Select[sublistTNW2, 
    If[i <= len && (this = First[#]) >= First[sublistTNW1[[i]]], 
        While[(++i <= len) && this >= First[sublistTNW1[[i]]]]; True, 
        False] &]
Out[12]=
{{2, 0.726261}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18,
0.472525}}

When we get a hit, we just skip all other search keys, which would give the
same hit (in your algorithm)



 Merge/Sort/Split approach:


This is another idea I had indicated in my first post. We merge both lists
sorted, say sublistTNW1 would be all red, then in the result after each red
element, the next black one would be the corresponding hit.

Split just is a fast way to locate the markers.
As we do not have colors, we tag sublistTNW1

In[21]:= markers = Transpose[
   {sublistTNW1[[All, 1]], Table[-1, {Length[sublistTNW1]}]}]
Out[21]=
{{2, -1}, {4, -1}, {5, -1}, {8, -1}, {16, -1}, {18, -1}, {18, -1}}

In[22]:= Sort[Join[markers, sublistTNW2]]
Out[22]=
{{1, 0.539148}, {2, -1}, {2, 0.726261}, {3, 0.827974}, {4, -1}, {5, -1}, {5,

    0.343456}, {5, 0.422791}, {6, 0.0319436}, {6, 0.453189}, {8, -1}, {8, 
    0.681286}, {10, 0.467884}, {11, 0.578041}, {13, 0.704192}, {13, 
    0.92887}, {14, 0.21612}, {16, -1}, {16, 0.0822718}, {16, 0.719935}, {17,

    0.231015}, {17, 0.322158}, {18, -1}, {18, -1}, {18, 0.472525}, {19, 
    0.754313}, {20, 0.831389}}

In[23]:= Split[%, Last[#1] =!= -1 &]
Out[23]=
{{{1, 0.539148}, {2, -1}}, {{2, 0.726261}, {3, 
      0.827974}, {4, -1}}, {{5, -1}}, {{5, 0.343456}, {5, 0.422791}, {6, 
      0.0319436}, {6, 0.453189}, {8, -1}}, {{8, 0.681286}, {10, 
      0.467884}, {11, 0.578041}, {13, 0.704192}, {13, 0.92887}, {14, 
      0.21612}, {16, -1}}, {{16, 0.0822718}, {16, 0.719935}, {17, 
      0.231015}, {17, 0.322158}, {18, -1}}, {{18, -1}}, {{18, 0.472525},
{19, 
      0.754313}, {20, 0.831389}}}

In[24]:= Rest[First /@ %]
Out[24]=
{{2, 0.726261}, {5, -1}, {5, 0.343456}, {8, 0.681286}, {16, 
    0.0822718}, {18, -1}, {18, 0.472525}}


Markers that still show up here indicate dubletts.
We might drop them if not desired.

In[25]:= s3 = DeleteCases[%, {_, -1}]
Out[25]=
{{2, 0.726261}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18,
0.472525}}

In[26]:= s0
Out[26]= 
{{2, 0.726261}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18,
0.472525}}




 Binary Search:

In[28]:= << DiscreteMath`Combinatorica`
In[29]:= ?BinarySearch

In[31]:= klist = sublistTNW2[[All, 1]]
Out[31]=
{1, 2, 3, 5, 5, 6, 6, 8, 10, 11, 13, 13, 14, 16, 16, 17, 17, 18, 19, 20}

The keys in the list to be searched

In[32]:= slist = sublistTNW1[[All, 1]]
Out[32]=
{2, 4, 5, 8, 16, 18, 18}

The list of the search keys.

In[33]:= BinarySearch[klist, #] & /@ slist
Out[33]=
{2, 7/2, 5, 8, 15, 18, 18}

Direct hits are given by integer indices, otherwise (for us) the interesting
element would be given by the next integer

In[34]:= Ceiling[%]
Out[34]=
{2, 4, 5, 8, 15, 18, 18}

We see one of the two dublett pairs (the index 18 repeats). But where has
the other gone?

In[35]:= sublistTNW2[[Union[%]]]
Out[35]=
{{2, 0.726261}, {5, 0.343456}, {5, 0.422791}, {8, 0.681286},
 {16, 0.719935}, {18, 0.472525}}

Here it comes up again!

The reason is, binary search stops when it gets a hit. If we want the lowest
of all possible hits, we have to do some corrective work:

There also is another point, if we search for a key that is too large, we
get an index out of bounds. 
In[36]:= 
s4 = Module[{klist = sublistTNW2[[All, 1]], slist = sublistTNW1[[All, 1]], 
      flist, k0, pos}, 
    flist = Union[((If[(k0 = Ceiling[BinarySearch[klist, #1]]) <= 
                    Length[klist], While[klist[[k0--]] == klist[[k0]]];
                  ++k0, 0]) &) /@ slist];
    pos = Position[flist, _?Positive, {1}, 1];
    pos = If[Length[pos] > 0, pos[[1, 1]] - 1, 1];
    flist = Drop[flist, {1, pos}];
    sublistTNW2[[flist]]]

Out[36]=
{{2, 0.726261}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18,
0.472525}}





Now what is the other variant? When it should exactly reproduce your
algorithm, i.e. dubletts included?

Recall

In[99]:= s
Out[99]=
{{{2, 0.726261}}, {{5, 0.343456}}, {{5, 0.343456}}, {{8, 0.681286}},
 {{16, 0.0822718}}, {{18, 0.472525}}, {{18, 0.472525}}}


This is easiest with binary search (just replace Union by Sort):

In[100]:=
as4 = Module[{klist = sublistTNW2[[All, 1]], slist = sublistTNW1[[All, 1]], 
      flist, k0, pos}, 
    flist = Sort[((If[(k0 = Ceiling[BinarySearch[klist, #1]]) <= 
                    Length[klist], While[klist[[k0--]] == klist[[k0]]];
                  ++k0, 0]) &) /@ slist];
    pos = Position[flist, _?Positive, {1}, 1];
    pos = If[Length[pos] > 0, pos[[1, 1]] - 1, 1];
    flist = Drop[flist, {1, pos}];
    sublistTNW2[[flist]]]
Out[100]=
{{2, 0.726261}, {5, 0.343456}, {5, 0.343456}, {8, 0.681286}, {16, 
    0.0822718}, {18, 0.472525}, {18, 0.472525}}

In[101]:= as4 == (as = Flatten[s, 1])
Out[101]= True



With Merge/Sort/Split this is

In[103]:=

  as3 = 
  Module[{markers = Transpose[{sublistTNW1[[All,1]], 
   Table[-1, {Length[sublistTNW1]}]}], ss}, 
  ss = Split[Sort[Join[markers, sublistTNW2]], 
        Last[#1] =!= -1 & ]; 
  Reverse[
    Block[{last=Sequence[]},
      If[#[[2]]===-1,last,last=#]&/@Reverse[Rest[First/@ss]]]]      
  ];

In[104]:= as3 == as
Out[104]= True




Now a comparison for performance:
(but be aware that asymptotic complexity is different for different
algorithms below)

In[121]:= n1 = 200;
In[122]:=
sublistTNW1 = Sort[Table[{Random[Integer, {0, n0}], Random[]}, {n1}]];
In[123]:= n2 = 10000;
In[124]:=
sublistTNW2 = Sort[Table[{Random[Integer, {0, n0}], Random[]}, {n2}]];
 

Your algorithm on these data:

In[125]:=
s0 = Union[
        Flatten[Function[comp, 
              Select[sublistTNW2, First[#] >= First[comp] &, 1]] /@ 
            sublistTNW1, 1]]; // Timing
Out[125]=
{18.917 Second, Null}
 


Single Scan:

In[126]:=
s2 = Block[{i = 1, len = Length[sublistTNW1], this}, 
        Select[sublistTNW2, 
          If[i <= len && (this = First[#]) >= First[sublistTNW1[[i]]], 
              While[(++i <= len) && this >= First[sublistTNW1[[i]]]]; True, 
              False] &]]; // Timing
Out[126]=
{0.591 Second, Null}

In[127]:= s2 == s0
Out[127]= True



Merge/Sort/Split
 
In[128]:=
  s3 = Module[{markers = Transpose[
        {sublistTNW1[[All,1]], Table[-1, {Length[sublistTNW1]}]}],
        ss}, 
        ss = Split[Sort[Join[markers, sublistTNW2]], 
          Last[#1] =!= -1 & ]; 
        DeleteCases[Rest[First /@ ss], {_, -1}]];//Timing
Out[128]=
{0.26 Second, Null}

In[129]:= s3 == s0
Out[129]= True



Binary Search
 
In[130]:=
(s4 = Module[{klist = sublistTNW2[[All, 1]], slist = sublistTNW1[[All, 1]], 
            flist, k0, pos}, 
          flist = Union[((If[(k0 = Ceiling[BinarySearch[klist, #1]]) <= 
                          Length[klist], While[klist[[k0--]] ==
klist[[k0]]];
                        ++k0, 0]) &) /@ slist];
          If[Length[flist] > 0 && flist[[1]] === 0,
            flist = Rest[flist]];
          sublistTNW2[[flist]]]); // Timing

Out[130]= {1.632 Second, Null}
In[131]:= s4 == s0
Out[131]= True


--
Hartmut Wolf


  • Prev by Date: Re: Get theoretical answer on linear equations
  • Next by Date: Univers font
  • Previous by thread: RE: speed-up of a function
  • Next by thread: 2 Simple Mathematica Questions. (regarding tensors and matrices)