RE: RE: speed-up of a function
- To: mathgroup at smc.vnet.net
- Subject: [mg45165] RE: [mg45117] RE: [mg45072] speed-up of a function
- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
- Date: Fri, 19 Dec 2003 06:57:31 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
>-----Original Message----- >From: Wolf, Hartmut To: mathgroup at smc.vnet.net >Sent: Wednesday, December 17, 2003 1:55 PM >To: mathgroup at smc.vnet.net >Subject: [mg45165] [mg45117] RE: [mg45072] speed-up of a function > > > >>-----Original Message----- >>From: Kastens at Hamburg.BAW.DE [mailto:Kastens at Hamburg.BAW.DE] To: mathgroup at smc.vnet.net >To: mathgroup at smc.vnet.net >>Sent: Tuesday, December 16, 2003 12:21 PM >>To: mathgroup at smc.vnet.net >>Subject: [mg45165] [mg45117] [mg45072] speed-up of a function >> >> >>Hi! >> >>I have something like this: (two lists(sublistTNW1 and >>sublistTNW2)), containing both time (Integer) and a value (Real)). >>The following function pick-up the next dataset(time,value) >>out of sublistTNW2 after the given time (sublistTNW1[[i,1]]). >> >>NextEvent[t_, list_] := Select[list, (#[[1]] >= t) &, 1] >>For[i = 1, i < 500, >>{ >> NextEvent[sublistTNW1[[i, 1]], sublistTNW2] >>}; i++] >> >>Calling the function in the for-loop very often (f.ex 500 >>times or more) is indeed very slowly. >> >>Trying to compile the function >>testfunc = >> Compile[{{t, _Integer}, {list, _Real, 2}}, Select[list, >>(#[[1]] >= t) &, 1]] >> >>takes no effect in speed-up. Perhabs I've used the >>compile-Function not correctly? >> >>How can I use lists with different data types in compile? >>{list, _Real, 2} (s.a.) is not exactly right, because the list >>is in the format {_Integer,_Real}. >> >>Thanks for any suggestions, >>marko >> > > >Marko, > >I make a model of your computation, hope it applies. > >I don't thing your computation make stoo much sense unless >sublistTNW2 is sorted. I also suppose sublistTNW1 is sorted. >If not, try to reformulate your problem. > >Now the model: > >In[1]:= n0 = 100000; >In[2]:= >sublistTNW2 = Sort[Table[{Random[Integer, {1, n0}], Random[]}, {n0}]]; >In[3]:= >sublistTNW1 = Sort[Table[Random[Integer, {1, n0}], {500}]]; > >In[4]:= NextEvent[t_, list_] := Select[list, (#[[1]] >= t) &, 1] > > >In[6]:= r = {}; For[i = 1, i <= 15, > AppendTo[r, NextEvent[sublistTNW1[[i]], sublistTNW2]]; i++]; r = > Flatten[r, 1] >Out[6]= >{{210, 0.0739654}, {290, 0.571214}, {294, 0.192693}, > {304, 0.718815}, {701, 0.202688}, {1216, 0.140055}, > {1276, 0.335399}, {1454, 0.593851}, {1472, 0.6723}, > {1604, 0.271997}, {1905, 0.962603}, {1922, 0.74373}, > {2009, 0.217664}, {2125, 0.584308}, {2584, 0.476056}} > >In[7]:= Length[r] >Out[7]= 15 > >This is just to check for correctness for an alternative algorithm. > > >Now we test your's: > >In[8]:= For[i = 1, i <= 50, NextEvent[sublistTNW1[[i]], >sublistTNW2]; i++] >// Timing >Out[8]= {4.587 Second, Null} > >In[9]:= For[i = 1, i <= 100, NextEvent[sublistTNW1[[i]], >sublistTNW2]; i++] >// Timing >Out[9]= {20.5 Second, Null} > >In[10]:= For[i = 1, i <= 150, NextEvent[sublistTNW1[[i]], >sublistTNW2]; i++] >// Timing >Out[10]= {47.608 Second, Null} > >You see: you algorithm is O[n^2], which is prohibitive in this case. > > >The reason is clear: for (sorted sublists) we start always at >the beginning, >such we get longer and longer scans. > > > >This simple trick continues scanning at the last hit: > >In[11]:= i = 1; Select[sublistTNW2, > If[#[[1]] >= sublistTNW1[[i]], ++i; True, False] &, 15] >Out[11]= >{{210, 0.0739654}, {290, 0.571214}, {294, 0.192693}, > {304, 0.718815}, {701, 0.202688}, {1216, 0.140055}, > {1276, 0.335399}, {1454, 0.593851}, {1472, 0.6723}, > {1604, 0.271997}, {1905, 0.962603}, {1922, 0.74373}, > {2009, 0.217664}, {2125, 0.584308}, {2584, 0.476056}} > >In[12]:= % == r >Out[12]= True > >Same result as for the reference algorithm. > >Now test: > >In[17]:= >i = 1; Select[sublistTNW2, If[#[[1]] >= sublistTNW1[[i]], ++i; > True, False] &, 50]; // Timing >Out[17]= {0.38 Second, Null} > >In[18]:= >i = 1; Select[sublistTNW2, If[#[[1]] >= sublistTNW1[[i]], ++i; > True, False] &, 100]; // Timing >Out[18]= {0.811 Second, Null} > >In[19]:= >i = 1; Select[sublistTNW2, If[#[[1]] >= sublistTNW1[[i]], ++i; > True, False] &, 150]; // Timing >Out[19]= {1.142 Second, Null} > >In[20]:= >i = 1; Select[sublistTNW2, If[#[[1]] >= sublistTNW1[[i]], ++i; > True, False] &, 200]; // Timing >Out[20]= {1.512 Second, Null} > > >So the algorithm is linear (and scans sublistTNW2 only once, instead of >repeating at beginning of sublistTNW2 over and over). > > >Whether this algorithms is really appropriate depends on more >properties of your problem, you didn't report. Just try it! > > >Other strategies were to to binay search (sorted) sublistTNW2, >or to splice in sublistTNW1 as markers, Sort and Split. Perhaps others... > > >-- >Hartmut Wolf > > Marko, here now my promised explanations and comparisons: data: In[1]:= n1 = 7; In[2]:= sublistTNW1 = Sort[Table[{Random[Integer, {0, n0}], Random[]}, {n1}]] Out[2]= {{2, 0.285281}, {4, 0.228988}, {5, 0.0569614}, {8, 0.387688}, {16, 0.982228}, {18, 0.253338}, {18, 0.40502}} In[3]:= n2 = 20; In[4]:= sublistTNW2 = Sort[Table[{Random[Integer, {0, n0}], Random[]}, {n2}]] Out[4]= {{1, 0.539148}, {2, 0.726261}, {3, 0.827974}, {5, 0.343456}, {5, 0.422791}, {6, 0.0319436}, {6, 0.453189}, {8, 0.681286}, {10, 0.467884}, {11, 0.578041}, {13, 0.704192}, {13, 0.92887}, {14, 0.21612}, {16, 0.0822718}, {16, 0.719935}, {17, 0.231015}, {17, 0.322158}, {18, 0.472525}, {19, 0.754313}, {20, 0.831389}} In[11]:= i = 1; s = Select[sublistTNW2, If[First[#] >= First[sublistTNW1[[i]]], ++i; True, False] &, Length[sublistTNW1]] Out[11]= {{2, 0.726261}, {5, 0.343456}, {5, 0.422791}, {8, 0.681286}, {16, 0.0822718}, {18, 0.472525}, {19, 0.754313}} Compare that result with In[8]:= sublistTNW1 Out[8]= {{2, 0.285281}, {4, 0.228988}, {5, 0.0569614}, {8, 0.387688}, {16, 0.982228}, {18, 0.253338}, {18, 0.40502}} you see the weekness of the idea, perhaps the third hit should be identical to the second (and sixth to seventh). The answer, whther ok or not, however, must come from your application. Your algorithm gives a different result here: In[9]:= s = Function[comp, Select[sublistTNW2, First[#] >= First[comp] &, 1]] /@ sublistTNW1 Out[9]= {{{2, 0.726261}}, {{5, 0.343456}}, {{5, 0.343456}}, {{8, 0.681286}}, {{16, 0.0822718}}, {{18, 0.472525}}, {{18, 0.472525}}} Now it would not be too easy to fix my (not quite so brilliant) idea. On the other side, perhaps, dubletts should not show up in the result. Such if you wanted In[10]:= s0 = Union[Flatten[s, 1]] Out[10]= {{2, 0.726261}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18, 0.472525}} a fix would be much easier. In[12]:= i = 1; len = Length[sublistTNW1]; s2 = Select[sublistTNW2, If[i <= len && (this = First[#]) >= First[sublistTNW1[[i]]], While[(++i <= len) && this >= First[sublistTNW1[[i]]]]; True, False] &] Out[12]= {{2, 0.726261}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18, 0.472525}} When we get a hit, we just skip all other search keys, which would give the same hit (in your algorithm) Merge/Sort/Split approach: This is another idea I had indicated in my first post. We merge both lists sorted, say sublistTNW1 would be all red, then in the result after each red element, the next black one would be the corresponding hit. Split just is a fast way to locate the markers. As we do not have colors, we tag sublistTNW1 In[21]:= markers = Transpose[ {sublistTNW1[[All, 1]], Table[-1, {Length[sublistTNW1]}]}] Out[21]= {{2, -1}, {4, -1}, {5, -1}, {8, -1}, {16, -1}, {18, -1}, {18, -1}} In[22]:= Sort[Join[markers, sublistTNW2]] Out[22]= {{1, 0.539148}, {2, -1}, {2, 0.726261}, {3, 0.827974}, {4, -1}, {5, -1}, {5, 0.343456}, {5, 0.422791}, {6, 0.0319436}, {6, 0.453189}, {8, -1}, {8, 0.681286}, {10, 0.467884}, {11, 0.578041}, {13, 0.704192}, {13, 0.92887}, {14, 0.21612}, {16, -1}, {16, 0.0822718}, {16, 0.719935}, {17, 0.231015}, {17, 0.322158}, {18, -1}, {18, -1}, {18, 0.472525}, {19, 0.754313}, {20, 0.831389}} In[23]:= Split[%, Last[#1] =!= -1 &] Out[23]= {{{1, 0.539148}, {2, -1}}, {{2, 0.726261}, {3, 0.827974}, {4, -1}}, {{5, -1}}, {{5, 0.343456}, {5, 0.422791}, {6, 0.0319436}, {6, 0.453189}, {8, -1}}, {{8, 0.681286}, {10, 0.467884}, {11, 0.578041}, {13, 0.704192}, {13, 0.92887}, {14, 0.21612}, {16, -1}}, {{16, 0.0822718}, {16, 0.719935}, {17, 0.231015}, {17, 0.322158}, {18, -1}}, {{18, -1}}, {{18, 0.472525}, {19, 0.754313}, {20, 0.831389}}} In[24]:= Rest[First /@ %] Out[24]= {{2, 0.726261}, {5, -1}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18, -1}, {18, 0.472525}} Markers that still show up here indicate dubletts. We might drop them if not desired. In[25]:= s3 = DeleteCases[%, {_, -1}] Out[25]= {{2, 0.726261}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18, 0.472525}} In[26]:= s0 Out[26]= {{2, 0.726261}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18, 0.472525}} Binary Search: In[28]:= << DiscreteMath`Combinatorica` In[29]:= ?BinarySearch In[31]:= klist = sublistTNW2[[All, 1]] Out[31]= {1, 2, 3, 5, 5, 6, 6, 8, 10, 11, 13, 13, 14, 16, 16, 17, 17, 18, 19, 20} The keys in the list to be searched In[32]:= slist = sublistTNW1[[All, 1]] Out[32]= {2, 4, 5, 8, 16, 18, 18} The list of the search keys. In[33]:= BinarySearch[klist, #] & /@ slist Out[33]= {2, 7/2, 5, 8, 15, 18, 18} Direct hits are given by integer indices, otherwise (for us) the interesting element would be given by the next integer In[34]:= Ceiling[%] Out[34]= {2, 4, 5, 8, 15, 18, 18} We see one of the two dublett pairs (the index 18 repeats). But where has the other gone? In[35]:= sublistTNW2[[Union[%]]] Out[35]= {{2, 0.726261}, {5, 0.343456}, {5, 0.422791}, {8, 0.681286}, {16, 0.719935}, {18, 0.472525}} Here it comes up again! The reason is, binary search stops when it gets a hit. If we want the lowest of all possible hits, we have to do some corrective work: There also is another point, if we search for a key that is too large, we get an index out of bounds. In[36]:= s4 = Module[{klist = sublistTNW2[[All, 1]], slist = sublistTNW1[[All, 1]], flist, k0, pos}, flist = Union[((If[(k0 = Ceiling[BinarySearch[klist, #1]]) <= Length[klist], While[klist[[k0--]] == klist[[k0]]]; ++k0, 0]) &) /@ slist]; pos = Position[flist, _?Positive, {1}, 1]; pos = If[Length[pos] > 0, pos[[1, 1]] - 1, 1]; flist = Drop[flist, {1, pos}]; sublistTNW2[[flist]]] Out[36]= {{2, 0.726261}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18, 0.472525}} Now what is the other variant? When it should exactly reproduce your algorithm, i.e. dubletts included? Recall In[99]:= s Out[99]= {{{2, 0.726261}}, {{5, 0.343456}}, {{5, 0.343456}}, {{8, 0.681286}}, {{16, 0.0822718}}, {{18, 0.472525}}, {{18, 0.472525}}} This is easiest with binary search (just replace Union by Sort): In[100]:= as4 = Module[{klist = sublistTNW2[[All, 1]], slist = sublistTNW1[[All, 1]], flist, k0, pos}, flist = Sort[((If[(k0 = Ceiling[BinarySearch[klist, #1]]) <= Length[klist], While[klist[[k0--]] == klist[[k0]]]; ++k0, 0]) &) /@ slist]; pos = Position[flist, _?Positive, {1}, 1]; pos = If[Length[pos] > 0, pos[[1, 1]] - 1, 1]; flist = Drop[flist, {1, pos}]; sublistTNW2[[flist]]] Out[100]= {{2, 0.726261}, {5, 0.343456}, {5, 0.343456}, {8, 0.681286}, {16, 0.0822718}, {18, 0.472525}, {18, 0.472525}} In[101]:= as4 == (as = Flatten[s, 1]) Out[101]= True With Merge/Sort/Split this is In[103]:= as3 = Module[{markers = Transpose[{sublistTNW1[[All,1]], Table[-1, {Length[sublistTNW1]}]}], ss}, ss = Split[Sort[Join[markers, sublistTNW2]], Last[#1] =!= -1 & ]; Reverse[ Block[{last=Sequence[]}, If[#[[2]]===-1,last,last=#]&/@Reverse[Rest[First/@ss]]]] ]; In[104]:= as3 == as Out[104]= True Now a comparison for performance: (but be aware that asymptotic complexity is different for different algorithms below) In[121]:= n1 = 200; In[122]:= sublistTNW1 = Sort[Table[{Random[Integer, {0, n0}], Random[]}, {n1}]]; In[123]:= n2 = 10000; In[124]:= sublistTNW2 = Sort[Table[{Random[Integer, {0, n0}], Random[]}, {n2}]]; Your algorithm on these data: In[125]:= s0 = Union[ Flatten[Function[comp, Select[sublistTNW2, First[#] >= First[comp] &, 1]] /@ sublistTNW1, 1]]; // Timing Out[125]= {18.917 Second, Null} Single Scan: In[126]:= s2 = Block[{i = 1, len = Length[sublistTNW1], this}, Select[sublistTNW2, If[i <= len && (this = First[#]) >= First[sublistTNW1[[i]]], While[(++i <= len) && this >= First[sublistTNW1[[i]]]]; True, False] &]]; // Timing Out[126]= {0.591 Second, Null} In[127]:= s2 == s0 Out[127]= True Merge/Sort/Split In[128]:= s3 = Module[{markers = Transpose[ {sublistTNW1[[All,1]], Table[-1, {Length[sublistTNW1]}]}], ss}, ss = Split[Sort[Join[markers, sublistTNW2]], Last[#1] =!= -1 & ]; DeleteCases[Rest[First /@ ss], {_, -1}]];//Timing Out[128]= {0.26 Second, Null} In[129]:= s3 == s0 Out[129]= True Binary Search In[130]:= (s4 = Module[{klist = sublistTNW2[[All, 1]], slist = sublistTNW1[[All, 1]], flist, k0, pos}, flist = Union[((If[(k0 = Ceiling[BinarySearch[klist, #1]]) <= Length[klist], While[klist[[k0--]] == klist[[k0]]]; ++k0, 0]) &) /@ slist]; If[Length[flist] > 0 && flist[[1]] === 0, flist = Rest[flist]]; sublistTNW2[[flist]]]); // Timing Out[130]= {1.632 Second, Null} In[131]:= s4 == s0 Out[131]= True -- Hartmut Wolf