Re: "extending" BinCounts, need for speed

*To*: mathgroup at smc.vnet.net*Subject*: [mg16032] Re: [mg15871] "extending" BinCounts, need for speed*From*: "Tomas Garza" <tgarza at mail.internet.com.mx>*Date*: Sat, 20 Feb 1999 02:52:13 -0500*Sender*: owner-wri-mathgroup at wolfram.com

Olivier Pelletier wrote: >I have a huge (i.e. Length[list]>50000) list of points (a point being a >list {ai,bi}) >I need to process it using a BinCounts-like function doing the >following: >if ai is in the i-th "bin" then I should add bi to the value of this >bin. (BinCounts takes a list of numbers and adds unity to the bin >corresponding to the value of the number). >Using Do[bin[[ai]]+= bi, {i,Length[my_list]}] simply takes too much >time. Olivier: It depends what you call "too much time". Let me give an example: suppose I have 50,000 points, and say 5 bins: In[1]:= lista1=Table[{Random[Integer,{1,10}],Random[Integer,{10,20}]},{50000}]; In[2]:= bins={{1,2},{3,4},{5,7},{8,8},{9,10}}; Your problem can be solved as follows: In[3]:= Table[{j,Plus@@(#[[2]]&/@Select[lista1,bins[[j,1]]<=#[[1]]<=bins[[j,2]]&])}, { j,1,Length[bins]}]//Timing Out[3]= {15.48 Second,{{1,149077},{2,150800},{3,225661},{4,73689},{5,150615}}} if the bins are defined as intervals or, alternatively, if they are defined as lists, as In[4]:= beinis={{1,2},{3,4},{5,6,7},{8,8},{9,10}}; Table[{j,Plus@@Cases[lista1,{k_/;MemberQ[beinis[[j]],k],b_}->b]},{j,1, Length[beinis]}]//Timing Out[4]= {18.57 Second,{{1,149077},{2,150800},{3,225661},{4,73689},{5,150615}}} if the bins are defined as lists. A third option, using intervals, would be In[6]:= bainis=Interval/@bins; Table[{j,Plus@@Cases[lista1,{k_/;IntervalMemberQ[bainis[[j]],k],b_}->b]},{j, 1, Length[bins]}]//Timing Out[6]= {18.24 Second,{{1,149077},{2,150800},{3,225661},{4,73689},{5,150615}}} Allowing for different machine speeds, the timing results give an idea of the order of magnitude of time used for the calculation. I would be interested to know if there can be a significant improvement in speed. Of course, if your bins are of a simpler nature, defined only by the first coordinate of each point, as you seem to imply in your message, then the calculation becomes one order of magnitude faster: In[7]:= bonos=Range[10]; In[8]:= Table[{k,Plus@@Cases[lista1,{k,b_}->b]},{k,1,Length[bonos]}]//Timing Out[8]= {1.7 Second,{{1,74381},{2,74696},{3,74144},{4,76656},{5,76139},{6,74722},{7, 74800},{8,73689},{9,75076},{10,75539}}} This is about 3 times faster than using the procedural approach: In[9]:= tots=Table[0,{Length[bonos]}]; Do[tots[[bonos[[lista1[[k,1]]]]]]+=lista1[[k,2]],{k,1,Length[lista1]}]//Timi ng Out[9]= {4.73 Second,Null} In[10]:= tots Out[118]= {74381,74696,74144,76656,76139,74722,74800,73689,75076,75539} Good luck, Tomas Garza Mexico City