Re: "extending" BinCounts, need for speed

*To*: mathgroup at smc.vnet.net*Subject*: [mg16002] Re: "extending" BinCounts, need for speed*From*: Paul Abbott <paul at physics.uwa.edu.au>*Date*: Fri, 19 Feb 1999 03:27:13 -0500*Organization*: University of Western Australia*References*: <7afu9f$a7k@smc.vnet.net>*Sender*: owner-wri-mathgroup at wolfram.com

Olivier Pelletier wrote: > I have a huge (i.e. Length[list]>50000) list of points (a point being a > list {ai,bi}) > I need to process it using a BinCounts-like function doing the > following: > if ai is in the i-th "bin" then I should add bi to the value of this > bin. (BinCounts takes a list of numbers and adds unity to the bin > corresponding to the value of the number). > Using Do[bin[[ai]]+= bi, {i,Length[my_list]}] simply takes too much > time. I assume from this that ai is always an integer in your application? > Any suggestion to improve the speed. Since I am completely sure what you need in your application, here is one approach, using Sort and Split, that may be faster. Suppose you have 10 bins {1,2, ..., 10} In[1]:= numbins = 10; Here is a random data set of 50000 points. In[2]:= (data = Table[{Random[Integer, {1, numbins}], Random[]}, {50000}];) // Timing // First Out[2]= 5.34 Second If we Sort and Split this data up (binning it), In[3]:= (binned = Split[Sort[data], First[#2] == First[#1] &];) // Timing // First Out[3]= 8.17 Second then forming the sums is easy. In[4]:= Apply[{First[#], Last[Plus[##]]} &, binned, {1}] // Timing Out[5]= {1.39 Second, {{1, 2460.31}, {2, 2560.69}, {3, 2508.47}, {4, 2471.9}, {5, 2488.55}, {6, 2551.79}, {7, 2523.27}, {8, 2483.82}, {9, 2480.29}, {10, 2501.56}}} Cheers, Paul ____________________________________________________________________ Paul Abbott Phone: +61-8-9380-2734 Department of Physics Fax: +61-8-9380-1014 The University of Western Australia Nedlands WA 6907 mailto:paul at physics.uwa.edu.au AUSTRALIA http://www.physics.uwa.edu.au/~paul God IS a weakly left-handed dice player ____________________________________________________________________