Re: "extending" BinCounts, need for speed
- To: mathgroup at smc.vnet.net
- Subject: [mg16002] Re: "extending" BinCounts, need for speed
- From: Paul Abbott <paul at physics.uwa.edu.au>
- Date: Fri, 19 Feb 1999 03:27:13 -0500
- Organization: University of Western Australia
- References: <7afu9f$a7k@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Olivier Pelletier wrote:
> I have a huge (i.e. Length[list]>50000) list of points (a point being a
> list {ai,bi})
> I need to process it using a BinCounts-like function doing the
> following:
> if ai is in the i-th "bin" then I should add bi to the value of this
> bin. (BinCounts takes a list of numbers and adds unity to the bin
> corresponding to the value of the number).
> Using Do[bin[[ai]]+= bi, {i,Length[my_list]}] simply takes too much
> time.
I assume from this that ai is always an integer in your application?
> Any suggestion to improve the speed.
Since I am completely sure what you need in your application, here is
one approach, using Sort and Split, that may be faster. Suppose you
have 10 bins {1,2, ..., 10}
In[1]:= numbins = 10;
Here is a random data set of 50000 points.
In[2]:= (data = Table[{Random[Integer, {1, numbins}], Random[]},
{50000}];) // Timing // First
Out[2]= 5.34 Second
If we Sort and Split this data up (binning it),
In[3]:= (binned = Split[Sort[data], First[#2] == First[#1] &];) //
Timing // First
Out[3]= 8.17 Second
then forming the sums is easy.
In[4]:= Apply[{First[#], Last[Plus[##]]} &, binned, {1}] // Timing
Out[5]= {1.39 Second, {{1, 2460.31}, {2, 2560.69}, {3, 2508.47},
{4, 2471.9}, {5, 2488.55}, {6, 2551.79}, {7, 2523.27},
{8, 2483.82}, {9, 2480.29}, {10, 2501.56}}}
Cheers,
Paul
____________________________________________________________________
Paul Abbott Phone: +61-8-9380-2734
Department of Physics Fax: +61-8-9380-1014
The University of Western Australia
Nedlands WA 6907 mailto:paul at physics.uwa.edu.au
AUSTRALIA http://www.physics.uwa.edu.au/~paul
God IS a weakly left-handed dice player
____________________________________________________________________