Re: "extending" BinCounts, need for speed
- To: mathgroup at smc.vnet.net
- Subject: [mg16032] Re: [mg15871] "extending" BinCounts, need for speed
- From: "Tomas Garza" <tgarza at mail.internet.com.mx>
- Date: Sat, 20 Feb 1999 02:52:13 -0500
- Sender: owner-wri-mathgroup at wolfram.com
Olivier Pelletier wrote:
>I have a huge (i.e. Length[list]>50000) list of points (a point being a
>list {ai,bi})
>I need to process it using a BinCounts-like function doing the
>following:
>if ai is in the i-th "bin" then I should add bi to the value of this
>bin. (BinCounts takes a list of numbers and adds unity to the bin
>corresponding to the value of the number).
>Using Do[bin[[ai]]+= bi, {i,Length[my_list]}] simply takes too much
>time.
Olivier:
It depends what you call "too much time". Let me give an example: suppose I
have 50,000 points, and say 5 bins:
In[1]:=
lista1=Table[{Random[Integer,{1,10}],Random[Integer,{10,20}]},{50000}];
In[2]:=
bins={{1,2},{3,4},{5,7},{8,8},{9,10}};
Your problem can be solved as follows:
In[3]:=
Table[{j,Plus@@(#[[2]]&/@Select[lista1,bins[[j,1]]<=#[[1]]<=bins[[j,2]]&])},
{
j,1,Length[bins]}]//Timing
Out[3]=
{15.48 Second,{{1,149077},{2,150800},{3,225661},{4,73689},{5,150615}}}
if the bins are defined as intervals or, alternatively, if they are defined
as lists, as
In[4]:= beinis={{1,2},{3,4},{5,6,7},{8,8},{9,10}};
Table[{j,Plus@@Cases[lista1,{k_/;MemberQ[beinis[[j]],k],b_}->b]},{j,1,
Length[beinis]}]//Timing
Out[4]=
{18.57 Second,{{1,149077},{2,150800},{3,225661},{4,73689},{5,150615}}}
if the bins are defined as lists. A third option, using intervals, would be
In[6]:=
bainis=Interval/@bins;
Table[{j,Plus@@Cases[lista1,{k_/;IntervalMemberQ[bainis[[j]],k],b_}->b]},{j,
1,
Length[bins]}]//Timing
Out[6]=
{18.24 Second,{{1,149077},{2,150800},{3,225661},{4,73689},{5,150615}}}
Allowing for different machine speeds, the timing results give an idea of
the order of magnitude of time used for the calculation. I would be
interested to know if there can be a significant improvement in speed.
Of course, if your bins are of a simpler nature, defined only by the first
coordinate of each point, as you seem to imply in your message, then the
calculation becomes one order of magnitude faster:
In[7]:=
bonos=Range[10];
In[8]:=
Table[{k,Plus@@Cases[lista1,{k,b_}->b]},{k,1,Length[bonos]}]//Timing
Out[8]=
{1.7 Second,{{1,74381},{2,74696},{3,74144},{4,76656},{5,76139},{6,74722},{7,
74800},{8,73689},{9,75076},{10,75539}}}
This is about 3 times faster than using the procedural approach:
In[9]:=
tots=Table[0,{Length[bonos]}];
Do[tots[[bonos[[lista1[[k,1]]]]]]+=lista1[[k,2]],{k,1,Length[lista1]}]//Timi
ng
Out[9]=
{4.73 Second,Null}
In[10]:=
tots
Out[118]=
{74381,74696,74144,76656,76139,74722,74800,73689,75076,75539}
Good luck,
Tomas Garza
Mexico City