MathGroup Archive 1999

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: "extending" BinCounts, need for speed

  • To: mathgroup at smc.vnet.net
  • Subject: [mg16032] Re: [mg15871] "extending" BinCounts, need for speed
  • From: "Tomas Garza" <tgarza at mail.internet.com.mx>
  • Date: Sat, 20 Feb 1999 02:52:13 -0500
  • Sender: owner-wri-mathgroup at wolfram.com

Olivier Pelletier wrote:

>I have a huge (i.e. Length[list]>50000) list of points (a point being a
>list {ai,bi})
>I need to process it using a BinCounts-like function doing the
>following:
>if ai is in the i-th "bin" then I should add bi to the value of this
>bin. (BinCounts  takes a list of numbers and adds unity to the bin
>corresponding to the value of the number).
>Using Do[bin[[ai]]+= bi, {i,Length[my_list]}] simply takes too much
>time.

Olivier:

It depends what you call "too much time". Let me give an example: suppose I
have 50,000 points, and say 5 bins:

In[1]:=
lista1=Table[{Random[Integer,{1,10}],Random[Integer,{10,20}]},{50000}];

In[2]:=
bins={{1,2},{3,4},{5,7},{8,8},{9,10}};

Your problem can be solved as follows:

In[3]:=
Table[{j,Plus@@(#[[2]]&/@Select[lista1,bins[[j,1]]<=#[[1]]<=bins[[j,2]]&])},
{
      j,1,Length[bins]}]//Timing
Out[3]=
{15.48 Second,{{1,149077},{2,150800},{3,225661},{4,73689},{5,150615}}}

if the bins are defined as intervals or, alternatively, if they are defined
as lists, as

In[4]:= beinis={{1,2},{3,4},{5,6,7},{8,8},{9,10}};
Table[{j,Plus@@Cases[lista1,{k_/;MemberQ[beinis[[j]],k],b_}->b]},{j,1,
      Length[beinis]}]//Timing
Out[4]=
{18.57 Second,{{1,149077},{2,150800},{3,225661},{4,73689},{5,150615}}}

if the bins are defined as lists. A third option, using intervals, would be

In[6]:=
bainis=Interval/@bins;

Table[{j,Plus@@Cases[lista1,{k_/;IntervalMemberQ[bainis[[j]],k],b_}->b]},{j,
1,
      Length[bins]}]//Timing
Out[6]=
{18.24 Second,{{1,149077},{2,150800},{3,225661},{4,73689},{5,150615}}}

Allowing for different machine speeds, the timing results give an idea of
the order of magnitude of time used for the calculation. I would be
interested to know if there can be a significant improvement in speed.

Of course, if your bins are of a simpler nature, defined only by the first
coordinate of each point, as you seem to imply in your message, then the
calculation becomes one order of magnitude faster:

In[7]:=
bonos=Range[10];

In[8]:=
Table[{k,Plus@@Cases[lista1,{k,b_}->b]},{k,1,Length[bonos]}]//Timing
Out[8]=
{1.7 Second,{{1,74381},{2,74696},{3,74144},{4,76656},{5,76139},{6,74722},{7,
      74800},{8,73689},{9,75076},{10,75539}}}

This is about 3 times faster than using the procedural approach:

In[9]:=
tots=Table[0,{Length[bonos]}];
Do[tots[[bonos[[lista1[[k,1]]]]]]+=lista1[[k,2]],{k,1,Length[lista1]}]//Timi
ng

Out[9]=
{4.73 Second,Null}

In[10]:=
tots
Out[118]=
{74381,74696,74144,76656,76139,74722,74800,73689,75076,75539}

Good luck,

Tomas Garza
Mexico City



  • Prev by Date: fonts and mathematica/publicon
  • Next by Date: Re: Strange result of Limit function in Mathematica3.0
  • Previous by thread: Re: "extending" BinCounts, need for speed
  • Next by thread: Win 98 Printing Problems