MathGroup Archive 2009

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Problem with BinLists for large lists

  • To: mathgroup at smc.vnet.net
  • Subject: [mg101307] Re: [mg101293] Problem with BinLists for large lists
  • From: Darren Glosemeyer <darreng at wolfram.com>
  • Date: Wed, 1 Jul 2009 06:32:03 -0400 (EDT)
  • References: <200906301034.GAA10625@smc.vnet.net>

pfalloon wrote:
> Hi,
> I've been trying to use BinLists to bucket large (approx. a million
> elements) lists of 3 dimensional vectors. I've noticed that BinLists
> runs pretty slowly when working with large lists in general, and for
> lists of this size and dimensionality actually runs out of memory.
>
> I've worked up an alternative implementation which runs pretty
> quickly, and works by partitioning component-wise, then finding
> intersections of each bucket. I thought it might be worth posting to
> see if anyone has encountered this issue and knows more about why it
> happens...
>
> Example implementation:
>
> (* generate some large lists of reals *)
> {x,y,z} = RandomReal[{0,1}, {3,10^6}];
> xyz = Transpose[{x, y, z}];
>
> (* define bin intervals *)
> int = Range[0,1,0.1];
>
> (* calculate buckets for each variable *)
> {xBins, yBins, zBins} = Table[Flatten@Position[#, _?(int[[i]] <= # <
> int[[i+1]] &)], {i, Length[int]-1}] & /@ {x, y, z};
>
> (* get 3x3 grid of bins by taking intersections of the single-variable
> bins *)
> xyzBins = Table[Intersection[xBins[[i]], yBins[[j]], zBins[[k]]],
> {i,Length@xBins}, {j,Length@yBins}, {k,Length@zBins}];
>
> (* look at the length of each bins *)
> Map[Length, xyzBins, {3}]
>
> (* get the actual points in each bin *)
> binContents = Map[xyz[[#]] &, xyzBins, {3}];
>
> (* look at points from a particular bin *)
> ListPlot[Transpose[xyz[[xyzBins[[1,2,3]]]]]]
>
> (* the equivalent call to BinLists runs out of memory on my machine
> (32-bit windows) *)
> BinLists[xyz, {int}, {int}, {int}];
>
>
> I'm not sure why BinLists doesn't use an approach like this. My
> suspicion is that it may be using something like (using above
> definitions):
>
> xyzBins = Outer[Intersection, xBins, yBins, zBins];
>
> the problem being that the Outer function will effectively cause
> multiple copies of each list to be generated and thus runs out of
> memory.
>
>   

Thanks for the suggestion. We will need to take a closer look at this 
for a future version.

As an important side note, when bins have regular spacing, it will 
always be faster to use the {min,max,dx} spec than the {{c1,c2,...}} 
cutoff spec because the dx spec can take advantage of the knowledge that 
the bins are regularly spaced. In this particular example,

BinLists[xyz, {0, 1, 0.1}, {0, 1, 0.1}, {0, 1, 0.1}]

is about 11 times faster than the suggested code.

Darren Glosemeyer
Wolfram Research


  • Prev by Date: Re: stirring chocolate pudding backwards: identifying coordinates
  • Next by Date: Re: Problem with BinLists for large lists
  • Previous by thread: Re: stirring chocolate pudding backwards: identifying coordinates
  • Next by thread: Re: Problem with BinLists for large lists