MathGroup Archive 2009

[Date Index] [Thread Index] [Author Index]

Search the Archive

Problem with BinLists for large lists

  • To: mathgroup at smc.vnet.net
  • Subject: [mg101293] Problem with BinLists for large lists
  • From: pfalloon <pfalloon at gmail.com>
  • Date: Tue, 30 Jun 2009 06:34:16 -0400 (EDT)

Hi,
I've been trying to use BinLists to bucket large (approx. a million
elements) lists of 3 dimensional vectors. I've noticed that BinLists
runs pretty slowly when working with large lists in general, and for
lists of this size and dimensionality actually runs out of memory.

I've worked up an alternative implementation which runs pretty
quickly, and works by partitioning component-wise, then finding
intersections of each bucket. I thought it might be worth posting to
see if anyone has encountered this issue and knows more about why it
happens...

Example implementation:

(* generate some large lists of reals *)
{x,y,z} = RandomReal[{0,1}, {3,10^6}];
xyz = Transpose[{x, y, z}];

(* define bin intervals *)
int = Range[0,1,0.1];

(* calculate buckets for each variable *)
{xBins, yBins, zBins} = Table[Flatten@Position[#, _?(int[[i]] <= # <
int[[i+1]] &)], {i, Length[int]-1}] & /@ {x, y, z};

(* get 3x3 grid of bins by taking intersections of the single-variable
bins *)
xyzBins = Table[Intersection[xBins[[i]], yBins[[j]], zBins[[k]]],
{i,Length@xBins}, {j,Length@yBins}, {k,Length@zBins}];

(* look at the length of each bins *)
Map[Length, xyzBins, {3}]

(* get the actual points in each bin *)
binContents = Map[xyz[[#]] &, xyzBins, {3}];

(* look at points from a particular bin *)
ListPlot[Transpose[xyz[[xyzBins[[1,2,3]]]]]]

(* the equivalent call to BinLists runs out of memory on my machine
(32-bit windows) *)
BinLists[xyz, {int}, {int}, {int}];


I'm not sure why BinLists doesn't use an approach like this. My
suspicion is that it may be using something like (using above
definitions):

xyzBins = Outer[Intersection, xBins, yBins, zBins];

the problem being that the Outer function will effectively cause
multiple copies of each list to be generated and thus runs out of
memory.



  • Prev by Date: Re: Points In/Out in 3D shapes
  • Next by Date: Re: publicon
  • Previous by thread: Maximize returning approximate values
  • Next by thread: Manipulate suggestions