Problem with BinLists for large lists
- To: mathgroup at smc.vnet.net
- Subject: [mg101293] Problem with BinLists for large lists
- From: pfalloon <pfalloon at gmail.com>
- Date: Tue, 30 Jun 2009 06:34:16 -0400 (EDT)
Hi, I've been trying to use BinLists to bucket large (approx. a million elements) lists of 3 dimensional vectors. I've noticed that BinLists runs pretty slowly when working with large lists in general, and for lists of this size and dimensionality actually runs out of memory. I've worked up an alternative implementation which runs pretty quickly, and works by partitioning component-wise, then finding intersections of each bucket. I thought it might be worth posting to see if anyone has encountered this issue and knows more about why it happens... Example implementation: (* generate some large lists of reals *) {x,y,z} = RandomReal[{0,1}, {3,10^6}]; xyz = Transpose[{x, y, z}]; (* define bin intervals *) int = Range[0,1,0.1]; (* calculate buckets for each variable *) {xBins, yBins, zBins} = Table[Flatten@Position[#, _?(int[[i]] <= # < int[[i+1]] &)], {i, Length[int]-1}] & /@ {x, y, z}; (* get 3x3 grid of bins by taking intersections of the single-variable bins *) xyzBins = Table[Intersection[xBins[[i]], yBins[[j]], zBins[[k]]], {i,Length@xBins}, {j,Length@yBins}, {k,Length@zBins}]; (* look at the length of each bins *) Map[Length, xyzBins, {3}] (* get the actual points in each bin *) binContents = Map[xyz[[#]] &, xyzBins, {3}]; (* look at points from a particular bin *) ListPlot[Transpose[xyz[[xyzBins[[1,2,3]]]]]] (* the equivalent call to BinLists runs out of memory on my machine (32-bit windows) *) BinLists[xyz, {int}, {int}, {int}]; I'm not sure why BinLists doesn't use an approach like this. My suspicion is that it may be using something like (using above definitions): xyzBins = Outer[Intersection, xBins, yBins, zBins]; the problem being that the Outer function will effectively cause multiple copies of each list to be generated and thus runs out of memory.