Re: Speed up calculating the pair correlation function
- To: mathgroup at smc.vnet.net
- Subject: [mg103367] Re: [mg103335] Speed up calculating the pair correlation function
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Thu, 17 Sep 2009 06:22:17 -0400 (EDT)
- References: <200909160946.FAA12977@smc.vnet.net>
Szabolcs Horvát wrote: > Hello, > > I would like to calculate the pair correlation function normalized to 1 > for some 2D point data. I.e. I need to find the mean density of points > at distance r from any point, normalized to 1. > > I am looking for advice on speeding this up. > > This is the current implementation I have: > > The pcfOne function calculates the mean density of 'points' at distance > r from one single point ('origin'), up to 'rmax' in steps of 'dr'. > 'density' is the average density of all points over the complete region > (since the shape of the region is unknown to the function, this quantity > is passed separately): > > pcfOne[points_, origin_, dr_, rmax_, density_] := > Module[{hist}, > hist = BinCounts[ > With[{v = # - origin}, Sqrt[v.v]] & /@ points, > {0, rmax, dr}]; > Transpose[ > {Range[0, rmax - dr, dr] + dr/2, > hist/(Pi (dr^2 + 2 dr Range[0, rmax - dr, dr]) density)} > ] > ] > > Now we can select a subset of the points, calculate this function for > all of them and average the results. For randomly distributed points > the result will be a constant function of value 1 (at least until we get > too close to the edge of the region): > > data = RandomReal[1, {50000, 2}]; > > ListPlot[ > Mean[ > pcfOne[data, #, 0.05, 0.5, Length[data]] & /@ > Nearest[data, {.5, .5}, 1000] > ], > > PlotRange -> {0, 2}, Axes -> False, Frame -> True > ] > > This runs in 80 seconds on my machine. I would like to use this > function on datasets of up to 300,000 points and average over more than > just 1000 points near the middle, say 10000. That would take 60 times > as long, ~80 minutes, which is way too much. > > Is it possible to speed this up significantly? If you can settle for an approximation, under some conditions the method below can be faster. (1) Consider your domain as a grid. Place points into a set of discrete boxes. Keep track of how many are in each box. (2) For each origin, iterate over boxes, find distance from box center to origin, add box's point count to appropriate bin. The idea is that with low dimension (e.g. 2 or perhaps 3), and sufficiently many points, and not too fine a binning delta, this will both give a reasonable approximation and be faster to compute. Code can be done as below, though there may be still faster approaches. boxCountsC = Compile[{{pts,_Real,2}, {gran,_Real,1}, {xrng,_Real,1}, {yrng,_Real,1}}, Module[ {res, pt, x, y}, res = Table[0, {j,xrng[[1]],xrng[[2]]-gran[[1]],gran[[1]]}, {k,yrng[[1]],yrng[[2]]-gran[[2]],gran[[2]]}]; Do [pt = pts[[j]]; x = Ceiling[(pt[[1]]-xrng[[1]])/gran]; y = Ceiling[(pt[[2]]-yrng[[1]])/gran]; res[[x,y]]++ , {j,Length[pts]}]; res ]]; pcfOne2C = Compile[{{bins,_Integer,2}, {orig,_Real,1}, {dr,_Real}, {rmax,_Real}, {density,_Real}, {gran,_Real,1}, {xrng,_Real,1}, {yrng,_Real,1}}, Module[ {res, rng, d}, res = Table[0.,{Ceiling[rmax/dr]}]; Do [ d = Ceiling[(Norm[orig - {xrng[[1]]+(j-.5)*gran[[1]],yrng[[1]]+(k-.5)*gran[[2]]}])/dr]; If [d<=Length[res], res[[d]] += bins[[j,k]]]; , {j,Length[bins]}, {k,Length[bins[[j]]]} ]; rng = Range[0, rmax-dr, dr]; Transpose[{rng + dr/2., res/(3.14*(dr^2 + 2*dr*rng)*density)}] ]]; Let's try your example. I'll use boxes of length .01 for the gridding. data = RandomReal[1, {50000, 2}]; xrng = {0.,1.}; yrng = {0.,1.}; gran = {.01,.01}; Your code: Timing[res2 = Mean[ pcfOne[data, #, 0.05, 0.5, Length[data]] & /@ Nearest[data, {.5, .5}, 1000]];] Out[126]= {23.4904, Null} Above version: Timing[bc = boxCountsC[data, gran, xrng, yrng];] Out[128]= {0.122981, Null} nlist = Nearest[data, {.5, .5}, 1000]; Timing[res = Mean[pcfOne2C[bc, #, 0.05, 0.5, N[Length[data]], gran, xrng, yrng] & /@ nlist];] Out[130]= {7.30289, Null} The differences are modest. In[131]:= res-res2 Out[131]= {{0., 0.000729011}, {0., -0.000365749}, {0., 0.000407047}, {0., 0.00064568}, {0., 0.0013471}, {0., 0.000550493}, {0., 0.000781529}, {0., 0.000437688}, {0., 0.000663624}, {0., 0.00048734}} If I use ten times as many points, the initial gridding takes ten times longer, but the pcfOne2C iteration is just as fast as before (it is affected by the granularity, not the number of points). With many origins, it might make sense to grid those as well. Daniel Lichtblau Wolfram Research
- References:
- Speed up calculating the pair correlation function for 2D point data
- From: Szabolcs Horvát <szhorvat@gmail.com>
- Speed up calculating the pair correlation function for 2D point data