       Re: Fast calculation of pair correlation function

• To: mathgroup at smc.vnet.net
• Subject: [mg98636] Re: [mg98626] Fast calculation of pair correlation function
• From: Daniel Lichtblau <danl at wolfram.com>
• Date: Wed, 15 Apr 2009 04:57:55 -0400 (EDT)
• References: <200904141019.GAA07830@smc.vnet.net>

```markus wrote:
> Hi,
>
> I have a quite large list (length ~ 10,000) of coordinates (positions
> of particles from a simulation), e.g. data={{x1,y1,z1},
> {x2,y2,z2},...}, and I am looking for fast way to calculate the "pair
> correlation" or "radial distribution function", which tells me, for
> each value of r, how many particles are separated by a distance r (+-
> epsilon).
> Usually, this can be calculated by something like:
>
> Do[Do[histogr[[ Ceiling[Norm[ data[[i]]-data[[j]] ] / delta] ]], {j, i
> +1, Length[data]}], {i, 1, Length[data]-1}]
>
> where histogr is just a list whose index corresponds to the distance
> r.
>
> Unfortunately I have found that for lists of length >1,000, this way
> of calculating is very slow (it can take ~minutes), compared to the
> equivalent C code, which proceeds in a few seconds. I have also tried
> to "Compile" the function, but the speed does not increase...
>
> Does anybody know some fast way of calculating the pair distribution
> function in Mathematica?

I am a bit unclear as to what you ultimately want. maybe a histogram,
say, with respect to some fixed epsilon value? Or maybe you need to
evaluate the RDF for various different {r,eps} pairs?

For lists up to 5K or so in size (larger, on 64 bit machines), it would
make sense to preprocess to get the full list of (sorted) distances. The
code below seems to be reasonably efficient for finding all distances
between pairs and sorting.

distancesC = Compile[{{ll,_Real,2}},
Sort[Flatten[Table[Sqrt[#.#]&[(ll[[j]]-ll[[k]])],
{j,Length[ll]}, {k,Length[ll]}]]]]

Example on a set of 3000 points.

dim = 3;
coords = RandomReal[{-20,20}, {3000,dim}];

In:= Timing[norms = distancesC[coords];]
Out= {11.8582, Null}

For the situation where you may be interested in many distances and
different epsiolons, I'd recommend a divide-and-conquer search. The code
below will do this. Note we divide by two because we used each point
pair twice. Alternatives would be to use Union in distancesC but this
can cause trouble if you allow multiple "points" at the same coordinates
(I'm guessing you do not, in which case modify the code as appropriate).

{{dlist,_Real,1}, {r,_Real}, {eps,_Real}},
Catch[Module[
{len=Length[dlist],bot=r-eps,top=r+eps,lo=1,mid,hi},
hi = len;
If [dlist[[lo]]>top || dlist[[hi]]<bot, Throw];
mid = Ceiling[(hi+lo)/2.];
While[dlist[[lo]]<bot && mid>lo,
If [dlist[[mid]]<bot,
lo = mid; mid = Ceiling[(hi+lo)/2.],
mid = Floor[(mid+lo)/2.];
]];
mid = Ceiling[(hi+lo)/2.];
While[dlist[[hi]]>top && mid<hi,
If [dlist[[mid]]>top,
hi = mid; mid = Floor[(hi+lo)/2.],
mid = Ceiling[(mid+hi)/2.];
]];
Round[(hi-lo+1)/2]
]]];

Now we'll find out how many pairs are separated by distance 11.2 +- 0.05.

Out= {0., 6861}

This might be fast enough for your purposes but you will need
considerable memory to get it to work for sets of 10^4 points. For that
you might consider splitting into sublists, doing computations like
those above on sublist pairs, and combining results. Alternatively,
there might be approaches that use Nearest or other functionality that
avoids creating an nxn matrix from the n points.

I did try this on a 64 bit machine, with a list of 10^4 points. It took
around 134 seconds to compute the distance pairs, and a single query was
as below: