       Re: sorting a nested list of 4-tuples

• To: mathgroup at smc.vnet.net
• Subject: [mg120382] Re: sorting a nested list of 4-tuples
• From: Daniel Lichtblau <danl at wolfram.com>
• Date: Thu, 21 Jul 2011 05:45:11 -0400 (EDT)
• References: <201107191055.GAA10229@smc.vnet.net>

On 07/19/2011 05:55 AM, Luis Valero wrote:
> Dear Sirs,
>
> I want to sort a list of 4-tuples of real numbers, so that, the second 4-tuple has minimum distance to the first, the third, selected from the rest, ha minimum distance to the second, and so on.
>
> The distance is the euclidean distance, calculated with the first two elements of each 4-tuple.
>
> I have defined a function that work:
>
> orderedList[list_] := Module[{nearest},
> 	Flatten[ Nest[{
>     	    Append[ #[] , nearest = Flatten @@ Nearest[ Map[ Rule[ #[[{1, 2}]], #]&, #[] ], Last[ #[] ] [[{1, 2}]], 1] ],
>      	    Delete[ #[], Position[ #[], nearest ] ]  }&, { {list[] }, Delete[ list, 1 ] }, Length[ list ] - 2], 1] ];
>
>
> but I need to improve the temporal efficiency
>
> Thank you

I wanted to improve on my last proposed method and also say a bit about
the computational complexity.

First I'll note that a straightforward double iteration can be used. If
coupled with Compile it should be reasonably fast, albeit still O(n^2)
for a list of n elements. Also it will scale nicely with dimension in
cases where we are not restricting the distance measure to the first two
components of the tuples.

Here is code for this.

findClosestUniqueC = Compile[{{ll, _Real, 2}},
Module[{n = Length[ll], res = ll, posn, max, min, diff, dist, last,
tmp},
max = (4.*Max[ll[[All, 1 ;; 2]]])^2;
Do[
last = res[[j - 1, 1 ;; 2]];
posn = 0;
min = max;
Do[
diff = res[[k, 1 ;; 2]] - last;
dist = diff.diff;
If[dist < min, min = dist; posn = k];
, {k, j, n}];
res[[{j, posn}]] = res[[{posn, j}]];
, {j, 2, n - 1}];
res
]
];

The method I used in an earlier response, and will refine below, was
based on Nearest[]. First some explanatory comments.

It takes the first two components of each tuple and augments with a
small "counter" value that allows us to determine which is the
corresponding element in the original list. We start with a large
"denominator", then prepend 1/denom to the first 2-tuple, 2/denom to the
second 2-tuple, etc. The presence of this counter value can in fact lead
to erroneous results by slightly offsetting distances based on the
2-tuple values alone. I now use a heuristic based on list size that
should make such a failure unlikely in the version below. But it is not
terribly clever and certainly can fail. A viable method would have to be
based on minimal separation, which could be underestimated by sorting
the flattened list of all values, and taking the smallest nonzero gap.

We'll assume that the complexity of creating a one-argument Nearest[]
function, given n values, is around O(n*log(n)). For modest dimension
this seems to be about right, if I recall correctly. Once we have such a
function we'll assume that the cost of looking up the k nearest values
to a given value is O(k*log(n)). Again, this seems to be born out with
experience as best I recall. Also I'll add that the implied
multiplicative constant is much larger for lookup than for building the
list.

At this points there are various ways to go about the lookup in order to
get the next closest value to a given one, from amongst all elements not
already chosen. A guaranteed way is, if j elements have been chosen,
find the j+1 nearest elements to the last one (so at least one will not
already have been selected).

If we build a Nearest function once, then we end up doing O(n) lookups,
each finding O(n) closest elements. The overall complexity would be
O(n^2*log(n)). (If we just call it anew every iteration, thats n alls of
n*log(n) complexity, so no better.)

A heuristic to reduce that, not guaranteed, would be to only look up the
7 or so closest, and try the full size needed only if all 7 have already
been selected.

Another way, this one guaranteed to bring down the complexity, is to
discard those elements already chosen from further consideration, at
intervals that are an appropriate function of n. If we do this every
sqrt(n) elements, and rebuild the Nearest function, then we never need
find more than sqrt(n) closest elements from amongst those still under
consideration. So overall lookup cost will be no worse than
O(n*sqrt(n)*log(n)). Meanwhile the cost of removing sqrt(n) elements
form a list of n elements is O(n), and we do this sqrt(n) times so that
comes to O(n^(3/2)). Finally the cost of building these Nearest
functions sqrt(n) times, with O(n) elements each time, is
O(n*log(n)*sqrt(n)). Overall cost: O(n^(3/2)*log(n)).

In my earlier code I used a different "chunk" size, one that seemed like
it might be sensible. It did work well for ranges I tried, but sqrt(n)
seems to be the value with the guarantee. The use of a new Nearest
function at intervals means I want to use an index computed modulo the
interval length (the chunk size, in the code terminology).

I also use the heuristic of trying only a few closest elements at first,
grabbing more only when needed. Some testing indicates this happens
relatively infrequently.

So here is the updated code. It should be properly Module-ized, but I'm
not spending more time to do that.

n = 3200;
data = RandomReal[{-100, 100}, {n, 4}];

chunksize = Ceiling[Sqrt[n]];
denom = 2^3*2^Ceiling[Log[2, n]];
moddata =
Map[Take[#, 3] &, MapThread[Prepend, {data, Range[n]/denom}]];
nf = Nearest[moddata];
next = moddata[];
taken = ConstantArray[False, n];
result = ConstantArray[{0., 0., 0., 0.}, n];
remove = ConstantArray[{0., 0., 0.}, chunksize];
result[] = data[];
remove[] = moddata[];
taken[] = True;

Timing[Do[
modj = Mod[j, chunksize, 1];
nextset = nf[next, Min[7, 1 + modj]];
posns = Round[denom*Map[First, nextset]];
k = 1;
While[k <= Length[nextset] && taken[[posns[[k]]]], k++;];
If[k > Length[nextset],
nextset = nf[next, 1 + modj];
posns = Round[denom*Map[First, nextset]];
k = 1;
While[k < Length[nextset] && taken[[posns[[k]]]], k++;];
];
taken[[posns[[k]]]] = True;
result[[j]] = data[[posns[[k]]]];
next = nextset[[k]];
remove[[modj]] = next;
If[modj == chunksize,
done = True;
posns = Apply[Alternatives, remove];
moddata = DeleteCases[moddata, posns];
nf = Nearest[moddata];
];
, {j, 2, n}]]

Out= {0.49, Null}

Comparison:

In:= Timing[result2 = findClosestUniqueC[data];]
Out= {1.89, Null}

In:= result === result2
Out= True

Daniel Lichtblau
Wolfram Research

• Prev by Date: How to set the format of every output to traditionalform in txt files.
• Next by Date: mlink SASImport
• Previous by thread: Re: sorting a nested list of 4-tuples
• Next by thread: Re: sorting a nested list of 4-tuples