MathGroup Archive: July 2011 [00432]

[Date Index] [Thread Index] [Author Index]
Re: sorting a nested list of 4-tuples
To: mathgroup at smc.vnet.net
Subject: [mg120382] Re: sorting a nested list of 4-tuples
From: Daniel Lichtblau <danl at wolfram.com>
Date: Thu, 21 Jul 2011 05:45:11 -0400 (EDT)
References: <201107191055.GAA10229@smc.vnet.net>
On 07/19/2011 05:55 AM, Luis Valero wrote:
> Dear Sirs,
>
> I want to sort a list of 4-tuples of real numbers, so that, the second 4-tuple has minimum distance to the first, the third, selected from the rest, ha minimum distance to the second, and so on.
>
> The distance is the euclidean distance, calculated with the first two elements of each 4-tuple.
>
> I have defined a function that work:
>
> orderedList[list_] := Module[{nearest},
> 	Flatten[ Nest[{
>     	    Append[ #[[1]] , nearest = Flatten @@ Nearest[ Map[ Rule[ #[[{1, 2}]], #]&, #[[2]] ], Last[ #[[1]] ] [[{1, 2}]], 1] ],
>      	    Delete[ #[[2]], Position[ #[[2]], nearest ] ]  }&, { {list[[1]] }, Delete[ list, 1 ] }, Length[ list ] - 2], 1] ];
>
>
> but I need to improve the temporal efficiency
>
> Thank you

I wanted to improve on my last proposed method and also say a bit about 
the computational complexity.

First I'll note that a straightforward double iteration can be used. If 
coupled with Compile it should be reasonably fast, albeit still O(n^2) 
for a list of n elements. Also it will scale nicely with dimension in 
cases where we are not restricting the distance measure to the first two 
components of the tuples.

Here is code for this.

findClosestUniqueC = Compile[{{ll, _Real, 2}},
    Module[{n = Length[ll], res = ll, posn, max, min, diff, dist, last,
       tmp},
     max = (4.*Max[ll[[All, 1 ;; 2]]])^2;
     Do[
      last = res[[j - 1, 1 ;; 2]];
      posn = 0;
      min = max;
      Do[
       diff = res[[k, 1 ;; 2]] - last;
       dist = diff.diff;
       If[dist < min, min = dist; posn = k];
       , {k, j, n}];
      res[[{j, posn}]] = res[[{posn, j}]];
      , {j, 2, n - 1}];
     res
     ]
    ];

The method I used in an earlier response, and will refine below, was 
based on Nearest[]. First some explanatory comments.

It takes the first two components of each tuple and augments with a 
small "counter" value that allows us to determine which is the 
corresponding element in the original list. We start with a large 
"denominator", then prepend 1/denom to the first 2-tuple, 2/denom to the 
second 2-tuple, etc. The presence of this counter value can in fact lead 
to erroneous results by slightly offsetting distances based on the 
2-tuple values alone. I now use a heuristic based on list size that 
should make such a failure unlikely in the version below. But it is not 
terribly clever and certainly can fail. A viable method would have to be 
based on minimal separation, which could be underestimated by sorting 
the flattened list of all values, and taking the smallest nonzero gap.

We'll assume that the complexity of creating a one-argument Nearest[] 
function, given n values, is around O(n*log(n)). For modest dimension 
this seems to be about right, if I recall correctly. Once we have such a 
function we'll assume that the cost of looking up the k nearest values 
to a given value is O(k*log(n)). Again, this seems to be born out with 
experience as best I recall. Also I'll add that the implied 
multiplicative constant is much larger for lookup than for building the 
list.

At this points there are various ways to go about the lookup in order to 
get the next closest value to a given one, from amongst all elements not 
already chosen. A guaranteed way is, if j elements have been chosen, 
find the j+1 nearest elements to the last one (so at least one will not 
already have been selected).

If we build a Nearest function once, then we end up doing O(n) lookups, 
each finding O(n) closest elements. The overall complexity would be 
O(n^2*log(n)). (If we just call it anew every iteration, thats n alls of 
n*log(n) complexity, so no better.)

A heuristic to reduce that, not guaranteed, would be to only look up the 
7 or so closest, and try the full size needed only if all 7 have already 
been selected.

Another way, this one guaranteed to bring down the complexity, is to 
discard those elements already chosen from further consideration, at 
intervals that are an appropriate function of n. If we do this every 
sqrt(n) elements, and rebuild the Nearest function, then we never need 
find more than sqrt(n) closest elements from amongst those still under 
consideration. So overall lookup cost will be no worse than 
O(n*sqrt(n)*log(n)). Meanwhile the cost of removing sqrt(n) elements 
form a list of n elements is O(n), and we do this sqrt(n) times so that 
comes to O(n^(3/2)). Finally the cost of building these Nearest 
functions sqrt(n) times, with O(n) elements each time, is 
O(n*log(n)*sqrt(n)). Overall cost: O(n^(3/2)*log(n)).

In my earlier code I used a different "chunk" size, one that seemed like 
it might be sensible. It did work well for ranges I tried, but sqrt(n) 
seems to be the value with the guarantee. The use of a new Nearest 
function at intervals means I want to use an index computed modulo the 
interval length (the chunk size, in the code terminology).

I also use the heuristic of trying only a few closest elements at first, 
grabbing more only when needed. Some testing indicates this happens 
relatively infrequently.

So here is the updated code. It should be properly Module-ized, but I'm 
not spending more time to do that.

n = 3200;
data = RandomReal[{-100, 100}, {n, 4}];

chunksize = Ceiling[Sqrt[n]];
denom = 2^3*2^Ceiling[Log[2, n]];
moddata =
   Map[Take[#, 3] &, MapThread[Prepend, {data, Range[n]/denom}]];
nf = Nearest[moddata];
next = moddata[[1]];
taken = ConstantArray[False, n];
result = ConstantArray[{0., 0., 0., 0.}, n];
remove = ConstantArray[{0., 0., 0.}, chunksize];
result[[1]] = data[[1]];
remove[[1]] = moddata[[1]];
taken[[1]] = True;

Timing[Do[
   modj = Mod[j, chunksize, 1];
   nextset = nf[next, Min[7, 1 + modj]];
   posns = Round[denom*Map[First, nextset]];
   k = 1;
   While[k <= Length[nextset] && taken[[posns[[k]]]], k++;];
   If[k > Length[nextset],
    nextset = nf[next, 1 + modj];
    posns = Round[denom*Map[First, nextset]];
    k = 1;
    While[k < Length[nextset] && taken[[posns[[k]]]], k++;];
    ];
   taken[[posns[[k]]]] = True;
   result[[j]] = data[[posns[[k]]]];
   next = nextset[[k]];
   remove[[modj]] = next;
   If[modj == chunksize,
    done = True;
    posns = Apply[Alternatives, remove];
    moddata = DeleteCases[moddata, posns];
    nf = Nearest[moddata];
    ];
   , {j, 2, n}]]

Out[1294]= {0.49, Null}

Comparison:

In[1336]:= Timing[result2 = findClosestUniqueC[data];]
Out[1336]= {1.89, Null}

In[1337]:= result === result2
Out[1337]= True

Daniel Lichtblau
Wolfram Research
References:
- sorting a nested list of 4-tuples
  - From: Luis Valero <luis.valero@mac.com>
Prev by Date: How to set the format of every output to traditionalform in txt files.
Next by Date: mlink SASImport
Previous by thread: Re: sorting a nested list of 4-tuples
Next by thread: Re: sorting a nested list of 4-tuples