Re: sorting a nested list of 4-tuples

*To*: mathgroup at smc.vnet.net*Subject*: [mg120382] Re: sorting a nested list of 4-tuples*From*: Daniel Lichtblau <danl at wolfram.com>*Date*: Thu, 21 Jul 2011 05:45:11 -0400 (EDT)*References*: <201107191055.GAA10229@smc.vnet.net>

On 07/19/2011 05:55 AM, Luis Valero wrote: > Dear Sirs, > > I want to sort a list of 4-tuples of real numbers, so that, the second 4-tuple has minimum distance to the first, the third, selected from the rest, ha minimum distance to the second, and so on. > > The distance is the euclidean distance, calculated with the first two elements of each 4-tuple. > > I have defined a function that work: > > orderedList[list_] := Module[{nearest}, > Flatten[ Nest[{ > Append[ #[[1]] , nearest = Flatten @@ Nearest[ Map[ Rule[ #[[{1, 2}]], #]&, #[[2]] ], Last[ #[[1]] ] [[{1, 2}]], 1] ], > Delete[ #[[2]], Position[ #[[2]], nearest ] ] }&, { {list[[1]] }, Delete[ list, 1 ] }, Length[ list ] - 2], 1] ]; > > > but I need to improve the temporal efficiency > > Thank you I wanted to improve on my last proposed method and also say a bit about the computational complexity. First I'll note that a straightforward double iteration can be used. If coupled with Compile it should be reasonably fast, albeit still O(n^2) for a list of n elements. Also it will scale nicely with dimension in cases where we are not restricting the distance measure to the first two components of the tuples. Here is code for this. findClosestUniqueC = Compile[{{ll, _Real, 2}}, Module[{n = Length[ll], res = ll, posn, max, min, diff, dist, last, tmp}, max = (4.*Max[ll[[All, 1 ;; 2]]])^2; Do[ last = res[[j - 1, 1 ;; 2]]; posn = 0; min = max; Do[ diff = res[[k, 1 ;; 2]] - last; dist = diff.diff; If[dist < min, min = dist; posn = k]; , {k, j, n}]; res[[{j, posn}]] = res[[{posn, j}]]; , {j, 2, n - 1}]; res ] ]; The method I used in an earlier response, and will refine below, was based on Nearest[]. First some explanatory comments. It takes the first two components of each tuple and augments with a small "counter" value that allows us to determine which is the corresponding element in the original list. We start with a large "denominator", then prepend 1/denom to the first 2-tuple, 2/denom to the second 2-tuple, etc. The presence of this counter value can in fact lead to erroneous results by slightly offsetting distances based on the 2-tuple values alone. I now use a heuristic based on list size that should make such a failure unlikely in the version below. But it is not terribly clever and certainly can fail. A viable method would have to be based on minimal separation, which could be underestimated by sorting the flattened list of all values, and taking the smallest nonzero gap. We'll assume that the complexity of creating a one-argument Nearest[] function, given n values, is around O(n*log(n)). For modest dimension this seems to be about right, if I recall correctly. Once we have such a function we'll assume that the cost of looking up the k nearest values to a given value is O(k*log(n)). Again, this seems to be born out with experience as best I recall. Also I'll add that the implied multiplicative constant is much larger for lookup than for building the list. At this points there are various ways to go about the lookup in order to get the next closest value to a given one, from amongst all elements not already chosen. A guaranteed way is, if j elements have been chosen, find the j+1 nearest elements to the last one (so at least one will not already have been selected). If we build a Nearest function once, then we end up doing O(n) lookups, each finding O(n) closest elements. The overall complexity would be O(n^2*log(n)). (If we just call it anew every iteration, thats n alls of n*log(n) complexity, so no better.) A heuristic to reduce that, not guaranteed, would be to only look up the 7 or so closest, and try the full size needed only if all 7 have already been selected. Another way, this one guaranteed to bring down the complexity, is to discard those elements already chosen from further consideration, at intervals that are an appropriate function of n. If we do this every sqrt(n) elements, and rebuild the Nearest function, then we never need find more than sqrt(n) closest elements from amongst those still under consideration. So overall lookup cost will be no worse than O(n*sqrt(n)*log(n)). Meanwhile the cost of removing sqrt(n) elements form a list of n elements is O(n), and we do this sqrt(n) times so that comes to O(n^(3/2)). Finally the cost of building these Nearest functions sqrt(n) times, with O(n) elements each time, is O(n*log(n)*sqrt(n)). Overall cost: O(n^(3/2)*log(n)). In my earlier code I used a different "chunk" size, one that seemed like it might be sensible. It did work well for ranges I tried, but sqrt(n) seems to be the value with the guarantee. The use of a new Nearest function at intervals means I want to use an index computed modulo the interval length (the chunk size, in the code terminology). I also use the heuristic of trying only a few closest elements at first, grabbing more only when needed. Some testing indicates this happens relatively infrequently. So here is the updated code. It should be properly Module-ized, but I'm not spending more time to do that. n = 3200; data = RandomReal[{-100, 100}, {n, 4}]; chunksize = Ceiling[Sqrt[n]]; denom = 2^3*2^Ceiling[Log[2, n]]; moddata = Map[Take[#, 3] &, MapThread[Prepend, {data, Range[n]/denom}]]; nf = Nearest[moddata]; next = moddata[[1]]; taken = ConstantArray[False, n]; result = ConstantArray[{0., 0., 0., 0.}, n]; remove = ConstantArray[{0., 0., 0.}, chunksize]; result[[1]] = data[[1]]; remove[[1]] = moddata[[1]]; taken[[1]] = True; Timing[Do[ modj = Mod[j, chunksize, 1]; nextset = nf[next, Min[7, 1 + modj]]; posns = Round[denom*Map[First, nextset]]; k = 1; While[k <= Length[nextset] && taken[[posns[[k]]]], k++;]; If[k > Length[nextset], nextset = nf[next, 1 + modj]; posns = Round[denom*Map[First, nextset]]; k = 1; While[k < Length[nextset] && taken[[posns[[k]]]], k++;]; ]; taken[[posns[[k]]]] = True; result[[j]] = data[[posns[[k]]]]; next = nextset[[k]]; remove[[modj]] = next; If[modj == chunksize, done = True; posns = Apply[Alternatives, remove]; moddata = DeleteCases[moddata, posns]; nf = Nearest[moddata]; ]; , {j, 2, n}]] Out[1294]= {0.49, Null} Comparison: In[1336]:= Timing[result2 = findClosestUniqueC[data];] Out[1336]= {1.89, Null} In[1337]:= result === result2 Out[1337]= True Daniel Lichtblau Wolfram Research

**References**:**sorting a nested list of 4-tuples***From:*Luis Valero <luis.valero@mac.com>