Re: Vector Union

*To*: mathgroup at smc.vnet.net*Subject*: [mg23248] Re: [mg23117] Vector Union*From*: Hartmut Wolf <hwolf at debis.com>*Date*: Sat, 29 Apr 2000 22:04:54 -0400 (EDT)*Organization*: debis Systemhaus*References*: <200004190630.CAA07517@smc.vnet.net>*Sender*: owner-wri-mathgroup at wolfram.com

My comments below: Russell Towle schrieb: > > The problem of finding a function to return the union of a list of > n-vectors which is at once *fast* and yet *unfailing* has beguiled me for > several years. I contrived a slow but unfailing 'vunion' function about > five years ago; made a not-so-slow and unfailing version a year ago; and > then found it to occasionally fail when Chop was applied to the vectors > without a second argument. > > I posed a programming challenge to the Group a few days ago: make a fast > and unfailing vector union function. David Park and Michael Trott were kind > enough to respond, for which I thank them very much. Around a year around > Martin Kraus also supplied a vector union function. > > Good solutions seem to hinge upon *sorting the list of n-vectors*. > Searching the internet, I find that sorting a list of vectors seems to be a > classic problem in computer science. This is where Chop comes into play: to > compare one vector to another, when the components are machine precision, > is easier when very small numbers with opposite signs are Chopped to zero. > > My function, slow but sure: > > vunion[v_] := > Union[Chop[v, .000001], > SameTest -> (Sign[#1] == Sign[#2] && > Chop[#1 - #2, .000001] == Table[0, {Length[v[[1]]]}] & )] > > David Park appealed to an "UnorderUnion" function devised by Carl Woll, > which works well on lists with integer and symbolic components. Park uses > Round to put the machine numbers into a form acceptable to UnorderUnion. > The implications of using Round worry me, but Park's method has not failed > yet, given inputs which made most other methods fail. His function is > around twenty times faster than mine. Thanks David!! > > (*modified Woll*) > OrderedUnion2[li_] := > Block[{i}, i[n_] := (i[n] = Null; n); > i /@ li] > > vunion2[v_] := > Block[{accuracylist, v2}, > accuracylist = OrderedUnion2[Round[10^5*(v2 = Sort[Chop[v, .000001]])]]; > Do[If[accuracylist[[i]] === Null, v2[[i]] = Null], {i, Length[v2]}]; > v2 /. Null -> Sequence[]] > Dear Russell, yesterday, at Easter Monday, which is a holiday around here, I found some time to study your program (at home with Vers. 3.0), only to detect today that there are great differences with Vers. 4.0. I understand your problem as to unify Points (again), which after originating from the same specification, went through different calculation histories and finally came out slightly different. So there is a first problem: which differences are spurious, and which are essential? Regarding differences small enough as negligible is only a heuristics, not of same value for different problems, and furthermore what is "small enough"? You froze that to Chop[#1-#2, 1. 10^-6] == zeroVector, I'm sceptical. To unify your points you have in principal two methods: one would be to find clusters of points, and then -- if small enough -- pic out a representative, e.g. a point near the center of gravity (or median, ...), or the center itself (not a menber of the point set). If the clusters are not small enough, you will have got the problem to subdivide them artificially. Another -- easier -- method, is to effectively divide the space into a raster of cells, and unify all points falling into the same cell. This method however also has serious drawbacks: the raster might cut through the cluster, and instead of finding one representative, you'll get more (2 - 8). I made a simulation (see below) and showed that your "reference" method (for the "programming contest") is not "translation invariant". Another point is: you connected unifying with sorting, that is not compelling, but sorting gives efficient solutions to certain problems for unification. The scheme is like this First /@ Split[Sort[p_List, OrderedQ[code[#1], code[#2]]&], (code[#1]==code[#2]&)] where code is a function from p-List into some ordered set, and all points giving the same code will be unified. It was only this morning that I learnt that Union does not (or not any longer?) work that way (from "[mg23234] Re: [mg23206] Re: [mg23174] A simple programming question.." by Carl Woll <mailto:carlw at u.washington.edu>). However Carl's now famous OrderedUnion shows that you can unify without sorting. Now something to observe (this is "4.0 for Microsoft Windows (April 21, 1999)"). Here are (reproducible) sets for test: In[1]:= d[] := Module[{delta}, While[delta = {Random[Real, {-1, 1}], Random[Real, {-1, 1}], Random[Real, {-1, 1}]}; delta.delta > 1]; delta] In[2]:= SeedRandom[200004241812] In[3]:= vstart = Fold[ Flatten[Function[{ctr}, {{ctr}, Table[ctr + #2 d[], {2}]} ] /@ #1, 2] &, {{0, 0, 0}}, {1, 10}]; In[5]:= vects1 = Fold[ Flatten[Function[{ctr}, {ctr + #2 * d[], ctr + #2 * d[], ctr + #2 * d[]} ] /@ #1, 1] &, vstart, {3. 10^-5, 1. 10^-5, 5. 10^-6, 3. 10^-6, 1. 10^-6}]; // Timing Out[5]= {3.815 Second, Null} In[6]:= vects = Fold[ Flatten[Function[{ctr}, {ctr + #2 * d[], ctr + #2 * d[], ctr + #2 * d[]} ] /@ #1, 1] &, vstart, {3. 10^-6, 1. 10^-6, 3. 10^-7, 1. 10^-7, 3. 10^-8} ]; // Timing Out[6]= {3.726 Second, Null} In[7]:= Length[vects] Out[7]= 2187 In[8]:= 3^Range[7] Out[8]= {3, 9, 27, 81, 243, 729, 2187} In[10]:= shiftvs = {1, 1, 1} 3 10^-5 + # & /@ vects; In[11]:= shiftvs1 = {1, 1, 1} 3 10^-5 + # & /@ vects1; (two sets of different density, and slightly shifted versions) Your function: In[18]:= vunion[v_] := Union[Chop[v, 0.000001], SameTest -> ((Sign[#1] == Sign[#2]) && (Chop[#1 - #2, 0.000001] == Table[0, {Length[v[[1]]]}]) &) ] In[19]:= Print["# ", #, ": ", (uvects["R", #] = vunion[Take[vects, #]]; // Timing)[[1]]] & /@ Floor[Sqrt[3]^Range[10, 14]] "# "243": "0.220 Second "# "420": "0.370 Second "# "729": "0.772 Second "# "1262": "1.512 Second "# "2187": "2.814 Second In[21]:= Map[{#, Length[uvects["R", #]]} &, Floor[Sqrt[3]^Range[10, 14]] ] Out[21]= {{243, 9}, {420, 14}, {729, 20}, {1262, 36}, {2187, 60}} In[20]:= Print["# ", #, ": ", (uvects1["R", #] = vunion[Take[vects1, #]]; // Timing)[[1]]] & /@ Floor[Sqrt[3]^Range[10, 14]] "# "243": "3.475 Second "# "420": "9.473 Second "# "729": "34.971 Second "# "1262": "85.863 Second "# "2187": "225.094 Second In[22]:= Map[{#, Length[uvects1["R", #]]} &, Floor[Sqrt[3]^Range[10, 14]] ] Out[22]= {{243, 123}, {420, 207}, {729, 365}, {1262, 621}, {2187, 1063}} We see the O(n^2) behaviour, Carl has reported; also a drastically increased runtime for the set (vects1) being less dense. In[23]:= uvects21["R"] = vunion[shiftvs1]; // Timing Out[23]= {220.607 Second, Null} In[24]:= Length[uvects21["R"]] Out[24]= 1059 In[25]:= uvects2["R"] = vunion[shiftvs]; // Timing Out[25]= {2.734 Second, Null} In[26]:= Length[uvects2["R"]] Out[26]= 58 So you see, you get different answers for the shifted and unshifted sets, something you have to justify from your application (unknown to us). To visiualize the data (and the results) you may plot: In[100]:= Show[Graphics3D[{PointSize[0.02], Point /@ vects1}]] You see 9 points, which turn out to be clusters when closed up. Take the cluster at coordinates {0,0,0}: In[124]:= Show[Graphics3D[{PointSize[0.01], Point /@ vects1}], PlotRange -> 3{{-10^-5, 10^-5}, {-10^-5, 10^-5}, {-10^-5, 10^-5}}, Axes -> True] Three clouds are visible, let's come in close to the one left above: In[137]:= g1 = Show[Graphics3D[{PointSize[0.01], Point /@ vects1}], PlotRange -> {{-3 10^-5, -1.25 10^-5}, {-1.8 10^-5, 0. 10^-5}, {1. 10^-5, 2.6 10^-5}}, Axes -> True] Now compare that with In[140]:= gg1 = Show[Graphics3D[{PointSize[0.01], Hue[0], Point /@ uvects1["R", 2187]}], PlotRange -> {{-3 10^-5, -1.25 10^-5}, {-1.8 10^-5, 0. 10^-5}, {1. 10^-5, 2.6 10^-5}}] and superpose: In[141]:= Show[gg1, g1] (You might wish to enlarge that and view from different sides.) If you simplify your function to In[145]:= vunion3[v_] := Union[v, SameTest -> ((Chop[#1 - #2, 0.000001] == zeroVector) &)] then you'll be translation invariant (same unification for shifted and unshifted points), and you'll get In[147]:= Length[uvects["R3"]] Out[147]= 58 In[151]:= Length[uvects1["R3"]] Out[151]= 1058 points. I'll now compare to David Park's proposal: In[32]:= boost = 10^6; In[33]:= OrderedUnionD[list_List] := Block[{id}, id[elm_] := (id[elm] = Null; elm); id /@ list] In[34]:= vunionD[v_] := Block[{accuracylist, v2}, accuracylist = OrderedUnionD[Round[boost* (v2 = Sort[Chop[v, .000001]])]]; Do[ If[accuracylist[[i]] == Null, v2[[i]] = Null], {i, Length[v2]}]; v2 /. Null -> Sequence[] ] In[35]:= uvects["D"] = vunionD[vects]; // Timing Out[35]= {1.042 Second, Null} In[36]:= Length[uvects["D"]] Out[36]= 153 In[37]:= uvects2["D"] = vunionD[shiftvs]; // Timing Out[37]= {0.771 Second, Null} In[38]:= Length[uvects2["D"]] Out[38]= 151 In[39]:= uvects1["D"] = vunionD[vects1]; // Timing Out[39]= {1.622 Second, Null} In[40]:= Length[uvects1["D"]] Out[40]= 1881 In[41]:= uvects21["D"] = vunionD[shiftvs1]; // Timing Out[41]= {1.663 Second, Null} In[42]:= Length[uvects21["D"]] Out[42]= 1884 Again not translation invariant, but splendid runtimes. (The variant with DeleteCases[v2, Null] has quite the same properties.) David's proposal contains a Sort, you may equally well do without: In[57]:= OrderedUnionD3[list_List, code_] := Block[{id}, id[elm_] := (id[elm] = 0; 1); id[code[#]] & /@ list] In[58]:= vunionD3[v_] := Block[{taglist}, taglist = OrderedUnionD3[v, Round[boost*#] & ]; First /@ Cases[Transpose[{v, taglist}], {_, 1}] ] The execution time is quite the same (for this size of problem, though it should be O(n); sorting 2000 elements is of no issue). Here is another variant of Davids's proposal, which also takes no sort but "rasters" the points: In[70]:= OrderedUnionD4[list_List, code_] := Block[{id}, id[elm_] := (id[elm] = Sequence[]; elm); id[code[#]] & /@ list] In[71]:= uvects["D4"] = OrderedUnionD4[vects, Round[boost*#]/boost & ]; // Timing Out[71]= {1.542 Second, Null} In[72]:= Length[uvects["D4"]] Out[72]= 151 In[74]:= uvects2["D4"] = OrderedUnionD4[shiftvs, Round[boost*#]/boost & ]; // Timing Out[74]= {1.753 Second, Null} In[75]:= Length[uvects2["D4"]] Out[75]= 151 In[76]:= uvects1["D4"] = OrderedUnionD4[vects1, Round[boost*#]/boost & ]; // Timing Out[76]= {2.323 Second, Null} In[77]:= Length[uvects1["D4"]] Out[77]= 1884 In[79]:= uvects21["D4"] = OrderedUnionD4[shiftvs1, Round[boost*#]/boost & ]; // Timing Out[79]= {2.514 Second, Null} In[80]:= Length[uvects21["D4"]] Out[80]= 1884 It is slower (for this size of problem). The "selected" points are rastered, i.e. not members of the original set; this can also be seen when plotting. Although it here appears to be so, it cannot be translation invariant (due to round of the absolute values). Perhaps you also would like to study the proposals In[314]:= vunionH[p_List, code_] := First /@ Split[ Sort[p, OrderedQ[code[#1], code[#2]] &], (code[#1] == code[#2] &)] In[320]:= uvects["H"] = vunionH[vects, Round[boost*#] & ]; // Timing Out[320]= {2.623 Second, Null} In[321]:= Length[uvects["H"]] Out[321]= 275 and In[305]:= vunionH2[p_List, code_] := Part[#, 1, 2] & /@ Split[Sort[Transpose[{code /@ p, p}]], (First[#1] === First[#2] &)] In[306]:= uvects["H2"] = vunionH2[vects, Round[boost*#] & ]; // Timing Out[306]= {0.461 Second, Null} In[307]:= Length[uvects["H2"]] Out[307]= 151 Kind regards, Hartmut

**References**:**Vector Union***From:*Russell Towle <rustybel@foothill.net>