Services & Resources / Wolfram Forums
-----
 /
MathGroup Archive
2000
*January
*February
*March
*April
*May
*June
*July
*August
*September
*October
*November
*December
*Archive Index
*Ask about this page
*Print this page
*Give us feedback
*Sign up for the Wolfram Insider

MathGroup Archive 2000

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Vector Union

  • To: mathgroup at smc.vnet.net
  • Subject: [mg23248] Re: [mg23117] Vector Union
  • From: Hartmut Wolf <hwolf at debis.com>
  • Date: Sat, 29 Apr 2000 22:04:54 -0400 (EDT)
  • Organization: debis Systemhaus
  • References: <200004190630.CAA07517@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

My comments below:

Russell Towle schrieb:
> 
> The problem of finding a function to return the union of a list of
> n-vectors which is at once *fast* and yet *unfailing* has beguiled me for
> several years. I contrived a slow but unfailing 'vunion' function about
> five years ago; made a not-so-slow and unfailing version a year ago; and
> then found it to occasionally fail when Chop was applied to the vectors
> without a second argument.
> 
> I posed a programming challenge to the Group a few days ago: make a fast
> and unfailing vector union function. David Park and Michael Trott were kind
> enough to respond, for which I thank them very much. Around a year around
> Martin Kraus also supplied a vector union function.
> 
> Good solutions seem to hinge upon *sorting the list of n-vectors*.
> Searching the internet, I find that sorting a list of vectors seems to be a
> classic problem in computer science. This is where Chop comes into play: to
> compare one vector to another, when the components are machine precision,
> is easier when very small numbers with opposite signs are Chopped to zero.
> 
> My function, slow but sure:
> 
> vunion[v_] :=
> Union[Chop[v, .000001],
> SameTest -> (Sign[#1] == Sign[#2] &&
> Chop[#1 - #2, .000001] == Table[0, {Length[v[[1]]]}] & )]
> 
> David Park appealed to an "UnorderUnion" function devised by Carl Woll,
> which works well on lists with integer and symbolic components. Park uses
> Round to put the machine numbers into a form acceptable to UnorderUnion.
> The implications of using Round worry me, but Park's method has not failed
> yet, given inputs which made most other methods fail. His function is
> around twenty times faster than mine. Thanks David!!
> 
> (*modified Woll*)
> OrderedUnion2[li_] :=
> Block[{i}, i[n_] := (i[n] = Null; n);
> i /@ li]
> 
> vunion2[v_] :=
> Block[{accuracylist, v2},
> accuracylist = OrderedUnion2[Round[10^5*(v2 = Sort[Chop[v, .000001]])]];
> Do[If[accuracylist[[i]] === Null, v2[[i]] = Null], {i, Length[v2]}];
> v2 /. Null -> Sequence[]]
> 


Dear Russell,

yesterday, at Easter Monday, which is a holiday around here, I found
some time to study your program (at home with Vers. 3.0), only to detect
today that there are great differences with Vers. 4.0.

I understand your problem as to unify Points (again), which after
originating from the same specification, went through different
calculation histories and finally came out slightly different.

So there is a first problem: which differences are spurious, and which
are essential? Regarding differences small enough as negligible is only
a heuristics, not of same value for different problems, and furthermore
what is "small enough"? You froze that to Chop[#1-#2, 1. 10^-6] ==
zeroVector,  I'm sceptical.

To unify your points you have in principal two methods: one would be to
find clusters of points, and then -- if small enough -- pic out a
representative, e.g. a point near the center of gravity (or median,
...), or the center itself (not a menber of the point set). If the
clusters are not small enough, you will have got the problem to
subdivide them artificially.

Another -- easier -- method, is to effectively divide the space into a
raster of cells, and unify all points falling into the same cell.  This
method however also has serious drawbacks: the raster might cut through
the cluster, and instead of finding one representative, you'll get more
(2 - 8). 

I made a simulation (see below) and showed that your "reference" method
(for the "programming contest") is not "translation invariant".

Another point is: you connected unifying with sorting, that is not
compelling, but sorting gives efficient solutions to certain problems
for unification. The scheme is like this

	First /@ Split[Sort[p_List, OrderedQ[code[#1], code[#2]]&], 
			(code[#1]==code[#2]&)]

where code is a function from p-List into some ordered set, and all
points giving the same code will be unified. It was only this morning
that I learnt that Union does not (or not any longer?) work that way
(from "[mg23234] Re: [mg23206] Re: [mg23174] A simple programming
question.." by Carl Woll <mailto:carlw at u.washington.edu>). 

However Carl's now famous OrderedUnion shows that you can unify without
sorting.

Now something to observe (this is "4.0 for Microsoft Windows (April 21,
1999)").
Here are (reproducible) sets for test:
 
In[1]:= d[] := Module[{delta}, 
    While[delta = {Random[Real, {-1, 1}], Random[Real, {-1, 1}], 
          Random[Real, {-1, 1}]}; delta.delta > 1]; delta]

In[2]:= SeedRandom[200004241812]
In[3]:= vstart = Fold[
      Flatten[Function[{ctr}, {{ctr}, Table[ctr + #2 d[], {2}]} ] /@ #1, 
          2] &, {{0, 0, 0}}, {1, 10}];

In[5]:= vects1 = Fold[
        Flatten[Function[{ctr}, {ctr + #2 * d[], ctr + #2 * d[], 
                  ctr + #2 * d[]} ] /@ #1, 1] &, vstart,
        {3. 10^-5, 1. 10^-5, 5. 10^-6, 3. 10^-6, 1. 10^-6}]; // Timing
Out[5]=
{3.815 Second, Null}

In[6]:=
vects = Fold[
        Flatten[Function[{ctr}, {ctr + #2 * d[], ctr + #2 * d[], 
                  ctr + #2 * d[]} ] /@ #1, 1] &, 
        vstart, {3. 10^-6, 1. 10^-6, 3. 10^-7, 1. 10^-7, 3. 10^-8}
        ]; // Timing
Out[6]=
{3.726 Second, Null}

In[7]:= Length[vects]
Out[7]= 2187
In[8]:= 3^Range[7]
Out[8]= {3, 9, 27, 81, 243, 729, 2187}

In[10]:= shiftvs = {1, 1, 1} 3 10^-5 + # & /@ vects;
In[11]:= shiftvs1 = {1, 1, 1} 3 10^-5 + # & /@ vects1;

(two sets of different density, and slightly shifted versions)

Your function:

In[18]:= vunion[v_] := Union[Chop[v, 0.000001],
		SameTest -> ((Sign[#1] == Sign[#2]) && (Chop[#1 - #2, 0.000001] == 
                Table[0, {Length[v[[1]]]}]) &)
	]
In[19]:= Print["# ", #, 
      ":  ", (uvects["R", #] = vunion[Take[vects, #]]; // Timing)[[1]]]
& /@ 
  Floor[Sqrt[3]^Range[10, 14]]
"# "243":  "0.220 Second
"# "420":  "0.370 Second
"# "729":  "0.772 Second
"# "1262":  "1.512 Second
"# "2187":  "2.814 Second

In[21]:= Map[{#, Length[uvects["R", #]]} &, Floor[Sqrt[3]^Range[10, 14]]
]
Out[21]= {{243, 9}, {420, 14}, {729, 20}, {1262, 36}, {2187, 60}}

In[20]:= Print["# ", #, 
      ":  ", (uvects1["R", #] = vunion[Take[vects1, #]]; //
Timing)[[1]]] & /@ 
  Floor[Sqrt[3]^Range[10, 14]]
"# "243":  "3.475 Second
"# "420":  "9.473 Second
"# "729":  "34.971 Second
"# "1262":  "85.863 Second
"# "2187":  "225.094 Second

In[22]:= Map[{#, Length[uvects1["R", #]]} &, Floor[Sqrt[3]^Range[10,
14]] ]
Out[22]=
{{243, 123}, {420, 207}, {729, 365}, {1262, 621}, {2187, 1063}}

We see the O(n^2) behaviour, Carl has reported; also a drastically
increased runtime for the set (vects1) being less dense.

In[23]:= uvects21["R"] = vunion[shiftvs1]; // Timing
Out[23]= {220.607 Second, Null}
In[24]:= Length[uvects21["R"]]
Out[24]= 1059

In[25]:= uvects2["R"] = vunion[shiftvs]; // Timing
Out[25]= {2.734 Second, Null}
In[26]:= Length[uvects2["R"]]
Out[26]= 58

So you see, you get different answers for the shifted and unshifted
sets, something you have to justify from your application (unknown to
us).

To visiualize the data (and the results) you may plot:

In[100]:= Show[Graphics3D[{PointSize[0.02], Point /@ vects1}]]

You see 9 points, which turn out to be clusters when closed up. Take the
cluster at coordinates {0,0,0}:

In[124]:= Show[Graphics3D[{PointSize[0.01], Point /@ vects1}], 
    PlotRange -> 3{{-10^-5, 10^-5}, {-10^-5, 10^-5}, {-10^-5, 10^-5}}, 
    Axes -> True]

Three clouds are visible, let's come in close to the one left above:

In[137]:= g1 = Show[Graphics3D[{PointSize[0.01], Point /@ vects1}], 
    PlotRange -> {{-3 10^-5, -1.25 10^-5}, {-1.8 10^-5, 0. 10^-5}, {1.
10^-5, 
          2.6 10^-5}}, Axes -> True]

Now compare that with

In[140]:=
gg1 = Show[Graphics3D[{PointSize[0.01], Hue[0], Point /@ uvects1["R",
2187]}],
     PlotRange -> {{-3 10^-5, -1.25 10^-5}, {-1.8 10^-5, 0. 10^-5}, {1.
10^-5,
           2.6 10^-5}}]

and superpose:

In[141]:= Show[gg1, g1]

(You might wish to enlarge that and view from different sides.)

If you simplify your function to

In[145]:= vunion3[v_] := Union[v,
		SameTest -> ((Chop[#1 - #2, 0.000001] == zeroVector) &)]

then you'll be translation invariant (same unification for shifted and
unshifted points), and you'll get

In[147]:= Length[uvects["R3"]]
Out[147]= 58
In[151]:= Length[uvects1["R3"]]
Out[151]= 1058

points. I'll now compare to David Park's proposal:

In[32]:= boost =  10^6;
In[33]:= OrderedUnionD[list_List] :=
	Block[{id}, id[elm_] := (id[elm] = Null; elm);
		id /@ list]
In[34]:= 
vunionD[v_] := Block[{accuracylist, v2},
		accuracylist = OrderedUnionD[Round[boost*
                   (v2 = Sort[Chop[v, .000001]])]];
		Do[ If[accuracylist[[i]] == Null, v2[[i]] = Null], 
                   {i, Length[v2]}];
		v2 /. Null -> Sequence[]
	]
In[35]:= uvects["D"] = vunionD[vects]; // Timing
Out[35]= {1.042 Second, Null}
In[36]:= Length[uvects["D"]]
Out[36]= 153
In[37]:= uvects2["D"] = vunionD[shiftvs]; // Timing
Out[37]= {0.771 Second, Null}
In[38]:= Length[uvects2["D"]]
Out[38]= 151
In[39]:= uvects1["D"] = vunionD[vects1]; // Timing
Out[39]= {1.622 Second, Null}
In[40]:= Length[uvects1["D"]]
Out[40]= 1881
In[41]:= uvects21["D"] = vunionD[shiftvs1]; // Timing
Out[41]= {1.663 Second, Null}
In[42]:= Length[uvects21["D"]]
Out[42]= 1884
 
Again not translation invariant, but splendid runtimes. (The variant
with DeleteCases[v2, Null] has quite the same properties.)

David's proposal contains a Sort, you may equally well do without:

In[57]:= OrderedUnionD3[list_List, code_] :=
	Block[{id}, id[elm_] := (id[elm] = 0; 1);
		id[code[#]] & /@ list]
In[58]:=
vunionD3[v_] := Block[{taglist},
		taglist = OrderedUnionD3[v, Round[boost*#] & ];
		First /@ Cases[Transpose[{v, taglist}], {_, 1}]
	]

The execution time is quite the same (for this size of problem, though
it should be O(n); sorting 2000 elements is of no issue).


Here is another variant of Davids's proposal, which also takes no sort
but "rasters" the points:

In[70]:= OrderedUnionD4[list_List, code_] :=
	Block[{id}, id[elm_] := (id[elm] = Sequence[]; elm);
		id[code[#]] & /@ list]
In[71]:= uvects["D4"] = OrderedUnionD4[vects,  Round[boost*#]/boost & ];
// Timing
Out[71]= {1.542 Second, Null}
In[72]:= Length[uvects["D4"]]
Out[72]= 151

In[74]:= uvects2["D4"] = OrderedUnionD4[shiftvs, Round[boost*#]/boost &
]; // Timing
Out[74]= {1.753 Second, Null}
In[75]:= Length[uvects2["D4"]]
Out[75]= 151
In[76]:= uvects1["D4"] = OrderedUnionD4[vects1,  Round[boost*#]/boost &
]; // Timing
Out[76]= {2.323 Second, Null}
In[77]:= Length[uvects1["D4"]]
Out[77]= 1884

In[79]:= uvects21["D4"] = OrderedUnionD4[shiftvs1, Round[boost*#]/boost
& ]; // Timing
Out[79]= {2.514 Second, Null}
In[80]:= Length[uvects21["D4"]]
Out[80]= 1884
 
It is slower (for this size of problem). The "selected" points are
rastered, i.e. not members of the original set; this can also be seen
when plotting. Although it here appears to be so, it cannot be
translation invariant (due to round of the absolute values).


Perhaps you also would like to study the proposals

In[314]:= vunionH[p_List, code_] := 
  First /@ Split[
      Sort[p, OrderedQ[code[#1], code[#2]] &], (code[#1] == code[#2] &)]

In[320]:= uvects["H"] = vunionH[vects,  Round[boost*#] & ]; // Timing
Out[320]= {2.623 Second, Null}
In[321]:= Length[uvects["H"]]
Out[321]= 275

and

In[305]:= vunionH2[p_List, code_] := 
  Part[#, 1, 2] & /@ 
    Split[Sort[Transpose[{code /@ p, p}]], (First[#1] === First[#2] &)]

In[306]:= uvects["H2"] = vunionH2[vects,  Round[boost*#] & ]; // Timing
Out[306]= {0.461 Second, Null}
In[307]:= Length[uvects["H2"]]
Out[307]= 151


Kind regards, Hartmut


  • References:
  • Prev by Date: Re: Re: Demonstrate that 1==-1
  • Next by Date: Interpolation bugs (feature?) - work arounds & efficiency for large n?
  • Previous by thread: Vector Union
  • Next by thread: Re: Vector Union