MathGroup Archive 2008

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Help to remove equivalent (redundant) solutions from

  • To: mathgroup at smc.vnet.net
  • Subject: [mg91500] Re: Help to remove equivalent (redundant) solutions from
  • From: mark mcclure <mcmcclur at unca.edu>
  • Date: Sun, 24 Aug 2008 07:05:57 -0400 (EDT)
  • References: <g8o858$l6a$1@smc.vnet.net>

On Aug 23, 1:45 am, Bob Hanlon <hanl... at cox.net> wrote:
> Use SameTest option in Union

I've had to think about this question at some length in my
work with fractal geometry, where I deal with large numbers
of points.  In this context, there are some issues with the
use of SameTest.  In particular, Union[S] first sorts its
elements in n*log(n) time and then compares in linear time.
Union[S, SameTest -> f] cannot assume any ordering so it
must compare pairwise, running in quadratic time.  If at all
possible, it is much faster to express each element in S in
some canonical form, then apply Union without the SameTest.
I've got some examples below but would be very curious to
know if anyone has improvements.

Here's a list of 400 pairs of points with each pair very
close to another.  Note that Union works very quickly, but
distinguishes between close pairs of points.

------
points = RandomReal[{0, 1}, {200, 2}];
points = Join[points, points + 10^(-15)];
Union[points] // Length // Timing

{0.000159, 400}
------


We can apply Bob's solution which, correctly, treats close
points the same but runs much longer.

------
Union[points, SameTest -> (Rationalize[#1, 10^(-10)] ===
    Rationalize[#2, 10^(-10)] &)] // Length // Timing

{4.36683, 200}
------


Note that  SameTest -> (Norm[#1 - #2] < 10^(-10) &) is
faster, but we can get the best of both worlds by
rationalizing first and then applying the Union.

-----
Union[Rationalize[points, 10^(-10)]] // Length // Timing

{0.04629, 200}
-----


Of course, there are issues with Rationalize, or Round, as
well.  First, you lose some precision in the output, which
might or might not be important for your application.  Also,
while it seems to work well with randomly generated data,
it's easy to devise situations where points close together
Rationalize to different results.  Thus, I'd be leary of
using Rationalize in this manner for serious work.  What
is needed is a reliable function that places things in a
canonical form.  Here's one such function I developed based
on clustering.

-----
Needs["HierarchicalClustering`"];
canonicalFunction[nonCanonicalValues_List] := Module[
   {toCanonical},
   Quiet[heirarchy = Agglomerate[N[nonCanonicalValues],
      DistanceFunction -> EuclideanDistance,
      Linkage -> "Average"]];
   segregate[Cluster[cl1_, cl2_, d_, _, _], tol_] :=

    MyClusters[cl1, cl2] /; d > tol;
   segregate[mine_MyClusters, tol_] := segregate[#, tol] & /@ mine;
   segregate[x_, _] := x;
   cf[cl_Cluster] := ClusterFlatten[cl];
   cf[x_] := {x};
   clusters =
    cf /@ List @@ Flatten[FixedPoint[segregate[#, 10^-12] &,
        MyClusters[heirarchy]]];
   canonicalValues = Chop[First /@ clusters];
   toCanonical[x_] := First[Nearest[canonicalValues][x]];
   toCanonical];
-----

We use it as follows.

-----
toCan = canonicalFunction[points];
Union[toCan /@ points] // Length // Timing

{0.138291, 200}
-----

Again, I'd love to see alternatives.

Mark McClure


  • Prev by Date: Integration takes more time:
  • Next by Date: RE: Integral of radial solution (hydrogen atom) is not evaluated
  • Previous by thread: Re: Help to remove equivalent (redundant) solutions from
  • Next by thread: Re: Re: Help to remove equivalent (redundant) solutions