Re: Re: Re: Help to remove equivalent
- To: mathgroup at smc.vnet.net
- Subject: [mg91536] Re: [mg91522] Re: [mg91500] Re: Help to remove equivalent
- From: Bob Hanlon <hanlonr at cox.net>
- Date: Mon, 25 Aug 2008 06:57:26 -0400 (EDT)
- Reply-to: hanlonr at cox.net
It is also noticeably faster to use the approximate number 10.^-10 rather than the exact number 10^-10 when testing the Norm. Bob Hanlon ---- Bob Hanlon <hanlonr at cox.net> wrote: ============= points = RandomReal[{0, 1}, {200, 2}]; points = Join[points, points + 10^(-15)]; Union[points] // Length // Timing {0.000113,400} If a representative is good enough, take the First (or Last) element of a grouping (First /@ Split[Sort[points], Norm[#1 - #2] < 10^-10 &]) // Length // Timing {0.004647,200} Or average the values of a grouping (Mean /@ Split[Sort[points], Norm[#1 - #2] < 10^-10 &]) // Length // Timing {0.00506,200} Bob Hanlon ---- mark mcclure <mcmcclur at unca.edu> wrote: ============= On Aug 23, 1:45 am, Bob Hanlon <hanl... at cox.net> wrote: > Use SameTest option in Union I've had to think about this question at some length in my work with fractal geometry, where I deal with large numbers of points. In this context, there are some issues with the use of SameTest. In particular, Union[S] first sorts its elements in n*log(n) time and then compares in linear time. Union[S, SameTest -> f] cannot assume any ordering so it must compare pairwise, running in quadratic time. If at all possible, it is much faster to express each element in S in some canonical form, then apply Union without the SameTest. I've got some examples below but would be very curious to know if anyone has improvements. Here's a list of 400 pairs of points with each pair very close to another. Note that Union works very quickly, but distinguishes between close pairs of points. ------ points = RandomReal[{0, 1}, {200, 2}]; points = Join[points, points + 10^(-15)]; Union[points] // Length // Timing {0.000159, 400} ------ We can apply Bob's solution which, correctly, treats close points the same but runs much longer. ------ Union[points, SameTest -> (Rationalize[#1, 10^(-10)] === Rationalize[#2, 10^(-10)] &)] // Length // Timing {4.36683, 200} ------ Note that SameTest -> (Norm[#1 - #2] < 10^(-10) &) is faster, but we can get the best of both worlds by rationalizing first and then applying the Union. ----- Union[Rationalize[points, 10^(-10)]] // Length // Timing {0.04629, 200} ----- Of course, there are issues with Rationalize, or Round, as well. First, you lose some precision in the output, which might or might not be important for your application. Also, while it seems to work well with randomly generated data, it's easy to devise situations where points close together Rationalize to different results. Thus, I'd be leary of using Rationalize in this manner for serious work. What is needed is a reliable function that places things in a canonical form. Here's one such function I developed based on clustering. ----- Needs["HierarchicalClustering`"]; canonicalFunction[nonCanonicalValues_List] := Module[ {toCanonical}, Quiet[heirarchy = Agglomerate[N[nonCanonicalValues], DistanceFunction -> EuclideanDistance, Linkage -> "Average"]]; segregate[Cluster[cl1_, cl2_, d_, _, _], tol_] := MyClusters[cl1, cl2] /; d > tol; segregate[mine_MyClusters, tol_] := segregate[#, tol] & /@ mine; segregate[x_, _] := x; cf[cl_Cluster] := ClusterFlatten[cl]; cf[x_] := {x}; clusters = cf /@ List @@ Flatten[FixedPoint[segregate[#, 10^-12] &, MyClusters[heirarchy]]]; canonicalValues = Chop[First /@ clusters]; toCanonical[x_] := First[Nearest[canonicalValues][x]]; toCanonical]; ----- We use it as follows. ----- toCan = canonicalFunction[points]; Union[toCan /@ points] // Length // Timing {0.138291, 200} ----- Again, I'd love to see alternatives. Mark McClure