Re: RE: Re: Re: distance function
- To: mathgroup at smc.vnet.net
- Subject: [mg70084] Re: [mg70014] RE: [mg68805] Re: [mg68772] Re: distance function
- From: Carl Woll <carlw at wolfram.com>
- Date: Mon, 2 Oct 2006 00:34:25 -0400 (EDT)
- References: <200609300912.FAA13156@smc.vnet.net>
Coleman, Mark wrote: > >Out of curiosity, I tested these two approaches on a number of data sets >for which I make frequent use. For some reason Jens code is running >slower! I've been testing it out some lists of reals of size >n=500,1000,2500, and 5000. Is it possible that the time for the >conditional compares is exceeding the computational time of redundant >calculations? Could someone try this out? > >(note: I'm working on some code for identifying outliers in large data >sets. The efficient calculation of L-1 and L-2 distance matrices are >important.) > >Thanks > >Mark > > > The package Statistics`ClusterAnalysis` has a very quick DistanceMatrix function: data=Table[Random[],{1000},{3}]; <<Statistics`ClusterAnalysis` Sqrt[DistanceMatrix[data]];//Timing {0.187 Second,Null} The default distance function for DistanceMatrix is SquaredEuclideanDistance, hence the use of Sqrt. It is possible to use the option DistanceFunction->EuclideanDistance to obviate the need for Sqrt: DistanceMatrix[data, DistanceFunction->EuclideanDistance]; However, this option seems to be about 25% slower. This is probably because it is faster to take the square root of a matrix and rely on vector operations rather than taking the square root of each matrix element. Carl Woll Wolfram Research >-----Original Message----- >From: Murray Eisenberg [mailto:murray at math.umass.edu] To: mathgroup at smc.vnet.net >Subject: [mg70084] [mg70014] [mg68805] Re: [mg68772] Re: distance function > >Yes, I KNOW that I'm computing the distances twice in my solution: >that's why I said it's an "extravagant" solution! > >Jens-Peer Kuska wrote: > > >>Hi Murray, >> >>at least you should compute the distances not twice because the matrix >> >> > > > >>is symmetric with zero diagonal ... >> >>d[{p_,p_}]:=0.0 >>d[{q_,p_}]/; OrderedQ[{q,p}]:=d[{q,p}]= Norm[p - q] >>d[{q_,p_}]:=d[{p,q}] >> >>Regards >> Jens >> >> >>Murray Eisenberg wrote: >> >> >>>If you don't mind an "extravagant" solution -- one that is >>>conceptually simple and short but is probably inefficient due to >>>redundant calculations -- then this works, I believe: >>> >>> d[{p_, q_}] := Norm[p - q] >>> allDistances[pts_] := Union[Flatten[Outer[d, pts, pts]]] >>> >>> >>> >>>dimmechan at yahoo.com wrote: >>> >>> >>>>In the book of Gaylord et al. (1996) there is one exercise which >>>>asks (see page 113) >>>> >>>>"Given a list of points in the plane, write a function that finds >>>>the set of all distances between the points." >>>> >>>>Although there is one solution, that solution makes use of the Table >>>> >>>> > > > >>>>and Length commands. >>>> >>>>Is it a way to define the same function using Higher-Order functions >>>> >>>> > > > >>>>like Outer, MapThread etc? >>>> >>>>Thanks in advance for any help. >>>> >>>> >>>> >>>> >> >> > > >