Re: SameTest in Union
- To: mathgroup at smc.vnet.net
- Subject: [mg61184] Re: [mg61106] SameTest in Union
- From: "Carl K. Woll" <carl at woll2woll.com>
- Date: Wed, 12 Oct 2005 01:42:10 -0400 (EDT)
- References: <200510100640.CAA26942@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Jack Goldberg wrote:

> Hi Folks,
>
> I know there is a simple answer to this question but the help index
> does not help!
>
> I have a list, something like this:
>
> lst = {1.1101, 1.11095, 1.11076, 1.09, 2.3523, 2.352, 2.35211}
>
> I want to remove from the list those entries which are near each
> other but not identical, leaving only one representative for each of
> these numbers. One approach is to use Union with the option
> SameTest->???. Here the same test might be that the difference
> between entries is less than, say 10^(-2). But I can't seem to get
> SameTest to work. So, what I want is
>
> Union[ lst, SameTest -> ?]
>
> so that the union returns
>
> {1.1101, 2.35211}
>
> Here, I chose 2 representatives. Any other choice is OK; say,
>
> {1.11095, 2.352}
>
> is also satisfactory.
>
> There may be other ways to do this, but I thought of Union first.
> Perhaps, Cases or Select might be better. Any help is appreciated.
>
> Jack

Jack,

One other possibility to consider besides using Union is the function
FindClusters from the package Statistics`ClusterAnalysis`. FindClusters
has the usage message:

    In[2]:= ?FindClusters

    FindClusters[{e1, e2, ... }] partitions the ei into clusters of
    similar elements.
    FindClusters[{e1 -> v1, e2 -> v2, ... }] returns the vi
    corresponding to the ei in each cluster.
    FindClusters[{e1, e2, ... } -> {v1, v2, ... }] gives the same result.
    FindClusters[{e1, e2, ... }, n] partitions the ei into exactly n
    clusters.

For your example we have:

    In[4]:= FindClusters[lst]

    Out[4]= {{1.1101, 1.11095, 1.11076, 1.09}, {2.3523, 2.352, 2.35211}}

FindClusters can handle a wide variety of input, including boolean and
string data as well as numerical vectors (or even numerical tensors). It
also has DistanceFunction and Method options that you could play around
with. The only downside to using FindClusters is that it may be slow for
large data sets.

Carl Woll
Wolfram Research
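
For readers who want to try both routes mentioned in this thread, here is a minimal sketch. It assumes the stray "." in the posted list was meant as a comma, uses an illustrative name tol for the tolerance, and takes the first element of each cluster as the representative. That choice is arbitrary and not something either poster prescribed.

```mathematica
(* Data from the question; the "2.352." in the post is read as "2.352," (assumption) *)
lst = {1.1101, 1.11095, 1.11076, 1.09, 2.3523, 2.352, 2.35211};

(* Route 1: Union with a tolerance-based SameTest.
   Two entries are treated as the same when they differ by less than tol.
   The parentheses are needed because & binds more loosely than ->. *)
tol = 10^-2;
Union[lst, SameTest -> (Abs[#1 - #2] < tol &)]

(* Route 2: one representative per cluster found by FindClusters.
   The package load applies to the version discussed in this post;
   skip this line in later versions where FindClusters is built in. *)
<< Statistics`ClusterAnalysis`
First /@ FindClusters[lst]
```

With a tolerance of 10^-2, the entry 1.09 is not near the other values around 1.11, since it differs from each of them by about 0.02. The Union route should therefore be expected to keep it as a separate representative. Which member of each near-equal group Union retains is not specified here, so no output is shown for it. Given the clusters in Out[4] above, the second expression would return {1.1101, 2.3523}.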
- References:
- SameTest in Union
- From: Jack Goldberg <jackgoldberg@comcast.net>