MathGroup Archive 2005

Re: SameTest in Union

  • To: mathgroup at smc.vnet.net
  • Subject: [mg61184] Re: [mg61106] SameTest in Union
  • From: "Carl K. Woll" <carl at woll2woll.com>
  • Date: Wed, 12 Oct 2005 01:42:10 -0400 (EDT)
  • References: <200510100640.CAA26942@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Jack Goldberg wrote:
> Hi Folks,
>
> I know there is a simple answer to this question but the help index
> does not help!
>
> I have a list, something like this:
>
> lst = {1.1101, 1.11095, 1.11076, 1.09, 2.3523, 2.352, 2.35211}
>
> I want to remove from the list those entries which are near each
> other but not identical, leaving only one representative for each of
> these numbers.  One approach is to use  Union with the option
> SameTest->???.   Here the same test might be that the difference
> between entries is less than, say 10^(-2).   But I can't seem to get
> SameTest to work.  So, what I want is
>
> Union[lst, SameTest -> ?]
>
> so that  the union returns
>
> {1.1101,  2.35211}
>
> Here, I chose 2 representatives.  Any other choice is OK;   say,
>
> {1.11095,  2.352}
>
> is also satisfactory.
>
> There may be other ways to do this, but I thought of  Union  first.
> Perhaps, Cases  or Select  might be better.  Any help is appreciated.
>
> Jack
>

Jack,

One other possibility to consider besides using Union is the function 
FindClusters from the package Statistics`ClusterAnalysis`. FindClusters has 
the usage message:

In[2]:=
?FindClusters
FindClusters[{e1, e2, ... }] partitions the ei into clusters of similar elements.
FindClusters[{e1 -> v1, e2 -> v2, ... }] returns the vi corresponding to the ei in each cluster.
FindClusters[{e1, e2, ... } -> {v1, v2, ... }] gives the same result.
FindClusters[{e1, e2, ... }, n] partitions the ei into exactly n clusters.

For your example we have:

In[4]:=
FindClusters[lst]
Out[4]=
{{1.1101, 1.11095, 1.11076, 1.09}, {2.3523, 2.352, 2.35211}}
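Since the goal is one representative per group, one possible follow-up (a sketch, not part of the session above) is to map First or Mean over the clusters. The Needs call is an assumption about how the package was loaded; in later Mathematica versions, where FindClusters is built in, it should not be needed. The name clusters is chosen here for illustration only.

```mathematica
(* Assumed: load the package first (version 5.x); skip where FindClusters is built in *)
Needs["Statistics`ClusterAnalysis`"]

lst = {1.1101, 1.11095, 1.11076, 1.09, 2.3523, 2.352, 2.35211};

clusters = FindClusters[lst];

(* Keep the first element of each cluster as its representative *)
First /@ clusters

(* Alternatively, summarize each cluster by its mean *)
Mean /@ clusters
```

With the clustering shown in Out[4], First /@ clusters gives {1.1101, 2.3523}. Note that 1.09 lands in the first cluster even though it differs from its neighbors by about 0.02, so the automatic clustering does not enforce the 10^(-2) tolerance from the original question.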

FindClusters can handle a wide variety of input, including Boolean and 
string data as well as numerical vectors (or even numerical tensors). It 
also has DistanceFunction and Method options that you could play around 
with. The only downside to using FindClusters is that it may be slow for 
large data sets.
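As a rough illustration of those knobs, the sketch below fixes the number of clusters using the two-argument form from the usage message and supplies a custom distance through the DistanceFunction option. The absolute-difference pure function is an assumption chosen for this one-dimensional data, not a recommended setting, and the results are not shown because they depend on the version and method used.

```mathematica
(* Assumes the package is loaded (or FindClusters is built in) *)
lst = {1.1101, 1.11095, 1.11076, 1.09, 2.3523, 2.352, 2.35211};

(* Ask for exactly two clusters instead of letting FindClusters choose *)
FindClusters[lst, 2]

(* Same request, with an explicit distance: absolute difference of two numbers *)
FindClusters[lst, 2, DistanceFunction -> (Abs[#1 - #2] &)]
```

Fixing the cluster count is useful when the number of distinct values is known in advance; otherwise the one-argument form leaves that choice to FindClusters.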

Carl Woll
Wolfram Research 



