MathGroup Archive: October 2005 [00335]


Re: SameTest in Union

To: mathgroup at smc.vnet.net
Subject: [mg61184] Re: [mg61106] SameTest in Union
From: "Carl K. Woll" <carl at woll2woll.com>
Date: Wed, 12 Oct 2005 01:42:10 -0400 (EDT)
References: <200510100640.CAA26942@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

Jack Goldberg wrote:
> Hi Folks,
>
> I know there is a simple answer to this question but the help index
> does not help!
>
> I have a list, something like this:
>
> lst = {1.1101, 1.11095, 1.11076, 1.09, 2.3523, 2.352, 2.35211}
>
> I want to remove from the list those entries which are near each
> other but not identical, leaving only one representative for each of
> these numbers.  One approach is to use  Union with the option
> SameTest->???. Here the same test might be that the difference
> between entries is less than, say, 10^(-2). But I can't seem to get
> SameTest to work.  So, what I want is
>
> Union[lst, SameTest -> ?]
>
> so that  the union returns
>
> {1.1101,  2.35211}
>
> Here, I chose 2 representatives.  Any other choice is OK;   say,
>
> {1.11095,  2.352}
>
> is also satisfactory.
>
> There may be other ways to do this, but I thought of Union first.
> Perhaps Cases or Select might be better. Any help is appreciated.
>
> Jack
>

Jack,

One other possibility to consider besides using Union is the function 
FindClusters from the package Statistics`ClusterAnalysis`. FindClusters has 
the usage message:

In[2]:=
?FindClusters

FindClusters[{e1, e2, ...}] partitions the ei into clusters of similar
elements. FindClusters[{e1 -> v1, e2 -> v2, ...}] returns the vi
corresponding to the ei in each cluster. FindClusters[{e1, e2, ...} ->
{v1, v2, ...}] gives the same result. FindClusters[{e1, e2, ...}, n]
partitions the ei into exactly n clusters.

For your example we have:

In[4]:=
FindClusters[lst]
Out[4]=
{{1.1101, 1.11095, 1.11076, 1.09}, {2.3523, 2.352, 2.35211}}
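To reduce this to one representative per cluster, which is what you asked for, one option is simply to take the first element of each cluster. This is a minimal sketch, assuming Mathematica 6 or later, where FindClusters is built in; in the 2005-era versions discussed here the package has to be loaded first with Needs["Statistics`ClusterAnalysis`"]. The names clusters and reps are only illustrative.

```mathematica
(* The data from the question *)
lst = {1.1101, 1.11095, 1.11076, 1.09, 2.3523, 2.352, 2.35211};

(* Built in from Mathematica 6 on; in earlier versions first evaluate
   Needs["Statistics`ClusterAnalysis`"] *)
clusters = FindClusters[lst];

(* Keep one representative (the first element) of each cluster *)
reps = First /@ clusters
```

With the clusters shown in Out[4], this picks 1.1101 and 2.3523. Any other choice of representative (Last, Median, Mean) works the same way by swapping out First.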

FindClusters can handle a wide variety of input, including Boolean and
string data as well as numerical vectors (or even numerical tensors). It 
also has DistanceFunction and Method options that you could play around 
with. The only downside to using FindClusters is that it may be slow for 
large data sets.
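For comparison, here is a sketch of the Union route from the original question. Union applies the SameTest function to pairs of elements and returns a sorted list. Note that with the proposed tolerance of 10^-2, the entry 1.09 is at least 0.0201 away from every other entry, so it survives as a third representative; a looser tolerance such as 0.05 (an arbitrary, illustrative value) merges it into the 1.11 group. DeleteDuplicates, available from Mathematica 7 on, accepts the same kind of test and keeps the first member of each group in the original order. The names byUnion and byDelete are hypothetical.

```mathematica
lst = {1.1101, 1.11095, 1.11076, 1.09, 2.3523, 2.352, 2.35211};

(* Tolerance from the question: 1.09 is more than 0.01 from every other
   entry, so three representatives remain *)
Union[lst, SameTest -> (Abs[#1 - #2] < 10^-2 &)]

(* Looser illustrative tolerance: one representative per group,
   returned in sorted order *)
byUnion = Union[lst, SameTest -> (Abs[#1 - #2] < 0.05 &)]

(* Mathematica 7 and later: keeps the first member of each group,
   preserving the original order *)
byDelete = DeleteDuplicates[lst, Abs[#1 - #2] < 0.05 &]
(* -> {1.1101, 2.3523} *)
```

Because the test is a simple closeness check, it is not transitive in general: a chain of points each within the tolerance of its neighbour can span more than the tolerance, and which representatives survive then depends on the order of comparison. That is one reason a clustering approach can be more robust for messier data.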

Carl Woll
Wolfram Research

References:
- SameTest in Union
  - From: Jack Goldberg <jackgoldberg@comcast.net>
