MathGroup Archive: June 2003 [00244]

RE: "Sloppy Union"? (Union of a list with *nearly* equ

To: mathgroup at smc.vnet.net
Subject: [mg41933] RE: [mg41892] "Sloppy Union"? (Union of a list with *nearly* equ
From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
Date: Wed, 11 Jun 2003 03:49:13 -0400 (EDT)
Sender: owner-wri-mathgroup at wolfram.com

>-----Original Message-----
>From: AES/newspost [mailto:siegman at stanford.edu]
>Sent: Sunday, June 08, 2003 12:46 PM
>To: mathgroup at smc.vnet.net
>Subject: [mg41933] [mg41892] "Sloppy Union"? (Union of a list with *nearly* equal
>elements)
>
>
>I'm trying to make a "Sloppy Union" that will eliminate elements of a 
>list that are *almost* but not precisely duplicates of each other; and 
>Union[ ] has the "SameTest->test" option which looks like what I need.
>
>I don't understand, however, why the following approach seems to work
>
>    Remove["Global`*"]
>
>    sloppyData = Table[Random[Integer, {1000,1005}]
>                                + Random[Real, {-0.01,0.01}], {20}]
>
>    Union[sloppyData, SameTest -> (Abs[#1-#2]<10.^-1&)]
>
>but the following doesn't
>
>  Remove["Global`*"]
>
>  sloppyData = Table[Random[Integer, {1000,1005}]
>                                + Random[Real, {-0.01,0.01}], {20}]
>
>  sharperData = SetPrecision[sloppyData, 4]
>
>  Union[sharperData, SameTest -> (Abs[#1-#2]<10.^-1&)]
>
>--even though sharperData seems to have reduced precision.
>
>
>P.S. to Wolfram:   Although "SameTest->test" is listed as an option in 
>the online Help, neither "SameTest" or "test" are explained in the 
>online Help system, and the only example of SameTest given in The 
>Mathematica Book has the syntax  "SameText -> comp"  rather than 
>"SameTest->test".  Are "test" and "comp" the same?
>
>-- 
>"Power tends to corrupt.  Absolute power corrupts absolutely."  
>Lord Acton (1834-1902)
>"Dependence on advertising tends to corrupt.  Total dependence on 
>advertising  corrupts totally." (today's equivalent)  
>


It is worth being a bit more verbose and including the data:
 
In[50]:= sloppyData = 
  Table[Random[Integer, {1000, 1005}] + Random[Real, {-0.01, 0.01}], {20}]
Out[50]=
{999.997, 1005., 1005., 1005., 1004., 1004.01, 999.993, 1004., 1001.01,
 999.999, 1003., 1001.01, 1001., 1004.01, 1003., 1005.01, 1003., 1003.,
 1004., 1001.99}

In[51]:= % // Sort
Out[51]=
{999.993, 999.997, 999.999, 1001., 1001.01,
 1001.01, 1001.99, 1003., 1003., 1003.,
 1003., 1004., 1004., 1004., 1004.01,
 1004.01, 1005., 1005., 1005., 1005.01}

Sorted, just so that we can compare by eye.

In[52]:= Union[sloppyData, SameTest -> (Abs[#1 - #2] < 10.^-1 &)]
Out[52]= {999.993, 1001., 1001.99, 1003., 1004., 1005.}

This is what you accepted as representatives of the sets of *nearly* equal
data.

We may look at the individual differences:

In[53]:= (Abs[#1 - #2] < 10.^-1 &) @@@ Partition[Sort[sloppyData], 2, 1]
Out[53]=
{True, True, False, True, True,
 False, False, True, True, True, 
 False, True, True, True, True,
 False, True, True, True}
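
In case the idiom used in In[53] is unfamiliar, here is a minimal sketch of
what it does. The symbols f, a, b, c, d are only placeholders and are
assumed to have no values assigned.

```mathematica
(* Overlapping pairs of neighbours: window length 2, offset 1 *)
Partition[{a, b, c, d}, 2, 1]
(* -> {{a, b}, {b, c}, {c, d}} *)

(* @@@ replaces the head List of each pair by f, i.e. applies f to each pair *)
f @@@ Partition[{a, b, c, d}, 2, 1]
(* -> {f[a, b], f[b, c], f[c, d]} *)
```

So a list of n sorted elements yields n - 1 comparisons, one per pair of
neighbours.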

After each False in Out[53] (and at the very beginning) a new run of data
begins in which the distances between neighbours are *nearly nothing*:

In[54]:= Flatten[Join[{1}, Position[%, False] + 1]]
Out[54]= {1, 4, 7, 8, 12, 17}

In[55]:= Sort[sloppyData][[%]]
Out[55]= {999.993, 1001., 1001.99, 1003., 1004., 1005.}

Here this gives the same answer as Union. In fact it amounts to:

In[56]:= First /@ Split[Sort[sloppyData], (Abs[#1 - #2] < 10.^-1 &)]
Out[56]= {999.993, 1001., 1001.99, 1003., 1004., 1005.}

In many cases this gives the same result as Union (and is much faster on
large data). It does not, however, if there are long runs of *nearly* equal
data in which the beginning and the end of a run are no longer *nearly*
equal. (You have to check what is appropriate for your real-world process.
This includes the question of the right representative for each run, so as
not to introduce a bias.)
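
A small sketch of the long-run problem with Split, on made-up data. The
names sameQ, chain and greedyReps are my own for illustration; greedyReps is
just one possible alternative (compare each element with the last
representative kept), not a claim about what Union does internally.

```mathematica
sameQ = (Abs[#1 - #2] < 10.^-1 &);

(* Each neighbour is within 0.1 of the next, but the two ends are 0.24 apart *)
chain = {1.00, 1.06, 1.12, 1.18, 1.24};

(* Split only compares adjacent elements, so the whole list is one run *)
First /@ Split[chain, sameQ]
(* -> {1.} *)

(* Hypothetical helper: keep an element only if it is not near the last one kept *)
greedyReps[list_, test_] :=
  Fold[If[test[Last[#1], #2], #1, Append[#1, #2]] &, {First[list]}, Rest[list]]

greedyReps[chain, sameQ]
(* -> {1., 1.12, 1.24} *)
```

Which of the two answers is the right one depends on the process that
produced the data; the helper assumes its input is already sorted.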


Now we come to your less precise data:

In[57]:= sharperData = SetPrecision[sloppyData, 4]
Out[57]=
{1000., 1005., 1005., 1005., 1004.,
 1004., 1000., 1004., 1001., 1000.,
 1003., 1001., 1001., 1004., 1003.,
 1005., 1003., 1003., 1004., 1002.}


In[58]:= Union[sharperData, SameTest -> (Abs[#1 - #2] < 10.^-1 &)]
Out[58]=
{1000., 1000., 1000., 1001., 1001.,
 1001., 1002., 1003., 1003., 1003.,
 1003., 1004., 1004., 1004., 1004.,
 1004., 1005., 1005., 1005., 1005.}

...and from the pairwise comparisons we might get the idea:

In[59]:= 
(Abs[#1 - #2] < 10.^-1 &) @@@ Partition[Sort[sharperData], 2, 1]
Out[59]=
{False, False, False, False, False,
 False, False, False, False, False,
 False, False, False, False, False,
 False, False, False, False}

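Mathematica doesn't consider these numbers as different, but -- due to
insufficient Precision -- cannot consider them Less either (for each
successive pair). At Precision 4 a number near 1000 is known only to about
1000*10^-4 = 0.1, which is the size of the tolerance itself.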
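
To inspect this for a single pair, something like the following sketch may
help. The names x, y and diff are only illustrative; the two values are
neighbours taken from Out[51]. I have left out the outputs, since the
displayed digits may vary between versions, but the first comparison should
correspond to the False seen in Out[59].

```mathematica
(* Two neighbouring values from the sorted data, reduced to 4 digits *)
x = SetPrecision[999.993, 4];
y = SetPrecision[999.997, 4];

(* Precision counts significant digits, Accuracy digits after the decimal point *)
{Precision[x], Accuracy[x]}

(* The difference carries an absolute uncertainty on the order of 0.1 *)
diff = Abs[x - y];
{Precision[diff], Accuracy[diff]}

(* The three possible comparisons against the machine-number tolerance *)
{diff < 10.^-1, diff <= 10.^-1, diff == 10.^-1}
```

The point of looking at Accuracy rather than Precision is that the
tolerance in SameTest is an absolute distance, not a relative one.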

Things change dramatically if we check for LessEqual (<=) instead:

In[60]:=
(Abs[#1 - #2] <= 10.^-1 &) @@@ Partition[Sort[sharperData], 2, 1]
Out[60]=
{True, True, False, True, True,
 False, False, True, True, True,
 False, True, True, True, True,
 False, True, True, True}


And now (of course) we get:

In[61]:=
Union[sharperData, SameTest -> (Abs[#1 - #2] <= 10.^-1 &)]
Out[61]= {1000., 1001., 1002., 1003., 1004., 1005.}


--
Hartmut Wolf
