RE: "Sloppy Union"? (Union of a list with *nearly* equ
- To: mathgroup at smc.vnet.net
- Subject: [mg41933] RE: [mg41892] "Sloppy Union"? (Union of a list with *nearly* equ
- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
- Date: Wed, 11 Jun 2003 03:49:13 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
>-----Original Message-----
>From: AES/newspost [mailto:siegman at stanford.edu]
>Sent: Sunday, June 08, 2003 12:46 PM
>To: mathgroup at smc.vnet.net
>Subject: [mg41933] [mg41892] "Sloppy Union"? (Union of a list with *nearly* equal
>elements)
>
>
>I'm trying to make a "Sloppy Union" that will eliminate elements of a
>list that are *almost* but not precisely duplicates of each other; and
>Union[ ] has the "SameTest->test" option which looks like what I need.
>
>I don't understand, however, why the following approach seems to work
>
>   Remove["Global`*"]
>
>   sloppyData = Table[Random[Integer, {1000,1005}]
>                + Random[Real, {-0.01,0.01}], {20}]
>
>   Union[sloppyData, SameTest -> (Abs[#1-#2]<10.^-1&)]
>
>but the following doesn't
>
>   Remove["Global`*"]
>
>   sloppyData = Table[Random[Integer, {1000,1005}]
>                + Random[Real, {-0.01,0.01}], {20}]
>
>   sharperData = SetPrecision[sloppyData, 4]
>
>   Union[sharperData, SameTest -> (Abs[#1-#2]<10.^-1&)]
>
>--even though shaperData seems to have reduced precision.
>
>
>P.S. to Wolfram: Although "SameTest->test" is listed as an option in
>the online Help, neither "SameTest" or "test" are explained in the
>online Help system, and the only example of SameTest given in The
>Mathematica Book has the syntax "SameText -> comp" rather than
>"SameTest->test". Are "test" and "comp" the same?
>
>--
>"Power tends to corrupt. Absolute power corrupts absolutely."
>Lord Acton (1834-1902)
>"Dependence on advertising tends to corrupt. Total dependence on
>advertising corrupts totally."
>(today's equivalent)
>

It is worth being a bit more verbose and including the data:

In[50]:= sloppyData = Table[Random[Integer, {1000, 1005}] +
           Random[Real, {-0.01, 0.01}], {20}]
Out[50]= {999.997, 1005., 1005., 1005., 1004., 1004.01, 999.993,
          1004., 1001.01, 999.999, 1003., 1001.01, 1001., 1004.01,
          1003., 1005.01, 1003., 1003., 1004., 1001.99}

In[51]:= % // Sort
Out[51]= {999.993, 999.997, 999.999, 1001., 1001.01, 1001.01,
          1001.99, 1003., 1003., 1003., 1003., 1004., 1004., 1004.,
          1004.01, 1004.01, 1005., 1005., 1005., 1005.01}

This is just for our eyes, to compare.

In[52]:= Union[sloppyData, SameTest -> (Abs[#1 - #2] < 10.^-1 &)]
Out[52]= {999.993, 1001., 1001.99, 1003., 1004., 1005.}

This is what you accepted as representatives of the sets of *nearly*
equal data. We may look at the individual differences:

In[53]:= (Abs[#1 - #2] < 10.^-1 &) @@@ Partition[Sort[sloppyData], 2, 1]
Out[53]= {True, True, False, True, True, False, False, True, True,
          True, False, True, True, True, True, False, True, True, True}

After each False (and at the very beginning) a new run of data starts
whose members are separated by small distances, *nearly nothing*:

In[54]:= Flatten[Join[{1}, Position[%, False] + 1]]
Out[54]= {1, 4, 7, 8, 12, 17}

In[55]:= Sort[sloppyData][[%]]
Out[55]= {999.993, 1001., 1001.99, 1003., 1004., 1005.}

This (here) gives the same answer as Union. In fact this is like:

In[56]:= First /@ Split[Sort[sloppyData], (Abs[#1 - #2] < 10.^-1 &)]
Out[56]= {999.993, 1001., 1001.99, 1003., 1004., 1005.}

In many cases this comes out the same as Union (and performs much
better on large data), but not if there are long runs of *nearly* equal
data, such that the beginning and the end of a run are no longer
*nearly* equal. (You have to check what is appropriate for your
real-world process. This also includes the question of choosing the
right representative, so as not to introduce some bias.)
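
A minimal sketch of that caveat, with made-up data (the list chain and
the name near are illustrative only and not part of the session above).
Neighbours in chain differ by 0.08, so Split sees a single run, although
the first and last elements are 0.32 apart:

```mathematica
(* Hypothetical data: each neighbour is 0.08 away, the two ends 0.32 apart *)
chain = {1.00, 1.08, 1.16, 1.24, 1.32};
near = (Abs[#1 - #2] < 10.^-1 &);

(* Split applies the test to adjacent pairs only, so the whole chain
   collapses into one run with a single representative *)
First /@ Split[Sort[chain], near]
(* {1.} *)

(* Pairwise check: only these elements are really near that representative *)
Select[chain, near[First[chain], #] &]
(* {1., 1.08} *)

(* For comparison; what Union returns here may depend on the version *)
Union[chain, SameTest -> near]
```

Whether one representative for such a chain is acceptable is exactly the
question to settle for the real-world data.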

Now we come to your less precise data:

In[57]:= sharperData = SetPrecision[sloppyData, 4]
Out[57]= {1000., 1005., 1005., 1005., 1004., 1004., 1000., 1004.,
          1001., 1000., 1003., 1001., 1001., 1004., 1003., 1005.,
          1003., 1003., 1004., 1002.}

In[58]:= Union[sharperData, SameTest -> (Abs[#1 - #2] < 10.^-1 &)]
Out[58]= {1000., 1000., 1000., 1001., 1001., 1001., 1002., 1003.,
          1003., 1003., 1003., 1004., 1004., 1004., 1004., 1004.,
          1005., 1005., 1005., 1005.}

...we might get the idea:

In[59]:= (Abs[#1 - #2] < 10.^-1 &) @@@ Partition[Sort[sharperData], 2, 1]
Out[59]= {False, False, False, False, False, False, False, False,
          False, False, False, False, False, False, False, False,
          False, False, False}

Mathematica doesn't consider these numbers as different, but, due to
insufficient Precision, cannot consider them Less either (for each
successive pair). Things change dramatically if we check for
LessEqual (<=) instead:

In[60]:= (Abs[#1 - #2] <= 10.^-1 &) @@@ Partition[Sort[sharperData], 2, 1]
Out[60]= {True, True, False, True, True, False, False, True, True,
          True, False, True, True, True, True, False, True, True, True}

And now (of course) we get:

In[61]:= Union[sharperData, SameTest -> (Abs[#1 - #2] <= 10.^-1 &)]
Out[61]= {1000., 1001., 1002., 1003., 1004., 1005.}

--
Hartmut Wolf