R: Re: reliably sort?
- To: mathgroup at smc.vnet.net
- Subject: [mg123767] R: Re: reliably sort?
- From: "Sir Bob Knight" <sir.bob.knight at gmail.com>
- Date: Sun, 18 Dec 2011 04:35:21 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <jca06e$a35$1@smc.vnet.net> <201112150955.EAA22951@smc.vnet.net> <201112161054.FAA06989@smc.vnet.net> <op.v6lg231jtgfoz2@bobbys-imac.local> <201112170744.CAA18852@smc.vnet.net>
I agree with you Ulrich, When we use Union we are led to think we are working on sets and thus we expect to have the result in the same order as in input. On the other hand, the reference page of the Union function says: "If the Subscript[list, i] are considered as sets, Union gives their union." (first line in the MORE INFORMATION section). Moreover, if you take a look at the tutorial/ListsAsSets (as suggested by Shizu in [mg123735]) it lists the Union function (and some others) in the first table named "Set theoretical functions", but, frankly, theoretical union doesn't consider ordering results. Finally, just some tests from the performance point of view: In[36]:= data = RandomInteger[{-100, 100}, {10^8}]; In[37]:= AbsoluteTiming[r = Sort[DeleteDuplicates[data]];] Out[37]= {0.1880107, Null} In[38]:= AbsoluteTiming[ru = Union[data];] Out[38]= {40.0252894, Null} In[39]:= r === ru Out[39]= True So, Union seems to have no reason to be used even from the performance point of view (40 times slower than the other). If you need the "union" as set operation use DeleteDuplicates[set], if you need a "sorted union" you can use Sort[DeleteDuplicates[set]] My personal opinion is that Union is correct even sorting the result, because the reference page says that, and in general I think that if a function behaves coherently with the documented features it works fine. However, the documentation should be a little more accurate when speaking about sets and union operation on sets, perhaps specifying that the sorting of result is not part of the theoretical function. Regards, ~roberto -----Messaggio originale----- Da: Ulrich Arndt [mailto:ulrich.arndt at data2knowledge.de] Inviato: sabato 17 dicembre 2011 08:44 A: mathgroup at smc.vnet.net Oggetto: Re: reliably sort? My statement might have been a bit misleading. I didn't expected that mathematica is offering a pure mathematica set concept implementation and I saw also some ways how it can be build in mathematica. But Union (Esc\[ThinSpace]un\[ThinSpace]Esc:) is indicating by its notation a set operation and there I was wondering why the function is overloaded with the additional sort. The other thing which was puzzling me was the statement of "Mathematica obviously uses the most efficient way to perform the Union operation, which is to sort the entire set of elements..." which is performance wise not true for all sets as shown in the example. As I learned today DeleteDuplicates would have been a good candidate for the Union Implementation... Ulrich Am 16.12.2011 um 19:17 schrieb DrMajorBob: > Union COULD be a set operation... if sets existed in Mathematica. The output may LOOK like a set to you, but that's a mirage. > >> But there are other implementations which can be used for this set >> operation which do not use sorting > > Ah, but you DID sort b (in order to make the results equal), whereas Union sorted the list before eliminating duplicates. > > Yes, other implementations exist that don't involve sorting... but they're not built-in to Mathematica, and you haven't demonstrated one. > > Bobby > > On Fri, 16 Dec 2011 04:54:15 -0600, Ulrich Arndt <ulrich.arndt at data2knowledge.de> wrote: > >> Union and Sort seems not naturally connected to me. Union is a set >> operation and according to the documentation it seems that the >> implementation is using some sorting... >> But there are other implementations which can be used for this set >> operation which do not use sorting and seems to have some better >> performance in some cases - will clearly depend on the underlying data! >> >> In[169]: dim = 1000000; >> list = RandomChoice[Range[299], dim]; (a = Union[list]); // Timing (b >> = Tally[list][[All, 1]]); // Timing a == Sort[b] >> >> Out[171]= {0.222803, Null} >> >> Out[172]= {0.003932, Null} >> >> Out[173]= True >> >> Ulrich >> >> Am 15.12.2011 um 10:55 schrieb David Bailey: >> >>> On 14/12/2011 11:10, Michael Stern wrote: >>>> I have noticed that if combine two lists of date/data objects using >>>> Union[], the result is sorted by date. >>>> >>>> For example >>>> >>>> In[]:= mins={{{2001, 1, 31}, 0.993268}, {{2002, 3, 31}, 1.01395}, >>>> {{2003, 6, 30}, >>>> 1.08647}, {{2005, 11, 30}, 1.14752}, {{2006, 9, 30}, >>>> 1.18938}, {{2007, 2, 28}, 1.19658}, {{2008, 1, 31}, >>>> 1.20432}, {{2011, 1, 31}, 1.37501}}; In[]:= maxs={{{2000, 10, >>>> 31}, 1.01816}, {{2001, 12, 31}, >> 1.02714}, >>>> {{2004, 2, 29}, >>>> 1.12702}, {{2005, 3, 31}, 1.16986}, {{2010, 10, 31}, 1.39026}}; >>>> >>>> In[]:= Union[maxs, mins] >>>> >>>> Out[]= {{{2000, 10, 31}, 1.01816}, {{2001, 1, 31}, >>>> 0.993268}, {{2001, 12, 31}, 1.02714}, {{2002, 3, 31}, >>>> 1.01395}, {{2003, 6, 30}, 1.08647}, {{2004, 2, 29}, >>>> 1.12702}, {{2005, 3, 31}, 1.16986}, {{2005, 11, 30}, >>>> 1.14752}, {{2006, 9, 30}, 1.18938}, {{2007, 2, 28}, >>>> 1.19658}, {{2008, 1, 31}, 1.20432}, {{2010, 10, 31}, >>>> 1.39026}, {{2011, 1, 31}, 1.37501}} >>>> >>>> Is this reliable behavior? Is there any case in which Union would /not/ >>>> return results that have been properly sorted by date? >>>> >>>> Thank you, >>>> >>>> -Michael Stern >>> Sorting is part of the operation, and you can rely on the result being >>> sorted. >>> >>> Mathematica obviously uses the most efficient way to perform the Union >>> operation, which is to sort the entire set of elements, and then perform >>> a scan through the elements, discarding adjacent duplicates. >>> >>> David Bailey >>> http://www.dbaileyconsultancy.co.uk >>> >>> >> >> > > > -- > DrMajorBob at yahoo.com
- References:
- Re: Does Union[] reliably sort?
- From: David Bailey <dave@removedbailey.co.uk>
- Re: reliably sort?
- From: Ulrich Arndt <ulrich.arndt@data2knowledge.de>
- Re: reliably sort?
- From: Ulrich Arndt <ulrich.arndt@data2knowledge.de>
- Re: Does Union[] reliably sort?