RE: Is Sort stable?
- To: mathgroup at smc.vnet.net
- Subject: [mg39742] RE: [mg39725] Is Sort stable?
- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
- Date: Wed, 5 Mar 2003 00:04:53 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
>-----Original Message----- >From: Roland Nilsson [mailto:rolle at ifm.liu.se] To: mathgroup at smc.vnet.net >Sent: Tuesday, March 04, 2003 5:50 AM >To: mathgroup at smc.vnet.net >Subject: [mg39742] [mg39725] Is Sort stable? > > >Hi, > >--Short form: >Is the sorting algorithm implemented by Sort[] stable? > >--Long form: >I'm doing a thingy where I need to take out subsets of data >and look at them >individually. I have a labeled data set, with labels being a >vector of 1,2 >... denoting class membership, and I've tried to pick out subset using >either e.g. > >Extract[data, Position[labels,1]] >=> the data points in "class 1" > >or using Sort to sort data according to labels (so I get e.g. >{1,1,1,1, ... >2,2,2, ... 3,3,3}, and which is nice for plotting data sets. >This work ok, >but it seems like Sort[] is rearranging the data points within classes >different from Extract[]. Does anyone know i Is the Sort[] >algorithm stable? >Could there be something else lurking here? > > Regards, > >-- >--------------------------------------------- >Roland Nilsson >IFM Computational Biology >Linköping University, Sweden >rolle at ifm.liu.se > > > Roland, Let's compare several forms of Sort in Mathematica. First make up some test data: In[1]:= s = Table[{Random[Integer, {1, 3}], Random[Integer, {1, 10}]}, {30}] Out[1]= {{2, 10}, {2, 2}, {1, 8}, {2, 4}, {2, 8}, {1, 4}, {2, 5}, {1, 1}, {1, 1}, {2, 6}, {2, 3}, {3, 3}, {1, 5}, {3, 5}, {1, 9}, {3, 5}, {3, 2}, {1, 1}, {3, 4}, {1, 8}, {2, 1}, {3, 4}, {2, 9}, {3, 10}, {1, 10}, {2, 5}, {3, 1}, {1, 2}, {2, 4}, {2, 10}} Let the first component be the label, and the second component be the data (or payload). In[2]:= ss1 = Sort[s] Out[2]= {{1, 1}, {1, 1}, {1, 1}, {1, 2}, {1, 4}, {1, 5}, {1, 8}, {1, 8}, {1, 9}, {1, 10}, {2, 1}, {2, 2}, {2, 3}, {2, 4}, {2, 4}, {2, 5}, {2, 5}, {2, 6}, {2, 8}, {2, 9}, {2, 10}, {2, 10}, {3, 1}, {3, 2}, {3, 3}, {3, 4}, {3, 4}, {3, 5}, {3, 5}, {3, 10}} Labels and payload are sorted into canonical order. The question of stability doesn't arise. Other forms of sort are possible, to better compare them, let's first make a list of ranges for sorted labels: In[3]:= ix = FoldList[Plus, 0, Length /@ Split @ Sort[s[[All, 1]]]]; In[4]:= rr = Transpose[{Drop[ix + 1, -1], Rest[ix]}] Out[4]= {{1, 10}, {11, 22}, {23, 30}} In[5]:= classes[sort_] := Flatten[Take[sort, #, {-1}]] & /@ rr A function to get the classes. This not the best, but a secure way to get a stable sort: In[6]:= ssstable = Join @@ (Cases[s, {#, _}] &) /@ Union[s[[All, 1]]] Out[6]= {{1, 8}, {1, 4}, {1, 1}, {1, 1}, {1, 5}, {1, 9}, {1, 1}, {1, 8}, {1, 10}, {1, 2}, {2, 10}, {2, 2}, {2, 4}, {2, 8}, {2, 5}, {2, 6}, {2, 3}, {2, 1}, {2, 9}, {2, 5}, {2, 4}, {2, 10}, {3, 3}, {3, 5}, {3, 5}, {3, 2}, {3, 4}, {3, 4}, {3, 10}, {3, 1}} In[7]:= stableclasses = classes[ssstable] Out[7]= {{8, 4, 1, 1, 5, 9, 1, 8, 10, 2}, {10, 2, 4, 8, 5, 6, 3, 1, 9, 5, 4, 10}, {3, 5, 5, 2, 4, 4, 10, 1}} These are the stable classes, ss1 had them sorted too (which you didn't like). In[8]:= classes[ss1] Out[8]= {{1, 1, 1, 2, 4, 5, 8, 8, 9, 10}, {1, 2, 3, 4, 4, 5, 5, 6, 8, 9, 10, 10}, {1, 2, 3, 4, 4, 5, 5, 10}} In Mathematica we may sort only according to the labels (the first component here) In[9]:= ss2 = Sort[s, First[#1] < First[#2] &]; In[10]:= classes[ss2] Out[10]= {{2, 10, 8, 1, 9, 5, 1, 1, 4, 8}, {10, 4, 5, 9, 1, 3, 6, 5, 8, 4, 2, 10}, {1, 10, 4, 4, 2, 5, 5, 3}} In[11]:= % === stableclasses Out[11]= False But that also isn't stable. Let's check the case when using the ordering of the labels only In[12]:= ss3 = s[[Ordering[s[[All, 1]] ] ]]; In[13]:= classes[ss3] Out[13]= {{8, 1, 1, 10, 4, 8, 5, 2, 1, 9}, {10, 1, 6, 9, 4, 4, 3, 5, 2, 10, 5, 8}, {3, 5, 5, 10, 4, 1, 2, 4}} In[14]:= % === stableclasses Out[14]= False In[15]:= %% === classes[ss2] Out[15]= False How do we get a performant stable sort? An idea is to tag the payload with their ordering and use simple Sort: In[16]:= s4 = Transpose[{s[[All, 1]], Range[Length[s]], s[[All, -1]]}]; In[17]:= ss4 = Sort[s4][[All, {1, -1}]] Out[17]= {{1, 8}, {1, 4}, {1, 1}, {1, 1}, {1, 5}, {1, 9}, {1, 1}, {1, 8}, {1, 10}, {1, 2}, {2, 10}, {2, 2}, {2, 4}, {2, 8}, {2, 5}, {2, 6}, {2, 3}, {2, 1}, {2, 9}, {2, 5}, {2, 4}, {2, 10}, {3, 3}, {3, 5}, {3, 5}, {3, 2}, {3, 4}, {3, 4}, {3, 10}, {3, 1}} In[18]:= classes[ss4] Out[18]= {{8, 4, 1, 1, 5, 9, 1, 8, 10, 2}, {10, 2, 4, 8, 5, 6, 3, 1, 9, 5, 4, 10}, {3, 5, 5, 2, 4, 4, 10, 1}} In[19]:= % === stableclasses Out[19]= True -- Hartmut Wolf