[Date Index] [Thread Index] [Author Index]
Re: prograMing: EquivalenceIndex

To: mathgroup@smc.vnet.net
Subject: [mg11213] Re: prograMing: EquivalenceIndex
From: "Xah" <xah@best.com>
Date: Mon, 2 Mar 1998 23:11:04 -0500
Organization: Venus & Xah Love Factory
I've spend about 6 hours studying the EquivalenceIndex problem. I have
found a solution I deem perfect. It is clear, purely functional, and I
doubt any other code can be faster. I'll come to it below.

In the process, I discovered an important design issue which I missed
before. Namely, that user can pass a sameTestQ such that
sameTestQ[1,1]===False. For example, if sameTestQ is
(#1===Reverse@#2&), then sameTestQ[{1,2},{1,2}] returns False. ALL
previous implementations of EquivalenceIndex are susceptable to
incorrect results when such sameTestQ is given. For example, all
versions give wrong results in

  EquivalenceIndex[{{1,2,3},{1,3,2},{2,1,3},{2,3,1},{3,1,2},{3,2,1},
   {1,2,3}},#1===Reverse@#2&]

For sometime I wasn't sure whether EquivalenceIndex should assume
identical elements being equal regardless what sameTestQ says. In the
end I decided that user's sameTestQ should take complete control. This
philosophy is in agreement with the behavior of Union. For example,
Union[{{2,1,3,4},{2,1,3,4}},SameTest->(#1===Reverse@#2&)] returns the
argument unchanged. (By now this seems an obvious decision)

Before I give my new solutions of EquivalenceIndex, let me give a bit of
background. (which I posted a few months ago here) The problem starts
with Union's inability to truely return distinct elements with user
defined SameTest. For example, the following:

Union[{{2,1,3,4},{3,4,1,2},{4,3,1,2}},SameTest->(#1===#2||#1===Reverse@#2&)]

should return {{2,1,3,4},{3,4,1,2}} but it dosn't. The cause is that
Union only uses SameTest to compare neighboring elements under standard
Sort. For a workaround, I wrote DistinctElements, in which
EquivalenceIndex is a generalization:

DistinctElements[li_,sameTestQ_:SameQ]:=
  Part[li,First/@EquivalenceIndex[li,sameTestQ]];

I felt there is a general need for DistinctElements and
EquivalenceIndex. Daniel Lichtblau back then hinted that future version
of Mathematica may "fix" Union.

Now, here is the killer solution of EquivalenceIndex. Thanks to Clemens
Frey (<Clemens.Frey@uni-bayreuth.de>) whoes idea of using Position I've
borrowed.

Clear[EquivalenceIndex];
EquivalenceIndex::usage=
  "EquivalenceIndex[listA,sameTestQ] returns a partition of indexes,
e.g.  {{1,4,7},{2},{3,10},{5,8},{6,9}}, of listA. Index of elements
that are  considered equivalent under sameTestQ are grouped together.
sameTestQ must accept two arguments and return either True or False.
ListA can have head  other than List. To guarantee that
sameTestQ[listA[[e]],listA[[f]]]===False  for any e,f in the result
index grouping, sameTestQ must be reflective,  orderless, and
transitive: (1) sameTestQ[a,a]===True, (2) 
sameTestQ[a,b]===sameTestQ[b,a], (3) if sameTestQ[a,b], sameTestQ[b,c],
then

sameTestQ[a,c]. Exapmle: 
EquivalenceIndex[Permutations[Range@3],#1===#2||#1===Reverse@#2&]";

(*purely functional implementation, and fastest. The killer solution!*)

EquivalenceIndex[li_,sameTestQ_:SameQ]:=First@FixedPoint[
  With[{resultA=First@#,firstIndexes=#[[-1,1]],restIndexes=Rest@Last@#,
            firstPart=li[[#[[-1,1]]]]},
   ({Append[resultA,Join[{firstIndexes},Part[restIndexes,Flatten@#]]],
                  Delete[restIndexes,#]}&)@
            Position[(sameTestQ[firstPart,#]&)/@Part[li,restIndexes],
              True,{1}]
  ]&,
  {{},Range@Length@li},
  SameTest->(Last@#2==={}&)
 ];

(*alternative, standard imperative implementation. Easy to read.
Improved from previous implementations, and fixed sameTestQ bug. *)

Clear[EquivalenceIndex2];
EquivalenceIndex2[li_,sameTestQ_:SameQ]:=
  Module[{indexes,resultA,tempHead,firstPart},
  indexes=Range@Length@li;
  resultA={};
  While[Length@indexes=!=0,
   firstPart=Part[li,First@indexes];
   resultA={resultA,
          tempHead[First@indexes,
           
Cases[Rest@indexes,_?(sameTestQ[firstPart,Part[li,#]]&),{1}]]};
   indexes=Complement[Rest@indexes,(resultA[[-1,-1]])]];
  DeleteCases[{resultA/.List->Sequence}/.tempHead->List,{},{1}]];

--

For the industrious, here are some testing code and results.

In[6]:=
Clear[li,sameTestQ,functions];
functions={EquivalenceIndex,EquivalenceIndex2};
li=(Join[Permutations@#,{#}]&)@Range@6; sameTestQ=(#1===Reverse@#2&);
({SameQ@@(Last/@#),First/@#}&)@((Timing@#[li,sameTestQ]&)/@functions)

Out[6]=
{True,{26.1 Second,34.7 Second}}

In[7]:=
li=Table[Random[Integer,{1,500}],{500}]; sameTestQ=SameQ;
({SameQ@@(Last/@#),First/@#}&)@((Timing@#[li,sameTestQ]&)/@functions)

Out[7]=
{True,{6.63333 Second,11.7 Second}}

--

For those truely adventurous, here's a syntax variation of the purely
functional version. It is faster by a few fractions of a second for
lengthy inputs (e.g. li=Table[Random[Integer,{1,500}],{500}],
sameTestQ=SameQ). Instead of using a With construct, it avoided With by
nesting multiple-argument-pure-functions. It is faster because With
takes time.

Clear[EquivalenceIndex2];
EquivalenceIndex2[li_,sameTestQ_:SameQ]:=First@FixedPoint[
 
Function[({Append[#1,Join[{#2},Part[#3,Flatten@#4]]],Delete[#3,#4]}&)[Sequen
ce@@(Function[{x,y,z},{First@x,x[[-1,1]],z,Position[(sameTestQ[y,#]&)/@Part[
li,z],True,{1}]}][#,li[[#[[-1,1]]]],Rest@Last@#])]],
  {{},Range@Length@li},
  SameTest->(Last@#2==={}&)
 ];

If anyone can come up with still a faster solution, **I'll buy her a
drink with awe**.

PS Here's another version, which is essentially a fixed and improved
version of Clemens Frey's EquivalenceIndexCF2.

Clear[EquivalenceIndex2];
EquivalenceIndex2[li_,sameTestQ_:SameQ]:=
 
Module[{resultA,currentList,restcurrentList,firstPart,indexes,tempHead},
  resultA={};
  currentList=Transpose@{li,Range@Length@li};
  firstPart=currentList[[1,1]];
  restcurrentList=Rest@currentList;
  While[currentList=!={},
   firstPart=currentList[[1,1]];
   restcurrentList=Rest@currentList;
   indexes=
        Flatten@Position[(sameTestQ[firstPart,#]&)/@(
                Rest@First@Transpose@currentList),True,{1}];
   resultA={resultA,
         
tempHead[currentList[[1,-1]],Last/@restcurrentList[[indexes]]]};
   currentList=
        Part[restcurrentList,
          Complement[Range@Length@restcurrentList,indexes]]
  ];
  {resultA/.List->Sequence}/.tempHead->List
 ];

Some general notes:
* In general, === is faster than ==. Use the later only for inputs
involving Real numbers.

* AppendTo is super slow. In general, nest the stuff and give a final
Flatten. Alternatively, predefine a long list such as
myList={1,1,1,1,...} then assign each one like myList[n]=someThing.

* Range takes time. This is a potential speed hog in the code.

* In general, Part is relatively slow. This is why Clemens' original
solution of using Position in the code beats the hell out of the
"standard" version that call Part frequently. Using Part 100 times is
slower than Range[100].

* Complement is also a potential speed hog. I was able to avoid both
Range and Complement in my final version.

* First/@list is slower than First@Transpose@list. The two are always
equivalent unless list is empty, for which the later prints a warning.
The speed difference adds up if list is long. I believe Transpose is
constant time, while Map is linear order.

* loop with assgnments such as While[test,a=...; b=...;...] is in
GENERAL slower than looping a function such as
FixedPoint[{a,b,...}&,{a,b,...},SameTest->test].

 Xah, xah@best.com
 http://www.best.com/~xah/PageTwo_dir/more.html
 Mountain View, CA, USA
 "morality abets evil"
Prev by Date: Using Nodal
Next by Date: Re: prograMing: EquivalenceIndex
Prev by thread: Using Nodal
Next by thread: Re: prograMing: EquivalenceIndex