Re: prograMing: EquivalenceIndex
- To: mathgroup@smc.vnet.net
- Subject: [mg11213] Re: prograMing: EquivalenceIndex
- From: "Xah" <xah@best.com>
- Date: Mon, 2 Mar 1998 23:11:04 -0500
- Organization: Venus & Xah Love Factory
I've spend about 6 hours studying the EquivalenceIndex problem. I have found a solution I deem perfect. It is clear, purely functional, and I doubt any other code can be faster. I'll come to it below. In the process, I discovered an important design issue which I missed before. Namely, that user can pass a sameTestQ such that sameTestQ[1,1]===False. For example, if sameTestQ is (#1===Reverse@#2&), then sameTestQ[{1,2},{1,2}] returns False. ALL previous implementations of EquivalenceIndex are susceptable to incorrect results when such sameTestQ is given. For example, all versions give wrong results in EquivalenceIndex[{{1,2,3},{1,3,2},{2,1,3},{2,3,1},{3,1,2},{3,2,1}, {1,2,3}},#1===Reverse@#2&] For sometime I wasn't sure whether EquivalenceIndex should assume identical elements being equal regardless what sameTestQ says. In the end I decided that user's sameTestQ should take complete control. This philosophy is in agreement with the behavior of Union. For example, Union[{{2,1,3,4},{2,1,3,4}},SameTest->(#1===Reverse@#2&)] returns the argument unchanged. (By now this seems an obvious decision) Before I give my new solutions of EquivalenceIndex, let me give a bit of background. (which I posted a few months ago here) The problem starts with Union's inability to truely return distinct elements with user defined SameTest. For example, the following: Union[{{2,1,3,4},{3,4,1,2},{4,3,1,2}},SameTest->(#1===#2||#1===Reverse@#2&)] should return {{2,1,3,4},{3,4,1,2}} but it dosn't. The cause is that Union only uses SameTest to compare neighboring elements under standard Sort. For a workaround, I wrote DistinctElements, in which EquivalenceIndex is a generalization: DistinctElements[li_,sameTestQ_:SameQ]:= Part[li,First/@EquivalenceIndex[li,sameTestQ]]; I felt there is a general need for DistinctElements and EquivalenceIndex. Daniel Lichtblau back then hinted that future version of Mathematica may "fix" Union. Now, here is the killer solution of EquivalenceIndex. Thanks to Clemens Frey (<Clemens.Frey@uni-bayreuth.de>) whoes idea of using Position I've borrowed. Clear[EquivalenceIndex]; EquivalenceIndex::usage= "EquivalenceIndex[listA,sameTestQ] returns a partition of indexes, e.g. {{1,4,7},{2},{3,10},{5,8},{6,9}}, of listA. Index of elements that are considered equivalent under sameTestQ are grouped together. sameTestQ must accept two arguments and return either True or False. ListA can have head other than List. To guarantee that sameTestQ[listA[[e]],listA[[f]]]===False for any e,f in the result index grouping, sameTestQ must be reflective, orderless, and transitive: (1) sameTestQ[a,a]===True, (2) sameTestQ[a,b]===sameTestQ[b,a], (3) if sameTestQ[a,b], sameTestQ[b,c], then sameTestQ[a,c]. Exapmle: EquivalenceIndex[Permutations[Range@3],#1===#2||#1===Reverse@#2&]"; (*purely functional implementation, and fastest. The killer solution!*) EquivalenceIndex[li_,sameTestQ_:SameQ]:=First@FixedPoint[ With[{resultA=First@#,firstIndexes=#[[-1,1]],restIndexes=Rest@Last@#, firstPart=li[[#[[-1,1]]]]}, ({Append[resultA,Join[{firstIndexes},Part[restIndexes,Flatten@#]]], Delete[restIndexes,#]}&)@ Position[(sameTestQ[firstPart,#]&)/@Part[li,restIndexes], True,{1}] ]&, {{},Range@Length@li}, SameTest->(Last@#2==={}&) ]; (*alternative, standard imperative implementation. Easy to read. Improved from previous implementations, and fixed sameTestQ bug. *) Clear[EquivalenceIndex2]; EquivalenceIndex2[li_,sameTestQ_:SameQ]:= Module[{indexes,resultA,tempHead,firstPart}, indexes=Range@Length@li; resultA={}; While[Length@indexes=!=0, firstPart=Part[li,First@indexes]; resultA={resultA, tempHead[First@indexes, Cases[Rest@indexes,_?(sameTestQ[firstPart,Part[li,#]]&),{1}]]}; indexes=Complement[Rest@indexes,(resultA[[-1,-1]])]]; DeleteCases[{resultA/.List->Sequence}/.tempHead->List,{},{1}]]; -- For the industrious, here are some testing code and results. In[6]:= Clear[li,sameTestQ,functions]; functions={EquivalenceIndex,EquivalenceIndex2}; li=(Join[Permutations@#,{#}]&)@Range@6; sameTestQ=(#1===Reverse@#2&); ({SameQ@@(Last/@#),First/@#}&)@((Timing@#[li,sameTestQ]&)/@functions) Out[6]= {True,{26.1 Second,34.7 Second}} In[7]:= li=Table[Random[Integer,{1,500}],{500}]; sameTestQ=SameQ; ({SameQ@@(Last/@#),First/@#}&)@((Timing@#[li,sameTestQ]&)/@functions) Out[7]= {True,{6.63333 Second,11.7 Second}} -- For those truely adventurous, here's a syntax variation of the purely functional version. It is faster by a few fractions of a second for lengthy inputs (e.g. li=Table[Random[Integer,{1,500}],{500}], sameTestQ=SameQ). Instead of using a With construct, it avoided With by nesting multiple-argument-pure-functions. It is faster because With takes time. Clear[EquivalenceIndex2]; EquivalenceIndex2[li_,sameTestQ_:SameQ]:=First@FixedPoint[ Function[({Append[#1,Join[{#2},Part[#3,Flatten@#4]]],Delete[#3,#4]}&)[Sequen ce@@(Function[{x,y,z},{First@x,x[[-1,1]],z,Position[(sameTestQ[y,#]&)/@Part[ li,z],True,{1}]}][#,li[[#[[-1,1]]]],Rest@Last@#])]], {{},Range@Length@li}, SameTest->(Last@#2==={}&) ]; If anyone can come up with still a faster solution, **I'll buy her a drink with awe**. PS Here's another version, which is essentially a fixed and improved version of Clemens Frey's EquivalenceIndexCF2. Clear[EquivalenceIndex2]; EquivalenceIndex2[li_,sameTestQ_:SameQ]:= Module[{resultA,currentList,restcurrentList,firstPart,indexes,tempHead}, resultA={}; currentList=Transpose@{li,Range@Length@li}; firstPart=currentList[[1,1]]; restcurrentList=Rest@currentList; While[currentList=!={}, firstPart=currentList[[1,1]]; restcurrentList=Rest@currentList; indexes= Flatten@Position[(sameTestQ[firstPart,#]&)/@( Rest@First@Transpose@currentList),True,{1}]; resultA={resultA, tempHead[currentList[[1,-1]],Last/@restcurrentList[[indexes]]]}; currentList= Part[restcurrentList, Complement[Range@Length@restcurrentList,indexes]] ]; {resultA/.List->Sequence}/.tempHead->List ]; Some general notes: * In general, === is faster than ==. Use the later only for inputs involving Real numbers. * AppendTo is super slow. In general, nest the stuff and give a final Flatten. Alternatively, predefine a long list such as myList={1,1,1,1,...} then assign each one like myList[n]=someThing. * Range takes time. This is a potential speed hog in the code. * In general, Part is relatively slow. This is why Clemens' original solution of using Position in the code beats the hell out of the "standard" version that call Part frequently. Using Part 100 times is slower than Range[100]. * Complement is also a potential speed hog. I was able to avoid both Range and Complement in my final version. * First/@list is slower than First@Transpose@list. The two are always equivalent unless list is empty, for which the later prints a warning. The speed difference adds up if list is long. I believe Transpose is constant time, while Map is linear order. * loop with assgnments such as While[test,a=...; b=...;...] is in GENERAL slower than looping a function such as FixedPoint[{a,b,...}&,{a,b,...},SameTest->test]. Xah, xah@best.com http://www.best.com/~xah/PageTwo_dir/more.html Mountain View, CA, USA "morality abets evil"