MathGroup Archive: September 2005 [00602]

[Date Index] [Thread Index] [Author Index]
Re:Re: Set of strings reducing problem (Again!)
To: mathgroup at smc.vnet.net
Subject: [mg60658] Re:[mg60599] Re: Set of strings reducing problem (Again!)
From: "Edson Ferreira" <edsferr at uol.com.br>
Date: Fri, 23 Sep 2005 04:20:06 -0400 (EDT)
Sender: owner-wri-mathgroup at wolfram.com
dkr,

I have tested your approach to the set of strings reduction problem.
All I can say is that I astonished. Your code is up to 27 times faster than the one I've got.
Awesome!

If I put the option Shortmatch->True then your code is up to 14 times faster.

For example, I got a list with 864 strings. I applied some filter on it and got a new one with 237 strings. Then I applied both my code and your code to this filtered set os strings.

Although results are different with I put the shortmatch option,I mean the length of the reduced sets, both are correct. My code took 112 seconds while yours took 8 seconds to perform the reduction. Without the option, your code took 4 seconds!

By the way, What do you mean by shortest pattern match? I didn't get it! 
It took more steps in your examples to perform the same reducing operation than with shortmatch->False. Could your clarify this concept? 

Thank you very much indeed! And thanks for everybody who sent me codes to perform this operation! Credits to the original code author as well.

I'm going further with those tests yet...

Edson Ferreira

> Edson, 
> 
> Here is an alternate pattern matching approach. 
> Instead of decomposing strings via Characters and using the standard \ 
> pattern matching techniques as you have done, it takes the opposite 
> tact, \ 
> combining your original set of strings into a single string and then 
> applying \ 
> the string matching techniques introduced in Version 5 of Mathematica 
> 
> In[1]:= 
> origList={"111","11X","112","1X1","1XX","1X2","121","12X","122"}; 
> 
> In[2]:= 
> (* Here we form one long string from the strings in origList, 
> inserting list braces to delimit the individual strings *) 
> str= ToString[{#}]&/@origList//StringJoin 
> Out[2]= 
> {111}{11X}{112}{1X1}{1XX}{1X2}{121}{12X}{122} 
> 
> In[3]:= %2//InputForm 
> Out[3]//InputForm= 
> "{111}{11X}{112}{1X1}{1XX}{1X2}{121}{12X}{122}" 
> 
> In[4]:= 
> rr={{"1","X"},{"1","2"},{"1","U"},{"2","X"},{"M","X"},{"2","D"}}; 
> rules=Thread[Rule[rr,{"D","M","T","U","T","T"}]] 
> Out[5]= 
> {{1,X}->D,{1,2}->M,{1,U}->T,{2,X}->U,{M,X}->T,{2,D}->T} 
> 
> In[6]:= 
> Attributes[ss]=Orderless; 
> ss[i_String,j_String]/;MemberQ[rr,{i,j}]:=ss[i,j]={i,j}; 
> (* ss[string1,string2] evaluates to a list only if there is a rule 
> available for the \ 
> pair of strings *) 
> 
> In[8]:= 
> fun[i_String]:= 
> StringReplace[i, 
> StringExpression["{",a:(WordCharacter)...,p:(WordCharacter), 
> 
> c:(WordCharacter)...,"}",z___,"{",a___,q:(WordCharacter),c___, 
> 
> "}"]/;MatchQ[ss[p,q],_List]:>"{"<>a<>(ss[p,q]/.rules)<>c<>"}"<>z]; 
> 
> In[9]:=fun[str]//InputForm 
> Out[9]//InputForm= 
> "{11M}{11X}{1XM}{1XX}{12M}{12X}" 
> 
> (* Note that multiple replacements can occur during one iteration of \ 
> StringReplace. Here 3 pairs of appropriately mismatched strings were 
> found. 
> In your approach, at most one pair of appropriately mismatched strings 
> is found during each \ 
> iteration of ReplaceRepeated. *) 
> 
> In[10]:= 
> (* We iterate using FixedPoint *) 
> FixedPoint[fun,str]//InputForm 
> Out[10]//InputForm= 
> "{1TT}" 
> 
> In[11]:= 
> (* Finally we convert the result above, which is just a string, 
> back to a list of individual strings *) 
> 
> StringCases[%,"{"~~a:(WordCharacter)..~~"}":>a]//FullForm 
> Out[11]//FullForm= 
> List["1TT"] 
> 
> In[12]:= 
> (* FixedPointList exposes how much reduction is going on during each \ 
> iteration of StringReplace *) 
> FixedPointList[fun,str]//FullForm 
> Out[12]//FullForm= 
> List["{111}{11X}{112}{1X1}{1XX}{1X2}{121}{12X}{122}","{11M}{11X}{1XM}{1XX}{\ 
> 12M}{12X}","{11T}{1XT}{12T}","{1MT}{1XT}","{1TT}","{1TT}"] 
> 
> In[13]:= 
> (* We might do better forcing the rule in StringReplace to find the 
> shortest \ 
> pattern match, using ShortestMatch *) 
> 
> altfun[i_String]:= 
> StringReplace[i, 
> ShortestMatch[ 
> StringExpression["{",a:(WordCharacter)...,p:(WordCharacter), 
> 
> c:(WordCharacter)...,"}",z___,"{",a___,q:(WordCharacter),c___, 
> "}"]/;MatchQ[ss[p,q],_List]]:> 
> "{"<>a<>(ss[p,q]/.rules)<>c<>"}"<>z]; 
> 
> 
> In[14]:= 
> FixedPointList[altfun,str]//FullForm 
> Out[14]//FullForm= 
> List["{111}{11X}{112}{1X1}{1XX}{1X2}{121}{12X}{122}","{1D1}{11X}{112}{1=
UX}{\ 
> 1X2}{121}{122}","{1T1}{11X}{112}{1UX}{1X2}{122}","{1T1}{1TX}{112}{1U2}"=
,"{1TD}\ 
> {1T2}","{1TT}","{1TT}"] 
> 
> (* Note we get a different sequence of reductions *) 
> 
> In[15]:= 
> (* Summarizing all of the above *) 
> CleanSlate[]; 
> 
> In[1]:= 
> rr={{"1","X"},{"1","2"},{"1","U"},{"2","X"},{"M","X"},{"2","D"}}; 
> rules=Thread[Rule[rr,{"D","M","T","U","T","T"}]]; 
> Options[reductionFn]={shortmatch->False}; 
> reductionFn[origList:{__String},opts___Rule]:=Module[{zmatch,ss,str,f=
p}, 
> Attributes[ss]=Orderless; 
> ss[i_String,j_String]/;MemberQ[rr,{i,j}]:=ss[i,j]={i,j}; 
> str=ToString[{#}]&/@origList//StringJoin; 
> zmatch=shortmatch/.{opts}/.Options[reductionFn]; 
> fp:=FixedPoint[ 
> StringReplace[#, 
> 
> StringExpression["{",a:(WordCharacter)...,p:(WordCharacter), 
> c:(WordCharacter)...,"}",z___,"{",a___, 
> q:(WordCharacter),c___,"}"]/; 
> MatchQ[ss[p,q],_List]:> 
> 
> "{"<>a<>(ss[p,q]/.rules)<>c<>"}"<>z]&,str]/;zmatch===False; 
> fp:=FixedPoint[ 
> StringReplace[#, 
> ShortestMatch[ 
> 
> StringExpression["{",a:(WordCharacter)...,p:(WordCharacter), 
> c:(WordCharacter)...,"}",z___,"{",a___, 
> q:(WordCharacter),c___,"}"]/; 
> MatchQ[ss[p,q],_List]]:> 
> "{"<>a<>(ss[p,q]/.rules)<>c<>"}"<>z]&,str]; 
> StringCases[fp,"{"~~a:(WordCharacter)..~~"}":>a] 
> ]; 
> 
> In[7]:= 
> sampleList1={"111","11X","112","1X1","1XX","1X2","121","12X","122"}; 
> reductionFn[sampleList1]//InputForm 
> Out[8]//InputForm= 
> {"1TT"} 
> 
> In[9]:= 
> reductionFn[sampleList1,shortmatch->True]//InputForm 
> Out[9]//InputForm= 
> {"1TT"} 
> 
> In[10]:= 
> sampleList2={"1111","111X","1121","112X","11X1","11XX","1211","121X","1221", 
> 
> "12X1","1X11","1X1X","1X21","1XX1","2111","211X","2121","21X1","2211",
> "2X11","X111","X11X","X121","X1X1","X211","XX11"}; 
> reductionFn[sampleList2]//InputForm 
> Out[11]//InputForm= 
> {"11TD", "1U1D", "1UU1", "U11D", "U1U1", "UU11"} 
> 
> (* The above is the solution your pattern matching approach yields *) 
> 
> In[12]:= 
> reductionFn[sampleList2,shortmatch->True]//InputForm 
> Out[12]//InputForm= 
> {"T11D", "1TU1", "11UX", "TU11", "1U1X", "U1U1"} 
> 
> (* Using the shortest match instead, we obtain the solution in my 
> previous message *) 
> 
> I have no idea how efficient my pattern matching approach is relative 
> to your pattern matching approach. I would venture to guess that both 
> pattern matching approaches will be very inefficient relative to a 
> functional approach based on some of the ideas in the previous thread. 
> The problem with a pattern matching approach for your problem is that 
> the pattern is just too cumbersome. 
> 
>
Prev by Date: Re: Re:Re: Set of strings reducing problem (Again!)
Next by Date: Somebody using Mathematica on SGI Altix?
Previous by thread: Re: Re:Re: Set of strings reducing problem (Again!)
Next by thread: Mathematica: access serial port?