Re: a conflicting StringReplace
- To: mathgroup at smc.vnet.net
- Subject: [mg56332] Re: [mg56306] a conflicting StringReplace
- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
- Date: Fri, 22 Apr 2005 06:23:13 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
>-----Original Message-----
>From: Hui Fang [mailto:fangh73 at xmu.edu.cn]
>To: mathgroup at smc.vnet.net
>Sent: Thursday, April 21, 2005 11:37 AM
>Subject: [mg56332] [mg56306] a conflicting StringReplace
>
>I was teaching Mathematica in a college. In the class I was showing them
>some built-in functions about strings. Since this is not a very
>important issue, I didn't spend much time on each function. When I show
>them StringReplace, I gave them the following examples:
>In[1] StringReplace["abc",{"ab"->"AB"}]
>Out[1] ABc
>
>In[2] StringReplace["abc", {"bc"->"BC"}]
>Out[2] aBC
>
>No problem on those. Now a student tried the following:
>In[3] StringReplace["abc", {"ab"->"AB", "bc"->"BC"}]
>Out[3] ABc
>
>Now he asked me why only "ab" is replaced. I said this is because there
>is a conflict because both "ab" and "bc" contains "b". So Mathematica
>will choose the first replacement. I also told him if he changes the
>order, he will get aBC. Now:
>In[4] StringReplace["abc", {"bc"->"BC","ab"->"AB"}]
>Out[4] ABc
>
>This is the part I don't understand. Does Mathematica treat those rules
>in their canonical order (since "ab" is before "bc" in canonical
>order.), or in their written order?
>
>Thanks a lot!
>
>Hui Fang
>

The behavior is explained in Help:

  StringReplace goes through a string, testing substrings that start
  at each successive character position. On each substring, it tries
  in turn each of the transformation rules you have specified. If any
  of the rules apply, it replaces the substring, then continues to go
  through the string, starting at the character position after the end
  of the substring.

What is not quite clear (to me) from that explanation is the meaning of "each substring": does this just mean the rest of the string starting at current position, or is each substring of different length (starting there) considered as different (such that the next rule is tried first, before we further run down the string)?
An experiment shows that the first assumption applies (and this gives the algorithm that performs better). Here is the Help text again, annotated with what happens for In[4], StringReplace["abc", {"bc"->"BC","ab"->"AB"}]:

StringReplace goes through a string, testing substrings that start at each successive character position.
  -- starting at the position of "a" in string "abc"

On each substring,
  -- at the starting position it's: "abc"

it [StringReplace] tries in turn each of the transformation rules you have specified.
  -- so first it tries "b" of "bc" on "a" of "abc" --> fail, try next rule
  -- then tries "a" of "ab" on "a" of "abc" --> interesting, go on
  -- tries "b" of "ab" on "b" of "abc" --> interesting, go on
  -- pattern is exhausted, so we have --> success of pattern on substring "ab" of "abc"

If any of the rules apply, it replaces the substring,
  -- so here "ab" of "abc" becomes "AB", i.e. the string becomes "ABc"

then continues to go through the string, starting at the character position after the end of the substring.
  -- the substring considered next is "c" of "abc" (or "ABc"; the replaced part is not considered again)
  -- so now we compare "b" of "bc" on "c" --> fail, try next rule
  -- try "a" of "ab" on "c" --> fail, no more rules, advance position
  -- but the string is exhausted
  -- so the result is "ABc"

Here are two more examples:

In[29]:= StringReplace["abc", {"ab" -> "ab", "bc" -> "bc", "a" -> "1", "b" -> "2", "c" -> "3"}]
Out[29]= "ab3"

Here rule "a" -> "1" is masked by pattern "ab", which matches first. "bc" and "b" cannot match, as neither "bc" nor "b" is part of the substring "c" left over from "abc" after the substitution "ab" -> "ab".

In[31]:= StringReplace["abc", {"a" -> "1", "ab" -> "ab", "bc" -> "bc", "b" -> "2", "c" -> "3"}]
Out[31]= "1bc"

Here rule "a" -> "1" matches first, and the substitution takes place. For the rest of the string, "bc" from "abc", patterns "a" and "ab" will never match; "bc" -> "bc" matches first, and thus masks the rules "b" -> "2" and "c" -> "3".

--
Hartmut Wolf
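
For readers who want to step through the scan outside Mathematica, here is a minimal Python sketch of the behaviour traced above. It is only a model built from the Help description and the experiments in this message, not Mathematica's actual implementation. The name string_replace and the representation of the rules as an ordered list of (pattern, replacement) pairs are assumptions made for illustration, and only literal string patterns are handled.

```python
def string_replace(text, rules):
    """Model of the left-to-right scan described above (an assumption-based
    sketch, not Mathematica's real code).

    rules is an ordered list of (pattern, replacement) pairs of literal
    strings; the order in which they are written is the order in which
    they are tried.
    """
    out = []
    pos = 0
    while pos < len(text):
        # At the current position, try each rule in its written order.
        for pattern, replacement in rules:
            # Empty patterns are skipped so the scan always advances.
            if pattern and text.startswith(pattern, pos):
                out.append(replacement)
                # Continue after the end of the matched substring;
                # the replaced part is never looked at again.
                pos += len(pattern)
                break
        else:
            # No rule matched here: keep the character, advance by one.
            out.append(text[pos])
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    # Both rule orders give the same answer, as in In[3] and In[4].
    print(string_replace("abc", [("ab", "AB"), ("bc", "BC")]))
    print(string_replace("abc", [("bc", "BC"), ("ab", "AB")]))
```

In this model the written order of the rules only matters when two rules match at the same starting position, which is why swapping "ab" and "bc" changes nothing, while moving "a" -> "1" to the front (In[31]) does.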