MathGroup Archive 2005

Re: a conflicting StringReplace

  • To: mathgroup at smc.vnet.net
  • Subject: [mg56422] Re: a conflicting StringReplace
  • From: Maxim <ab_def at prontomail.com>
  • Date: Sun, 24 Apr 2005 03:29:28 -0400 (EDT)
  • References: <d4ako1$irn$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

On Fri, 22 Apr 2005 10:47:29 +0000 (UTC), Wolf, Hartmut  
<Hartmut.Wolf at t-systems.com> wrote:

>
>> -----Original Message-----
>> From: Hui Fang [mailto:fangh73 at xmu.edu.cn]
>> To: mathgroup at smc.vnet.net
>> Sent: Thursday, April 21, 2005 11:37 AM
>> Subject: [mg56422]  a conflicting StringReplace
>>
>> I was teaching Mathematica in a college. In class I was showing
>> the students some built-in string functions. Since this is not a very
>> important topic, I didn't spend much time on each function. When I
>> showed them StringReplace, I gave them the following examples:
>> In[1]    StringReplace["abc",{"ab"->"AB"}]
>> Out[1]   ABc
>>
>> In[2]   StringReplace["abc", {"bc"->"BC"}]
>> Out[2]   aBC
>>
>> No problem on those. Now a student tried the following:
>> In[3]   StringReplace["abc", {"ab"->"AB", "bc"->"BC"}]
>> Out[3]   ABc
>>
>> Now he asked me why only "ab" is replaced. I said this is because
>> there is a conflict: both "ab" and "bc" contain "b", so Mathematica
>> chooses the first replacement. I also told him that if he changed the
>> order, he would get aBC. Now:
>> In[4]    StringReplace["abc", {"bc"->"BC","ab"->"AB"}]
>> Out[4]    ABc
>>
>> This is the part I don't understand. Does Mathematica treat those
>> rules in their canonical order (since "ab" comes before "bc" in
>> canonical order), or in their written order?
>>
>> Thanks a lot!
>>
>> Hui Fang
>>
>>
>
> The behavior is explained in Help:
>
> StringReplace goes through a string, testing substrings that start at
> each successive character position. On each substring, it tries in turn
> each of the transformation rules you have specified. If any of the rules
> apply, it replaces the substring, then continues to go through the
> string, starting at the character position after the end of the
> substring.
>
>
> What is not quite clear (to me) from that explanation is the meaning of
> "each substring": does this just mean the rest of the string starting at
> the current position, or is each substring of different length (starting
> there) considered as different (such that the next rule is tried first,
> before we run further down the string)? An experiment shows that the
> first assumption applies (and this gives the algorithm that performs
> better).
>
>
> StringReplace goes through a string, testing substrings that start at
> each successive character position.
>
> -- starting at position of "a" in string "abc"
>
>
> On each substring,
>
> -- at starting position it's: "abc"
>
> it [StringReplace] tries in turn each of the transformation rules you
> have specified.
>
> -- so first it tries "b" of "bc" on "a" of "abc" --> fail, try next rule
> -- then tries "a" of "ab" on "a" of "abc" --> interesting, go on
> -- tries "b" of "ab" on "b" of "abc" --> interesting, go on
> -- pattern is exhausted, so we have --> success of pattern on
> substring "ab" of "abc"
>
> If any of the rules apply, it replaces the substring,
>
> -- so here "ab" of "abc" becomes "AB", i.e. the string becomes
>
> "ABc"
>
> then continues to go through the string, starting at the character
> position after the end of the substring.
>
> -- Substring considered next is "c" of "abc" (or "ABc", replaced part is
> not considered again)
>
> -- so now we compare "b" of "bc" on "c" --> fail, try next rule
> -- try "a" of "ab" on "c" --> fail, no more rule, advance position
>
> -- but string is exhausted
>
> -- so the result is "ABc"
>
>
>
>
> Here are two more examples:
>
>
> In[29]:= StringReplace["abc", {"ab" -> "ab", "bc" -> "bc",
>             "a" -> "1", "b" -> "2", "c" -> "3"}]
> Out[29]= "ab3"
>
> here rule "a" -> "1" is masked by pattern "ab", which matches first;
> "bc" and "b" cannot match, as neither substring "bc" nor "b" is part of
> the substring "c" left over from "abc" after the substitution "ab" -> "ab".
>
>
>
> In[31]:= StringReplace["abc", {"a" -> "1", "ab" -> "ab",
>             "bc" -> "bc", "b" -> "2", "c" -> "3"}]
> Out[31]= "1bc"
>
> here rule "a" -> "1" matches first, and the substitution takes place.
> For the rest of the string, "bc" from "abc", patterns "a" and "ab" will
> never match; "bc" -> "bc" matches first,
> and thus masks rule "c" -> "3".
>
>
>
>
> --
> Hartmut Wolf
>

I'll try to explain the last two examples (In[29]/In[31]) in a slightly  
different way. Consider:

In[1]:=
StringReplace["abc",
   {"bc"  /; (Print["bc"]; True)  -> "BC",
    "ab"  /; (Print["ab"]; True)  -> "AB",
    "abc" /; (Print["abc"]; True) -> "ABC"}]

 From In[1]:=
bc

 From In[1]:=
ab

Out[1]=
"ABc"

In[2]:=
StringReplace["abc",
   {"bc"  /; (Print["bc"]; True)  -> "BC",
    "abc" /; (Print["abc"]; True) -> "ABC",
    "ab"  /; (Print["ab"]; True)  -> "AB"}]

 From In[2]:=
bc

 From In[2]:=
abc

Out[2]=
"ABC"

The explanation might be as follows: StringReplace tries to find a  
substring which matches the pattern in the first rule. When it finds such  
a substring ("bc"), it doesn't replace it immediately; instead, it  
proceeds to find a possible match for the second rule, and so on. Then it  
obtains several candidates for replacement, and the question is which one  
gets selected. StringReplace appears to find the leftmost position at  
which *any* rule can match and then to use the rule listed first among  
those matching there, regardless of the length of the match. This explains  
why "bc" never gets replaced, while the outcome depends on the order of the  
other two rules, whose matches both start at the leftmost position.
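
A minimal sketch of this reading, assuming literal-string left-hand sides  
only: StringPosition lists where each pattern can match, and a hypothetical  
helper firstRuleUsed (not a built-in) returns the rule whose first match  
starts leftmost, breaking ties by rule order. It only models the behavior  
observed above; it is not a description of how StringReplace is implemented.

```mathematica
(* {start, end} spans of each left-hand side in "abc" *)
StringPosition["abc", #] & /@ {"bc", "ab", "abc"}
(* {{{2, 3}}, {{1, 2}}, {{1, 3}}} *)

(* firstRuleUsed is a hypothetical helper for illustration only.
   It assumes every left-hand side is a plain string. *)
firstRuleUsed[s_String, rules_List] :=
 Module[{starts},
  (* start index of the first match of each rule, Infinity if none *)
  starts = Map[
    With[{p = StringPosition[s, First[#], 1]},
      If[p === {}, Infinity, p[[1, 1]]]] &, rules];
  (* the leftmost start wins; Position scans in rule order,
     so a tie goes to the rule listed first *)
  If[Min[starts] === Infinity, None,
   rules[[Position[starts, Min[starts], {1}, 1][[1, 1]]]]]]

firstRuleUsed["abc", {"bc" -> "BC", "ab" -> "AB", "abc" -> "ABC"}]
(* "ab" -> "AB" *)

firstRuleUsed["abc", {"bc" -> "BC", "abc" -> "ABC", "ab" -> "AB"}]
(* "abc" -> "ABC" *)
```

The two calls pick the same rules that produced Out[1] and Out[2].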

So the omission in the documentation seems to be that StringReplace  
doesn't always search through substrings in some fixed order, e.g. from  
the longest to the shortest (then the replacement rule for "abc" would  
always be the first one used), but the search order depends on the order  
of the rules.
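
If longest-match-first is what one wants, one possible workaround is to  
order the rules by decreasing length of the left-hand side before calling  
StringReplace. This is an assumption based on the behavior seen above, not  
something the documentation prescribes, and the names rules and  
longestFirst are only illustrative. It applies to literal-string  
left-hand sides only.

```mathematica
rules = {"bc" -> "BC", "ab" -> "AB", "abc" -> "ABC"};

(* put the rules with the longest left-hand side first *)
longestFirst =
  Sort[rules, StringLength[First[#1]] > StringLength[First[#2]] &];

(* "abc" -> "ABC" is now the first rule among those matching at position 1 *)
StringReplace["abc", longestFirst]
```

Going by Out[2], the "abc" rule should then be the one used.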

Additionally, there is an inconsistency in the case of BlankNullSequence:

In[3]:=
StringReplace["abc", x___ /; x === "a" -> "1"]
StringReplace["abc", x___ /; x === "b" -> "1"]

Out[3]=
"1bc"

Out[4]=
"abc"

It catches only substrings starting at the first position.
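
The same Print trick as in In[1] can be used to see which substrings x___  
is actually tested against. This is only a diagnostic sketch; the printed  
sequence is not shown because it may differ between versions.

```mathematica
(* print every candidate substring before the condition is checked *)
StringReplace["abc", x___ /; (Print[x]; x === "b") -> "1"]
```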

Maxim Rytin
m.r at inbox.ru

