StringMatchQ and non-ASCII characters
- To: mathgroup at smc.vnet.net
- Subject: [mg105917] StringMatchQ and non-ASCII characters
- From: "Norbert P." <bertapozar at gmail.com>
- Date: Sat, 26 Dec 2009 19:05:47 -0500 (EST)
Hi,
I'm playing around with a Japanese dictionary in Mathematica 6.0.2 and
I stumbled upon some strange behavior of StringMatchQ when working with
non-ASCII characters, such as Japanese kanji.
Consider the following one-character string:
In[1]:= s="\:672c";
In[2]:= StringLength[s]
Out[2]= 1
In[3]:= StringMatchQ[s,_?((Print[InputForm[#],ToCharacterCode[#]];True)&)]
During evaluation of In[3]:= "\:672c"{26412}
During evaluation of In[3]:= "\234"{156}
During evaluation of In[3]:= "\[Not]"{172}
Out[3]= True
It seems that the pattern test is applied three times, even though _
should match only one character. The test function above is only meant
to illustrate the problem. What I actually want is a different test
function, for example one that checks whether the character is a kanji.
However, it seems that the pattern test must yield True in all three
cases for StringMatchQ to return True, as in
In[4]:= StringMatchQ["本",_?KanjiQ]
Out[4]= False
since
In[5]:= StringMatchQ[s,_?((Print[KanjiQ[#],ToCharacterCode[#]];KanjiQ[#])&)]
During evaluation of In[5]:= True{26412}
During evaluation of In[5]:= False{156}
Out[5]= False
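One thing I noticed, offered as a guess rather than an explanation: the
codes 156 and 172 printed above are exactly the second and third bytes
of the UTF-8 encoding of \:672c. This can be checked with
ToCharacterCode and its "UTF8" encoding argument:
In[6]:= ToCharacterCode[s,"UTF8"]
Out[6]= {230, 156, 172}
So the extra test calls might come from the string being handled as
UTF-8 bytes somewhere internally, but that is only my assumption.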
Am I doing something wrong? I couldn't find anything about this in the
documentation. It would help me a lot if I could use the built-in
string pattern functionality for Japanese =)
Best,
Norbert