StringMatchQ and non-ASCII characters
- To: mathgroup at smc.vnet.net
- Subject: [mg105917] StringMatchQ and non-ASCII characters
- From: "Norbert P." <bertapozar at gmail.com>
- Date: Sat, 26 Dec 2009 19:05:47 -0500 (EST)
Hi,

I'm playing around with a Japanese dictionary in Mathematica 6.0.2, and I stumbled upon some strange behavior of StringMatchQ when working with non-ASCII characters, such as Japanese kanji. Consider the following one-character string:

    In[1]:= s="\:672c";

    In[2]:= StringLength[s]
    Out[2]= 1

    In[3]:= StringMatchQ[s,_?((Print[InputForm[#],ToCharacterCode[#]];True) &)]
    During evaluation of In[3]:= "\:672c"{26412}
    During evaluation of In[3]:= "\234"{156}
    During evaluation of In[3]:= "\[Not]"{172}
    Out[3]= True

It seems that the pattern test is applied three times, even though _ should match only one character.

I want to use a different test function, for example one that tests whether the character is a kanji. The test function above is only meant to illustrate the problem: it seems that the pattern test must yield True in all three cases for StringMatchQ to return True, as in

    In[4]:= StringMatchQ["本",_?KanjiQ]
    Out[4]= False

since

    In[5]:= StringMatchQ[s,_?((Print[KanjiQ[#],ToCharacterCode[#]];KanjiQ[#])&)]
    During evaluation of In[5]:= True{26412}
    During evaluation of In[5]:= False{156}
    Out[5]= False

Am I doing something wrong? I couldn't find anything in the documentation. It would help me a lot if I could use the built-in string pattern functionality for Japanese =)

Best,
Norbert
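A possibly relevant observation, offered as a sketch rather than an explanation: the two extra codes printed in In[3], 156 and 172, coincide with the second and third bytes of the UTF-8 encoding of U+672C (hex E6 9C AC). Assuming ToCharacterCode accepts the "UTF8" encoding name in this version, the coincidence can be checked directly:

```mathematica
(* U+672C, the kanji from In[1] *)
s = "\:672c";

(* Unicode code point of the single character *)
ToCharacterCode[s]           (* {26412} *)

(* Byte values of the same character encoded as UTF-8 (assumes the "UTF8" encoding name is available) *)
ToCharacterCode[s, "UTF8"]   (* {230, 156, 172} *)
```

Whether StringMatchQ actually works on a UTF-8 representation internally is an assumption here; the check only shows that the numbers line up.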