MathGroup Archive 2009

StringMatchQ and non-ASCII characters

  • To: mathgroup at smc.vnet.net
  • Subject: [mg105917] StringMatchQ and non-ASCII characters
  • From: "Norbert P." <bertapozar at gmail.com>
  • Date: Sat, 26 Dec 2009 19:05:47 -0500 (EST)

Hi,

I'm playing around with a Japanese dictionary in Mathematica 6.0.2 and
I stumbled upon some strange behavior of StringMatchQ when working with
non-ASCII characters, such as Japanese kanji.

Consider the following one-character string:

In[1]:= s="\:672c";

In[2]:= StringLength[s]
Out[2]= 1

In[3]:= StringMatchQ[s,_?((Print[InputForm[#],ToCharacterCode[#]];True)&)]
During evaluation of In[3]:= "\:672c"{26412}
During evaluation of In[3]:= "\234"{156}
During evaluation of In[3]:= "\[Not]"{172}
Out[3]= True

It seems that the pattern test is applied three times, even though _
should match only a single character. The Print-based test above only
illustrates the problem. What I actually want is a different test
function, for example one that checks whether the character is a kanji.
But it seems that the pattern test must yield True in all three cases
for StringMatchQ to return True, as in

In[4]:= StringMatchQ["本",_?KanjiQ]
Out[4]= False

since

In[5]:= StringMatchQ[s,_?((Print[KanjiQ[#],ToCharacterCode[#]];KanjiQ[#])&)]
During evaluation of In[5]:= True{26412}
During evaluation of In[5]:= False{156}
Out[5]= False
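
(KanjiQ is not defined in this post. A minimal sketch of what such a
test might look like, assuming "kanji" means a single character in the
CJK Unified Ideographs block, U+4E00 to U+9FFF. Both KanjiQ as written
here and the AllKanjiQ helper are hypothetical names for illustration.
The helper checks each character via Characters and ToCharacterCode, so
it does not involve string patterns at all; it has not been tested on
6.0.2.)

```mathematica
(* Hypothetical definition: the post does not show KanjiQ. Assumes a kanji
   is one character with code 19968..40959 (U+4E00..U+9FFF). *)
KanjiQ[c_String] := StringLength[c] == 1 &&
  19968 <= First[ToCharacterCode[c]] <= 40959
KanjiQ[_] := False

(* Hypothetical helper: test every character of a non-empty string
   without going through StringMatchQ. *)
AllKanjiQ[str_String] := str =!= "" && And @@ (KanjiQ /@ Characters[str])

KanjiQ["\:672c"]      (* True: code 26412 lies in the range *)
AllKanjiQ["\:672c"]   (* True *)
AllKanjiQ["a\:672c"]  (* False: "a" has code 97 *)
```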

Am I doing something wrong? I couldn't find anything in the
documentation. It would help me a lot if I could use the built-in
string pattern functionality for Japanese =)

Best,
Norbert
