Re: a (serious) question about character codes in Mathematica

*To*: mathgroup at smc.vnet.net
*Subject*: [mg71485] Re: a (serious) question about character codes in Mathematica
*From*: "Philipp" <Philipp.M.O at gmail.com>
*Date*: Mon, 20 Nov 2006 06:34:27 -0500 (EST)
*References*: <ejoskp$n3b$1@smc.vnet.net>

Chris,

I suppose you can redefine, if you've got the patience, all Mathematica encoding files in

    ...\Wolfram Research\Mathematica\5.2\SystemFiles\CharacterEncodings

(or wherever they sit on your system).

Regarding Mathematica7.m, create, say, Mathematica7unicode.m, with

    (* Math7unicode *)
    {"16Bit", { {8477, "\[DoubleStruckCapitalR]"} } }

(Be careful with prettifying the list; I found syntax checking on CharacterEncoding files a bit twitchy.)

Now,

    In[1]:= FromCharacterCode[8477, "Mathematica7unicode"] // FullForm
    Out[1]//FullForm= "\[DoubleStruckCapitalR]"

and,

    In[2]:= ToCharacterCode["\[DoubleStruckCapitalR]", "Mathematica7unicode"]
    Out[2]= {8477}

I hope this helps.

Cheers,
Philipp

Chris Chiasson wrote:
> Usually, my questions about character codes are more complaints than
> anything else.
>
> But, this time I just want info. I promise :-)
>
> FromCharacterCode[16^^52,"Mathematica7"]//FullForm
>
> gives
>
> "\[DoubleStruckCapitalR]"
>
> (52 in hex is 82 in decimal, so this is the 83rd character in the
> Mathematica7 font - numbering starts from zero)
>
> As shown below, Mathematica seems to map characters to non-standard
> Unicode points.
>
> In Unicode, the double struck capital R is 0x211d in hex and 8477 in decimal[1].
>
> But, Mathematica assigns this character to (the second argument is
> optional here)
>
> ToCharacterCode["\[DoubleStruckCapitalR]","Unicode"]
>
> {63413}
>
> 63413 in decimal and 0xf7b5 in hex does not map to any (regular)
> glyph. In fact, the number is in the private use area of Unicode.
>
> I know Mathematica has been around for a long time. Perhaps this code
> point was in use for DoubleStruckCapitalR before that glyph had a code
> point in regular Unicode. I am sure there is a perfectly good
> explanation.
>
> I am interested in ways to acquire the common Unicode points for the
> glyphs in Mathematica7 along with the stretchy characters for
> parentheses and lists from other Mathematica fonts. This can be done
> by feeding the raw character into one of the lower level conversion
> routines like
>
> System`Convert`XMLDump`determineEntityExportFunction[{"\[DoubleStruckCapitalR]"},"US-ASCII"]["\[DoubleStruckCapitalR]"]
>
> (I think this call is right, but I can't be sure because it's
> undocumented - also, it must be enabled by exporting something first)
>
> but it requires a lot of post processing to apply it to all the
> characters in a particular font (because sometimes the result is a
> plain character instead of an HTML entity string - and sometimes the
> result is a semicolon...).
>
> What is the best way to get the "real" Unicode points for the exotic
> fonts in Mathematica?
>
> [1] http://www.unicode.org/charts/PDF/U2100.pdf
>
> --
> http://chris.chiasson.name/
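
The numbers quoted in this thread can be checked with a minimal Mathematica sketch. The names privateUseQ and toStandard are hypothetical, introduced only for illustration; they are not part of Mathematica. The only built-in used is ToCharacterCode, and the {63413} result is the one reported in the post for version 5.2, so other versions may return something different.

```mathematica
(* Hex values quoted in the thread, written in base^^digits input form *)
16^^52     (* 82    : position in the Mathematica7 font *)
16^^211d   (* 8477  : standard Unicode DOUBLE-STRUCK CAPITAL R *)
16^^f7b5   (* 63413 : the code reported by ToCharacterCode in the post *)

(* Hypothetical helper: is a code inside the BMP Private Use Area, U+E000..U+F8FF? *)
privateUseQ[code_Integer] := 16^^e000 <= code <= 16^^f8ff

privateUseQ[63413]   (* True  *)
privateUseQ[8477]    (* False *)

(* Hypothetical remapping table: private code -> standard code point.
   Only the single pair discussed in the thread is listed. *)
toStandard = {63413 -> 8477};

(* Apply the table to whatever codes ToCharacterCode returns;
   codes without an entry are left unchanged. *)
ToCharacterCode["\[DoubleStruckCapitalR]", "Unicode"] /. toStandard
```

A rule list like this avoids touching the CharacterEncodings files, at the cost of having to fill in each private-to-standard pair by hand.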