MathGroup Archive 2006


a (serious) question about character codes in Mathematica

  • To: mathgroup at smc.vnet.net
  • Subject: [mg71447] a (serious) question about character codes in Mathematica
  • From: "Chris Chiasson" <chris at chiasson.name>
  • Date: Sun, 19 Nov 2006 01:10:03 -0500 (EST)

Usually, my questions about character codes are more complaints than
anything else.

But, this time I just want info. I promise :-)



FromCharacterCode[16^^52,"Mathematica7"]//FullForm

gives

"\[DoubleStruckCapitalR]"

(52 in hex is 82 in decimal, so this is the 83rd character in the
Mathematica7 font - numbering starts from zero)

As shown below, Mathematica seems to map characters to non-standard
Unicode points.

In Unicode, the double-struck capital R is U+211D (0x211d in hex, 8477 in decimal) [1].

But Mathematica assigns this character a different code. Evaluating the
following (the second argument is optional here) gives

ToCharacterCode["\[DoubleStruckCapitalR]","Unicode"]

{63413}

63413 in decimal (0xf7b5 in hex) does not map to any regular glyph. In
fact, the number is in the Private Use Area of Unicode (U+E000 to U+F8FF).
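
The arithmetic above can be checked directly in Mathematica. This is only a sketch: privateUseQ is a hypothetical helper name (not a built-in), and the range U+E000 to U+F8FF is the Private Use Area as defined by Unicode, not something taken from Mathematica's documentation.

```mathematica
(* privateUseQ is a hypothetical helper, not a built-in *)
privateUseQ[c_Integer] := 16^^E000 <= c <= 16^^F8FF

(* code Mathematica reports for the character; 63413 in the output quoted above *)
internal = First[ToCharacterCode["\[DoubleStruckCapitalR]"]];

(* U+211D, the standard Unicode point for the double-struck capital R *)
standard = 16^^211D;

(* base conversions used in the text: evaluates to {82, 63413, 8477} *)
{16^^52, 16^^F7B5, standard}

(* is each code inside the Private Use Area? *)
{privateUseQ[internal], privateUseQ[standard]}
```

With the value 63413 reported above, the last line would evaluate to {True, False}; other versions of Mathematica may report a different code for the character.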

I know Mathematica has been around for a long time. Perhaps this code
point was in use for DoubleStruckCapitalR before that glyph had a code
point in regular Unicode. I am sure there is a perfectly good
explanation.

I am interested in ways to acquire the standard Unicode points for the
glyphs in Mathematica7, along with the stretchy characters for
parentheses and lists from other Mathematica fonts. This can be done
by feeding the raw character into one of the lower-level conversion
routines like

System`Convert`XMLDump`determineEntityExportFunction[{"\[DoubleStruckCapitalR]"},"US-ASCII"]["\[DoubleStruckCapitalR]"]

(I think this call is right, but I can't be sure because it's
undocumented - also, it must be enabled by exporting something first)

but it requires a lot of post processing to apply it to all the
characters in a particular font (because sometimes the result is a
plain character instead of an HTML entity string - and sometimes the
result is a semicolon...).
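
As a starting point for that post-processing, one could first tabulate every slot in the font and flag the ones whose reported code lies in the private use range, since only those need a separate lookup. This is a rough sketch, not a tested recipe: it assumes FromCharacterCode accepts every code from 0 to 255 with the "Mathematica7" encoding used above (slots that do not yield a single-character string are dropped), and privateUseQ is a hypothetical helper.

```mathematica
(* hypothetical helper: is the code inside the Unicode Private Use Area? *)
privateUseQ[c_Integer] := 16^^E000 <= c <= 16^^F8FF

(* one row per font slot: {slot, character} *)
slots = Table[{n, FromCharacterCode[n, "Mathematica7"]}, {n, 0, 255}];

(* keep only slots that produced a single character *)
slots = Select[slots, StringQ[#[[2]]] && StringLength[#[[2]]] == 1 &];

(* append the code Mathematica reports for each character *)
rows = {#[[1]], #[[2]], First[ToCharacterCode[#[[2]]]]} & /@ slots;

(* rows whose code is in the private use range still need a real Unicode point *)
needLookup = Select[rows, privateUseQ[#[[3]]] &];
```

This only narrows down which characters need translating; it does not itself supply the standard code points.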

What is the best way to get the "real" Unicode points for the exotic
fonts in Mathematica?

[1] http://www.unicode.org/charts/PDF/U2100.pdf

-- 
http://chris.chiasson.name/

