MathGroup Archive 2006

Re: a (serious) question about character codes in Mathematica

  • To: mathgroup at smc.vnet.net
  • Subject: [mg71485] Re: a (serious) question about character codes in Mathematica
  • From: "Philipp" <Philipp.M.O at gmail.com>
  • Date: Mon, 20 Nov 2006 06:34:27 -0500 (EST)
  • References: <ejoskp$n3b$1@smc.vnet.net>

Chris,

I suppose that, if you have the patience, you could redefine all of the
Mathematica encoding files in
...\Wolfram Research\Mathematica\5.2\SystemFiles\CharacterEncodings
(or wherever they sit on your system).

For Mathematica7.m, for example, create a new encoding file alongside
it, say Mathematica7unicode.m, containing

(* Math7unicode *)

{"16Bit",
{
{8477, "\[DoubleStruckCapitalR]"}
}
}

(Be careful when prettifying the list: I found the syntax checking of
CharacterEncoding files a bit twitchy.)

Now,

In[1]:=FromCharacterCode[8477, "Mathematica7unicode"] // FullForm
Out[1]//FullForm="\[DoubleStruckCapitalR]"

and,

In[2]:=ToCharacterCode["\[DoubleStruckCapitalR]","Mathematica7unicode"]
Out[2]={8477}
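
If you have many characters to map, the file could be generated rather than typed. The Python sketch below is only an illustration under assumptions: `render_encoding_file` is a hypothetical helper, the (code point, name) pairs still have to be compiled from some other source, and it assumes further entries are simply comma-separated, as in any Mathematica list. It keeps exactly the unprettified layout of the working file above, given how twitchy the syntax check is.

```python
def render_encoding_file(title, pairs):
    """Render (code point, Mathematica character name) pairs in the
    "16Bit" list layout used by the Mathematica7unicode.m example.

    Assumption: extra entries are comma-separated, one per line, with
    no indentation, mirroring the single-entry file that is known to work.
    """
    # '\\[' in a normal Python string is a single backslash, giving \[Name]
    entries = ",\n".join('{%d, "\\[%s]"}' % (code, name) for code, name in pairs)
    return '(* %s *)\n\n{"16Bit",\n{\n%s\n}\n}\n' % (title, entries)


text = render_encoding_file("Math7unicode", [(0x211D, "DoubleStruckCapitalR")])
print(text)
```

Saving `text` as Mathematica7unicode.m in the CharacterEncodings directory should reproduce the one-entry file shown above.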

I hope this helps.
Cheers,
Philipp


Chris Chiasson wrote:
> Usually, my questions about character codes are more complaints than
> anything else.
>
> But, this time I just want info. I promise :-)
>
>
>
> FromCharacterCode[16^^52,"Mathematica7"]//FullForm
>
> gives
>
> "\[DoubleStruckCapitalR]"
>
> (52 in hex is 82 in decimal, so this is the 83rd character in the
> Mathematica7 font; numbering starts from zero.)
>
> As shown below, Mathematica seems to map characters to non-standard
> Unicode code points.
>
> In Unicode, the double-struck capital R is 0x211D in hex and 8477 in decimal [1].
>
> But Mathematica assigns this character a different code point (the
> second argument is optional here):
>
> ToCharacterCode["\[DoubleStruckCapitalR]","Unicode"]
>
> {63413}
>
> 63413 in decimal (0xF7B5 in hex) does not map to any standard glyph.
> In fact, the number lies in Unicode's Private Use Area (U+E000 to U+F8FF).
>
> I know Mathematica has been around for a long time. Perhaps this code
> point was in use for DoubleStruckCapitalR before that glyph had a code
> point in regular Unicode. I am sure there is a perfectly good
> explanation.
>
> I am interested in ways to acquire the standard Unicode code points for
> the glyphs in Mathematica7, along with the stretchy characters for
> parentheses and lists from other Mathematica fonts. This can be done
> by feeding the raw character into one of the lower-level conversion
> routines, such as
>
> System`Convert`XMLDump`determineEntityExportFunction[{"\[DoubleStruckCapitalR]"},"US-ASCII"]["\[DoubleStruckCapitalR]"]
>
> (I think this call is right, but I can't be sure because it is
> undocumented; also, it must first be enabled by exporting something.)
>
> but it requires a lot of post-processing to apply it to all the
> characters in a particular font (sometimes the result is a plain
> character instead of an HTML entity string, and sometimes it is just a
> semicolon...).
>
> What is the best way to get the "real" Unicode points for the exotic
> fonts in Mathematica?
>
> [1] http://www.unicode.org/charts/PDF/U2100.pdf
> 
> -- 
> http://chris.chiasson.name/
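
The numbers in Chris's question can be cross-checked against the Unicode Character Database bundled with Python's standard `unicodedata` module. This small sketch assumes only that Python's Unicode tables are reasonably current; it confirms that 16^^52 is slot 82, that U+211D (8477) is the standard DOUBLE-STRUCK CAPITAL R, and that 63413 (U+F7B5) is an unnamed private-use code point.

```python
import unicodedata

# Mathematica's 16^^52 is the base-16 literal 0x52 in Python.
slot = int("52", 16)
assert slot == 82  # zero-based index, i.e. the 83rd glyph slot

# The standard code point for the double-struck capital R (U+211D).
standard = 0x211D
assert standard == 8477
print(unicodedata.name(chr(standard)))  # DOUBLE-STRUCK CAPITAL R

# The code point Mathematica reports via ToCharacterCode.
mathematica = 63413
assert hex(mathematica) == "0xf7b5"
assert 0xE000 <= mathematica <= 0xF8FF  # inside the BMP Private Use Area
# "Co" is the general category for private-use characters, which have no name.
print(unicodedata.category(chr(mathematica)))  # Co
print(unicodedata.name(chr(mathematica), "<no name>"))  # <no name>
```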

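The post-processing Chris describes (results that are sometimes a plain character, sometimes an HTML entity string, sometimes a stray semicolon) could be normalized with a helper along the following lines. This is a minimal Python sketch under stated assumptions: `to_code_point` is a hypothetical name, and the input shapes it handles are guesses from the description, since System`Convert`XMLDump`determineEntityExportFunction is undocumented. It decodes entities with the standard `html.unescape`, rejects the semicolon case, and also rejects results that are still private-use code points.

```python
import html
import unicodedata


def to_code_point(raw):
    """Normalize one string returned by the entity-export routine.

    Assumed (hypothetical) input shapes: a plain character, a numeric
    entity such as "&#8477;" or "&#x211D;", a named entity such as
    "&Ropf;", or a bare ";". Returns the standard code point, or None
    when nothing usable came back.
    """
    text = html.unescape(raw.strip())
    if len(text) != 1 or text == ";":
        return None  # empty, multi-character, or the bare-semicolon case
    if unicodedata.category(text) == "Co":
        return None  # still a private-use code point, not a standard one
    return ord(text)


results = ["&#8477;", "&#x211D;", "&Ropf;", "\u211d", ";", "\uf7b5"]
print([to_code_point(r) for r in results])
# [8477, 8477, 8477, 8477, None, None]
```

Characters that come back as None would still need to be mapped by hand.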
