Re: Unicode Support
- To: mathgroup at smc.vnet.net
- Subject: [mg55590] Re: Unicode Support
- From: dh <dh at metrohm.ch>
- Date: Wed, 30 Mar 2005 03:22:09 -0500 (EST)
- References: <email@example.com>
- Sender: owner-wri-mathgroup at wolfram.com
John Fultz wrote:
"Saying that Mathematica uses 16-bit Unicode characters is equivalent to
saying that Mathematica uses UTF-16."
and the Mathematica Help says:
"MathLink strings and symbols can contain characters with codes ranging
from 0 to 65535 - that is, characters that can be represented by unsigned [...]"
Now, the Unicode standard defines code points up to 1'114'111 (hex 10FFFF).
This is more than what is mentioned in the Help (65535). Coded in UTF-16,
each of these code points needs either 2 or 4 bytes.
Has Wolfram truncated the available characters to those that are
represented by 2 bytes in UTF-16?
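
The byte counts in question can be checked with a short sketch. It is written in Python purely for illustration; it says nothing about how Mathematica or MathLink store strings internally, and the helper name utf16_byte_length is made up for this example. It encodes single code points with Python's built-in "utf-16-le" codec and reports the encoded length.

```python
def utf16_byte_length(code_point: int) -> int:
    """Return the number of bytes one code point occupies in UTF-16."""
    # 'utf-16-le' is used because plain 'utf-16' prepends a 2-byte
    # byte order mark, which would distort the count.
    return len(chr(code_point).encode("utf-16-le"))


# Size of the Unicode code space: code points 0 through hex 10FFFF.
CODE_SPACE_SIZE = 0x10FFFF + 1

if __name__ == "__main__":
    # Three plane-0 code points followed by three from higher planes.
    for cp in (0x41, 0x20AC, 0xFFFF, 0x10000, 0x1D11E, 0x10FFFF):
        print(f"U+{cp:04X}: {utf16_byte_length(cp)} bytes")
    print("code space size:", CODE_SPACE_SIZE)
```

Code points up to hex FFFF come out as 2 bytes and everything above as 4 bytes, so a 16-bit range of 0 to 65535 describes the individual code units rather than a limit on which characters can be encoded.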
John Fultz wrote:
> On Sat, 26 Mar 2005 02:39:43 -0500 (EST), Zhu Chongkai wrote:
>>The Mathematica Book says that Mathematica supports Unicode characters.
>>And MathLink says that a Unicode character in Mathematica is
>>16 bits. But the latest Unicode Standard uses 32 bits to encode a
>>character. It seems to me that Mathematica's Unicode support is
>>outdated, based on an old version of the Unicode Standard, which only
>>contains fewer than 65536 characters. Will the next version of Mathematica
>>use a 32-bit encoding? Or am I wrong?
> Saying that Mathematica uses 16-bit Unicode characters is equivalent to
> saying that Mathematica uses UTF-16. UTF-16 can represent any Unicode
> character, and has been able to do so since at least Unicode 2.0 (and quite
> possibly earlier). It does so by using a reserved block of 16-bit values
> to represent non-plane 0 Unicode characters as a pair of values (known as a
> surrogate pair...see section 5.4 of the Unicode standard for more info).
> So, there is no need to change from a 16-bit encoding in order to support
> characters outside of the plane 0 range.
> MathLink supports this now. It's still just a stream of 16-bit characters.
> Mathematica can also represent the characters as surrogate pairs, but
> doesn't yet treat them as unitary characters for the purpose of string
> manipulation and text drawing operations. That's something we'll add to a
> future release.
> John Fultz
> jfultz at wolfram.com
> User Interface Group
> Wolfram Research, Inc.
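
The surrogate-pair scheme described in the quoted message can be sketched as follows. This is a minimal illustration in Python of the standard UTF-16 arithmetic, not MathLink's or Mathematica's actual code; all function names here are hypothetical. The last helper counts 16-bit code units, which is the quantity that differs from the character count when surrogate pairs are not treated as single characters.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000       # fits in 20 bits
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1D11E)   # MUSICAL SYMBOL G CLEF
    print(f"U+1D11E -> {high:04X} {low:04X}")
    print(f"round trip: U+{from_surrogate_pair(high, low):X}")
    clef = "\U0001D11E"
    print("code points:", len(clef), "code units:", utf16_code_units(clef))
```

A string holding this one character is a single code point but two 16-bit code units, which is why software that works on code units sees such a character as two items until it is taught to treat the pair as a unit.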