Re: Unicode Support
- To: mathgroup at smc.vnet.net
- Subject: [mg55590] Re: Unicode Support
- From: dh <dh at metrohm.ch>
- Date: Wed, 30 Mar 2005 03:22:09 -0500 (EST)
- References: <d266ci$c9i$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Hello John,
You write:
"Saying that Mathematica uses 16-bit Unicode characters is equivalent to
saying that Mathematica uses UTF-16."
and the Mathematica Help says:
"MathLink strings and symbols can contain characters with codes ranging
from 0 to 65535, that is, characters that can be represented by unsigned
16-bit integers."
Now, the Unicode standard defines code points up to 1'114'111 (hex 10FFFF).
This is more than what is mentioned in the Help (65535). These code
points, encoded in UTF-16, need 2 or 4 bytes each.
Has Wolfram truncated the available characters to those that are
represented by 2 bytes in UTF-16?
Please clarify.
Sincerely, Daniel
John Fultz wrote:
> On Sat, 26 Mar 2005 02:39:43 -0500 (EST), Zhu Chongkai wrote:
>
>>Hi all,
>>
>>The Mathematica Book says that Mathematica supports Unicode characters.
>>And MathLink says that a Unicode character in Mathematica is
>>16 bits. But the latest Unicode Standard uses 32 bits to encode a
>>character. It seems to me that Mathematica's Unicode support is
>>outdated, based on an old version of the Unicode Standard, which
>>contains fewer than 65536 characters. Will the next version of Mathematica
>>use a 32-bit encoding? Or am I wrong?
>>
>>Cheers,
>>Zhu Chongkai
>>http://www.neilvandyke.org/mrmathematica/
>
>
> Saying that Mathematica uses 16-bit Unicode characters is equivalent to
> saying that Mathematica uses UTF-16. UTF-16 can represent any Unicode
> character, and has been able to do so since at least Unicode 2.0 (and quite
> possibly earlier). It does so by using a reserved block of 16-bit values
> to represent non-plane 0 Unicode characters as a pair of values (known as a
> surrogate pair...see section 5.4 of the Unicode standard for more info).
> So, there is no need to change from a 16-bit encoding in order to support
> characters outside of the plane 0 range.
>
> MathLink supports this now. It's still just a stream of 16-bit characters.
> Mathematica can also represent the characters as surrogate pairs, but
> doesn't yet treat them as unitary characters for the purpose of string
> manipulation and text drawing operations. That's something we'll add to a
> future release.
>
> Sincerely,
>
> John Fultz
> jfultz at wolfram.com
> User Interface Group
> Wolfram Research, Inc.
>
>
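For reference, the surrogate-pair mechanism described in the quoted reply can be sketched in a few lines of Python. This is only an illustration of the UTF-16 arithmetic, not anything taken from Mathematica or MathLink; the function names to_utf16_units and from_utf16_units are made up for this example.

```python
def to_utf16_units(code_point):
    """Return the list of 16-bit code units that encode one Unicode code point.

    Hypothetical helper for illustration; not a Mathematica or MathLink API.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are reserved and are not characters")
    if code_point < 0x10000:
        # Plane 0 (BMP): a single 16-bit unit, i.e. 2 bytes.
        return [code_point]
    # Planes 1..16: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]                # two 16-bit units, i.e. 4 bytes


def from_utf16_units(units):
    """Decode a sequence of 16-bit code units back into code points.

    Unpaired surrogates are passed through unchanged in this sketch.
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            pair = ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            code_points.append(0x10000 + pair)
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside plane 0.
clef_units = to_utf16_units(0x1D11E)
```

Here clef_units holds two 16-bit values, so a program that counts code units sees a length of 2 for what is a single character. That is the distinction the reply draws between storing surrogate pairs and treating them as unitary characters.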