MathGroup Archive 2005

Re: Unicode Support

  • To: mathgroup at smc.vnet.net
  • Subject: [mg55590] Re: Unicode Support
  • From: dh <dh at metrohm.ch>
  • Date: Wed, 30 Mar 2005 03:22:09 -0500 (EST)
  • References: <d266ci$c9i$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Hello John,
You write:
"Saying that Mathematica uses 16-bit Unicode characters is equivalent to
  saying that Mathematica uses UTF-16."
and the Mathematica Help says:
"MathLink strings and symbols can contain characters with codes ranging 
from 0 to 65535 -- that is, characters that can be represented by unsigned 
16-bit integers."

Now, the Unicode standard defines code points up to 1'114'111 (hex 10FFFF). 
This is far more than the 65535 mentioned in the Help. Encoded in UTF-16, 
each of these code points needs either 2 or 4 bytes.
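
For reference, the encoded sizes can be checked with a short Python sketch. 
It relies only on Python's built-in "utf-16-be" codec and has nothing to do 
with Mathematica or MathLink; the sample code points and the helper name 
utf16_byte_length are chosen purely for illustration. UTF-16 code units are 
16 bits wide, so every length is a multiple of 2 bytes.

```python
# Encoded size, in bytes, of single code points under UTF-16.
# "utf-16-be" is used so that no byte-order mark is added to the output.
samples = {
    "U+0041 (Latin capital A)": "\u0041",
    "U+20AC (euro sign)": "\u20ac",
    "U+FFFF (last plane-0 code point)": "\uffff",
    "U+10000 (first code point beyond plane 0)": "\U00010000",
    "U+10FFFF (highest code point)": "\U0010ffff",
}


def utf16_byte_length(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-16."""
    return len(ch.encode("utf-16-be"))


for label, ch in samples.items():
    print(f"{label}: {utf16_byte_length(ch)} bytes")

# Size of the code space: code points 0 through 0x10FFFF inclusive.
code_space_size = 0x10FFFF + 1
print("code points in the Unicode code space:", code_space_size)
```

Code points up to U+FFFF come out as 2 bytes and everything above as 4 bytes.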

Has Wolfram truncated the available characters to those that are 
represented by 2 bytes in UTF-16?

Please clarify.

Sincerely, Daniel


John Fultz wrote:
>  On Sat, 26 Mar 2005 02:39:43 -0500 (EST), Zhu Chongkai wrote:
> 
>>Hi all,
>>
>>The Mathematica Book says that Mathematica supports Unicode characters,
>>and MathLink indicates that a Unicode character in Mathematica is
>>16 bits wide. But the latest Unicode Standard uses 32 bits to encode a
>>character. It seems to me that Mathematica's Unicode support is
>>outdated, based on an old version of the Unicode Standard that
>>contains fewer than 65536 characters. Will the next version of Mathematica
>>use a 32-bit encoding? Or am I wrong?
>>
>>Cheers,
>>Zhu Chongkai
>>http://www.neilvandyke.org/mrmathematica/
> 
> 
> Saying that Mathematica uses 16-bit Unicode characters is equivalent to 
> saying that Mathematica uses UTF-16.  UTF-16 can represent any Unicode 
> character, and has been able to do so since at least Unicode 2.0 (and quite 
> possibly earlier).  It does so by using a reserved block of 16-bit values 
> to represent non-plane 0 Unicode characters as a pair of values (known as a 
> surrogate pair...see section 5.4 of the Unicode standard for more info).  
> So, there is no need to change from a 16-bit encoding in order to support 
> characters outside of the plane 0 range.
> 
> MathLink supports this now.  It's still just a stream of 16-bit characters. 
> Mathematica can also represent the characters as surrogate pairs, but 
> doesn't yet treat them as unitary characters for the purpose of string 
> manipulation and text drawing operations.  That's something we'll add to a 
> future release.
> 
> Sincerely,
> 
> John Fultz
> jfultz at wolfram.com
> User Interface Group
> Wolfram Research, Inc.
> 
> 
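
The surrogate-pair mechanism described in the quoted reply can be sketched 
in a few lines of Python. This is only an illustration of the UTF-16 
arithmetic, not MathLink or Mathematica code: the function names 
to_surrogate_pair and from_surrogate_pair are made up for this example, and 
U+1D49C (MATHEMATICAL SCRIPT CAPITAL A) is an arbitrary sample from outside 
plane 0.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point beyond plane 0 into two 16-bit UTF-16 values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond plane 0 need a surrogate pair")
    offset = code_point - 0x10000        # fits in 20 bits
    high = 0xD800 + (offset >> 10)       # upper 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # lower 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1D49C)
print(f"U+1D49C -> {high:04X} {low:04X}")
print(f"round trip -> U+{from_surrogate_pair(high, low):X}")

# Cross-check against Python's own UTF-16 codec (big-endian, no BOM).
encoded = "\U0001d49c".encode("utf-16-be")
units = len(encoded) // 2                # number of 16-bit values
print("16-bit values used for one character:", units)
```

A program that simply counts 16-bit values sees two units for this single 
character, which is presumably the kind of string-manipulation limitation 
the reply refers to.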

