Re: Unicode Support
- To: mathgroup at smc.vnet.net
- Subject: [mg55590] Re: Unicode Support
- From: dh <dh at metrohm.ch>
- Date: Wed, 30 Mar 2005 03:22:09 -0500 (EST)
- References: <d266ci$c9i$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Hello John,

You write: "Saying that Mathematica uses 16-bit Unicode characters is equivalent to saying that Mathematica uses UTF-16."

And the Mathematica Help says: "MathLink strings and symbols can contain characters with codes ranging from 0 to 65535 - that is, characters that can be represented by unsigned 16-bit integers."

Now, the Unicode standard defines code points up to 1'114'111 (hex 10FFFF). This is more than what is mentioned in the Help (65535). Coded in UTF-16, these characters need 2 or 4 bytes each. Has Wolfram truncated the available characters to those that are represented by 2 bytes in UTF-16? Please clarify.

Sincerely,
Daniel

John Fultz wrote:
> On Sat, 26 Mar 2005 02:39:43 -0500 (EST), Zhu Chongkai wrote:
>
>>Hi all,
>>
>>The Mathematica Book says that Mathematica supports Unicode characters.
>>And MathLink says that a Unicode character in Mathematica is 16 bits.
>>But the latest Unicode Standard uses 32 bits to encode a character.
>>It seems to me that Mathematica's Unicode support is outdated, based on
>>an old version of the Unicode Standard, which only contains less than
>>65536 characters. Will the next version of Mathematica use 32-bit
>>encoding? Or am I wrong?
>>
>>Cheers,
>>Zhu Chongkai
>>http://www.neilvandyke.org/mrmathematica/
>
>
> Saying that Mathematica uses 16-bit Unicode characters is equivalent to
> saying that Mathematica uses UTF-16. UTF-16 can represent any Unicode
> character, and has been able to do so since at least Unicode 2.0 (and quite
> possibly earlier). It does so by using a reserved block of 16-bit values
> to represent non-plane 0 Unicode characters as a pair of values (known as a
> surrogate pair...see section 5.4 of the Unicode standard for more info).
> So, there is no need to change from a 16-bit encoding in order to support
> characters outside of the plane 0 range.
>
> MathLink supports this now. It's still just a stream of 16-bit characters.
> Mathematica can also represent the characters as surrogate pairs, but
> doesn't yet treat them as unitary characters for the purpose of string
> manipulation and text drawing operations. That's something we'll add to a
> future release.
>
> Sincerely,
>
> John Fultz
> jfultz at wolfram.com
> User Interface Group
> Wolfram Research, Inc.
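
The surrogate-pair mechanism described in the quoted reply can be illustrated with a small sketch. This is not Mathematica or MathLink code; it is plain Python, and the function names `to_utf16_units` and `from_surrogate_pair` are made up for illustration. It only shows the arithmetic that UTF-16 uses to map a code point above hex FFFF onto two 16-bit values.

```python
def to_utf16_units(code_point: int) -> list[int]:
    """Return the 16-bit UTF-16 code units for one Unicode code point.

    Illustrative helper (hypothetical name), not part of any MathLink API.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are reserved and are not characters")
    if code_point < 0x10000:
        # Plane 0 characters fit in a single 16-bit unit (2 bytes).
        return [code_point]
    # Characters outside plane 0: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    # U+1D11E (MUSICAL SYMBOL G CLEF) lies outside plane 0.
    units = to_utf16_units(0x1D11E)
    print([hex(u) for u in units])  # ['0xd834', '0xdd1e']
```

Each character therefore occupies either one or two 16-bit units (2 or 4 bytes), which is why a stream of 16-bit values can still carry the full range up to hex 10FFFF.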