Re: Unicode Support
- To: mathgroup at smc.vnet.net
- Subject: [mg55524] Re: [mg55503] Unicode Support
- From: John Fultz <jfultz at wolfram.com>
- Date: Sun, 27 Mar 2005 06:45:09 -0500 (EST)
- Reply-to: jfultz at wolfram.com
- Sender: owner-wri-mathgroup at wolfram.com
On Sat, 26 Mar 2005 02:39:43 -0500 (EST), Zhu Chongkai wrote:

> Hi all,
>
> The Mathematica Book says that Mathematica supports Unicode characters,
> and the MathLink documentation says that a Unicode character in
> Mathematica is 16 bits. But the latest Unicode Standard uses 32 bits to
> encode a character. It seems to me that Mathematica's Unicode support is
> outdated, based on an old version of the Unicode Standard, which contains
> fewer than 65536 characters. Will the next version of Mathematica use a
> 32-bit encoding? Or am I wrong?
>
> Cheers,
> Zhu Chongkai
> http://www.neilvandyke.org/mrmathematica/

Saying that Mathematica uses 16-bit Unicode characters is equivalent to saying that Mathematica uses UTF-16. UTF-16 can represent any Unicode character, and has been able to do so since at least Unicode 2.0 (and quite possibly earlier). It does so by reserving a block of 16-bit values (the surrogates, U+D800 through U+DFFF) and representing each character outside plane 0 (the Basic Multilingual Plane, U+0000 through U+FFFF) as a pair of those values, known as a surrogate pair. See section 5.4 of the Unicode Standard for more information.

So there is no need to move away from a 16-bit encoding in order to support characters outside plane 0. MathLink supports this now; it's still just a stream of 16-bit values. Mathematica can also represent these characters as surrogate pairs, but doesn't yet treat each pair as a single character for the purposes of string manipulation and text drawing. That's something we'll add in a future release.

Sincerely,

John Fultz
jfultz at wolfram.com
User Interface Group
Wolfram Research, Inc.
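The surrogate-pair scheme described in the reply can be illustrated with a short Python sketch. This is not MathLink's or Mathematica's code; the function names `encode_utf16_units` and `decode_utf16_units` are hypothetical. It shows how a code point outside plane 0 becomes two 16-bit values, how a stream of 16-bit values is decoded back, and why counting raw 16-bit values over-counts such characters (the "unitary character" issue mentioned above).

```python
def encode_utf16_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units (16-bit values) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point <= 0xFFFF:
        # Plane 0 (BMP): a single 16-bit unit.
        return [code_point]
    # Outside plane 0: subtract 0x10000 to get a 20-bit value, then split it
    # into a high surrogate (top 10 bits) and a low surrogate (bottom 10 bits).
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


def decode_utf16_units(units: list[int]) -> list[int]:
    """Combine a stream of 16-bit units back into Unicode code points."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            code_points.append(u)
            i += 1
    return code_points


if __name__ == "__main__":
    cp = 0x1D49C  # MATHEMATICAL SCRIPT CAPITAL A, outside plane 0
    print([f"{u:04X}" for u in encode_utf16_units(cp)])  # ['D835', 'DC9C']
    text = "x\U0001D49C"
    stream = [u for ch in text for u in encode_utf16_units(ord(ch))]
    # 3 sixteen-bit units on the wire, but only 2 characters.
    print(len(stream), len(decode_utf16_units(stream)))  # 3 2
```

The stream stays purely 16-bit, as the reply says; only code that interprets the stream needs to recognize surrogate pairs in order to treat each pair as one character.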