Re: Re: Unicode Support
- To: mathgroup at smc.vnet.net
- Subject: [mg55551] Re: [mg55525] Re: Unicode Support
- From: John Fultz <jfultz at wolfram.com>
- Date: Tue, 29 Mar 2005 03:42:33 -0500 (EST)
- Reply-to: jfultz at wolfram.com
- Sender: owner-wri-mathgroup at wolfram.com
On Mon, 28 Mar 2005 02:42:03 -0500 (EST), Zhu Chongkai wrote:
> ======= At 2005-03-27, 18:42:43 John Fultz wrote: =======
>
>> UTF-8 is a supported character encoding by both front end and kernel (i.e.
>> they can import and export files as UTF-8). I believe the support has been
>> there since 5.0. MathLink only supports UTF-16 for now.
>>
>> UTF-32 is not supported at all in current versions. To be honest, nobody
>> has asked for it before. While UTF-32 is a clean way of representing
>> characters in all of the Unicode planes, I think the vast majority of
>> programs out there in the real world are using either UTF-8 or UTF-16.
>>
>> Sincerely,
>> John Fultz
>>
> Thank you for clarification. But I still think that both UTF-8 and UTF-32
> support are important, especially for MathLink. For example, I know one
> program that use both UTF-8 and UTF-32, how can I link it with MathLink?

I imagine that we'll do something with UTF-8 in MathLink down the road. As
for UTF-32...well, it's been asked for once...by you...and you yourself admit
that you could use UTF-8 instead. So, thus far, it's just not a big priority.
Of course, in an evolving world, priorities may change.

Concerning your problem...if the program also uses UTF-16 (I would find it
difficult to believe a program would support UTF-8 and UTF-32 with no support
for UTF-16), then just use that. Otherwise, writing the code to convert
plane 0-only characters (it's unlikely you'll see anything outside of plane 0)
between UTF-32 and UTF-16 is about as trivial a programming exercise as you
can get...it's just converting an array of 32-bit unsigned ints to an array
of 16-bit unsigned shorts and vice versa.

Writing a UTF-8 <-> UTF-16 converter is a little harder, but all of the
information needed to do it is in section 3.9 of the Unicode spec found
online at the unicode.org site. Or you can just google for somebody that has
code to do it.
For example,

http://www-306.ibm.com/software/globalization/icu/index.jsp
http://www.gnu.org/software/libiconv/

Sincerely,

John Fultz
jfultz at wolfram.com
User Interface Group
Wolfram Research, Inc.
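
The conversions described in the message can be sketched roughly as follows. This is a minimal illustration in Python, not MathLink code: the function names are hypothetical, code units are held in plain lists of integers rather than C arrays, and the UTF-8 step leans on Python's built-in codec instead of a hand-written converter based on the Unicode spec. For plane 0 characters the UTF-32 <-> UTF-16 step is just a copy of each value; the surrogate-pair branch is included only for the case where something outside plane 0 does turn up.

```python
# Sketch only: all function names here are hypothetical, not a MathLink API.

def utf32_to_utf16(code_points):
    """Convert a list of UTF-32 code units to a list of UTF-16 code units."""
    units = []
    for cp in code_points:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {cp:#x}")
        if cp < 0x10000:
            # Plane 0: the 32-bit value fits in 16 bits unchanged.
            units.append(cp)
        else:
            # Planes 1-16: split into a high and a low surrogate.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units


def utf16_to_utf32(units):
    """Convert a list of UTF-16 code units to a list of UTF-32 code units."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            # Plane 0: widen the 16-bit value to 32 bits unchanged.
            code_points.append(u)
            i += 1
    return code_points


def utf8_to_utf16(data):
    """Decode UTF-8 bytes with Python's codec, then emit UTF-16 code units."""
    return utf32_to_utf16([ord(ch) for ch in data.decode("utf-8")])


def utf16_to_utf8(units):
    """Turn UTF-16 code units back into UTF-8 bytes via code points."""
    return "".join(chr(cp) for cp in utf16_to_utf32(units)).encode("utf-8")
```

A program that holds its text as UTF-32 or UTF-8 could run it through a step like this before handing 16-bit data to MathLink, and apply the reverse step to whatever comes back.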