MathGroup Archive: March 2005 [00794]

Re: Re: Unicode Support

To: mathgroup at smc.vnet.net
Subject: [mg55551] Re: [mg55525] Re: Unicode Support
From: John Fultz <jfultz at wolfram.com>
Date: Tue, 29 Mar 2005 03:42:33 -0500 (EST)
Reply-to: jfultz at wolfram.com
Sender: owner-wri-mathgroup at wolfram.com

 On Mon, 28 Mar 2005 02:42:03 -0500 (EST), Zhu Chongkai wrote:
> ======= At 2005-03-27, 18:42:43 John Fultz wrote: =======
>
>> UTF-8 is a supported character encoding by both front end and kernel
>> (i.e. they can import and export files as UTF-8).  I believe the support
>> has been there since 5.0.  MathLink only supports UTF-16 for now.
>>
>> UTF-32 is not supported at all in current versions.  To be honest,
>> nobody has asked for it before.  While UTF-32 is a clean way of
>> representing characters in all of the Unicode planes, I think the vast
>> majority of programs out there in the real world are using either UTF-8
>> or UTF-16.
>>
>> Sincerely,
>> John Fultz
>>
> Thank you for the clarification. But I still think that both UTF-8 and
> UTF-32 support are important, especially for MathLink. For example, I know
> one program that uses both UTF-8 and UTF-32; how can I link it with MathLink?

I imagine that we'll do something with UTF-8 in MathLink down the road.  As
for UTF-32...well, it's been asked for once...by you...and you yourself
admit that you could use UTF-8 instead.  So, thus far, it's just not a big
priority.  Of course, in an evolving world, priorities may change.

Concerning your problem...if the program also uses UTF-16 (I would find it
difficult to believe a program would support UTF-8 and UTF-32 with no
support for UTF-16), then just use that.  Otherwise, writing the code to
convert plane 0-only characters (it's unlikely you'll see anything outside
of plane 0) between UTF-32 and UTF-16 is about as trivial a programming
exercise as you can get...it's just copying an array of 32-bit unsigned
ints into an array of 16-bit unsigned shorts and vice versa, since every
plane 0 character has the same numeric value in both encodings.
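To make the plane 0 case concrete, here is a minimal sketch in C.  The
function names are hypothetical (they are not part of MathLink or any
library), and the sketch assumes the caller has allocated a destination
array with room for n elements.  It refuses anything outside plane 0, and
any surrogate value, rather than trying to handle it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not a MathLink API. */

/* Narrow n UTF-32 code points to UTF-16 code units, plane 0 only.
   Returns 0 on success, -1 if a value lies outside plane 0 or falls
   in the surrogate range D800..DFFF. */
int utf32_to_utf16_bmp(const uint32_t *src, size_t n, uint16_t *dst)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t c = src[i];
        if (c > 0xFFFF || (c >= 0xD800 && c <= 0xDFFF))
            return -1;
        dst[i] = (uint16_t)c;   /* same numeric value, narrower type */
    }
    return 0;
}

/* Widen n UTF-16 code units to UTF-32, plane 0 only.
   Returns 0 on success, -1 if a surrogate code unit is seen, since
   that means the text contains a character outside plane 0. */
int utf16_bmp_to_utf32(const uint16_t *src, size_t n, uint32_t *dst)
{
    for (size_t i = 0; i < n; i++) {
        if (src[i] >= 0xD800 && src[i] <= 0xDFFF)
            return -1;
        dst[i] = src[i];
    }
    return 0;
}
```

If characters outside plane 0 ever do show up, the -1 return is the signal
that surrogate pairs need to be handled as well.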

Writing a UTF-8 <-> UTF-16 converter is a little harder, but all of the 
information needed to do it is in section 3.9 of the Unicode spec found 
online at the unicode.org site.  Or you can just google for somebody that 
has code to do it.  For example,

http://www-306.ibm.com/software/globalization/icu/index.jsp
http://www.gnu.org/software/libiconv/
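
If you would rather write it yourself, here is a rough sketch in C of the
UTF-8 to UTF-16 direction.  The function name is hypothetical (it is not
an ICU, libiconv, or MathLink call), and the sketch assumes the caller
supplies a destination buffer with room for n code units, which is always
enough because UTF-16 never needs more code units than UTF-8 needs bytes.
It rejects overlong forms, encoded surrogates, and values above 10FFFF,
but it has not been checked against a conformance suite.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only.
   Converts n bytes of UTF-8 to UTF-16 code units in dst.
   Returns the number of code units written, or -1 on ill-formed input. */
long utf8_to_utf16(const unsigned char *src, size_t n, uint16_t *dst)
{
    size_t i = 0;
    long out = 0;

    while (i < n) {
        uint32_t c = src[i];
        uint32_t min;      /* smallest value allowed for this length */
        size_t extra;      /* number of continuation bytes expected  */

        if (c < 0x80)                { extra = 0; min = 0; }
        else if ((c & 0xE0) == 0xC0) { c &= 0x1F; extra = 1; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { c &= 0x0F; extra = 2; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { c &= 0x07; extra = 3; min = 0x10000; }
        else return -1;    /* stray continuation or invalid lead byte */

        if (extra > n - i - 1)
            return -1;     /* sequence is cut off at the end of input */
        i++;

        for (size_t k = 0; k < extra; k++, i++) {
            if ((src[i] & 0xC0) != 0x80)
                return -1; /* not a continuation byte */
            c = (c << 6) | (src[i] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and out-of-range values. */
        if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return -1;

        if (c < 0x10000) {
            dst[out++] = (uint16_t)c;
        } else {
            /* Outside plane 0: emit a surrogate pair. */
            c -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (c >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (c & 0x3FF));
        }
    }
    return out;
}
```

The reverse direction follows the same bit layout read the other way: pick
the sequence length from the size of the code point, then split its bits
across the lead and continuation bytes.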

Sincerely,

John Fultz
jfultz at wolfram.com
User Interface Group
Wolfram Research, Inc.
