Re: Import, ReadList, and Unicode
- To: mathgroup at smc.vnet.net
- Subject: [mg115170] Re: Import, ReadList, and Unicode
- From: "Hans Michel" <hmichel at cox.net>
- Date: Mon, 3 Jan 2011 03:57:14 -0500 (EST)
EO:

Without more information it is difficult to examine your problem. To set the CharacterEncoding option and $SystemCharacterEncoding for UTF-8, the string value is "UTF8", with no dash. I will assume that this is what you did and that writing "UTF-8" in your message was just an oversight.

Nevertheless, what else might be the source of the behavior you are seeing? Without knowing which OS you are using, I would guess that you may have a Byte Order Mark (BOM) issue:

http://msdn.microsoft.com/en-us/library/dd374101(VS.85).aspx

Some applications add a BOM to the beginning of a file or stream, and it is up to the consuming application to know how to handle it. So including a BOM is not technically wrong.

Can you provide more information on the file? For example: how was it saved, what structure are you expecting, and what are the RecordSeparators (the default)?

Hans

-----Original Message-----
From: eros olmi [mailto:erosolmiz at hotmail.com]
Sent: Sunday, January 02, 2011 5:23 AM
To: mathgroup at smc.vnet.net
Subject: [mg115170] [mg115151] Import, ReadList, and Unicode

In Mathematica v8 I am using this convoluted way to read the contents of a Unicode file saved in UTF-8 format:

    txt = Import["file.txt", CharacterEncoding -> "UTF-8"]
    w = ReadList[StringToStream[txt], Record, RecordLists -> True]

The output looks like this:

    {{unicode chars},{unicode chars},{unicode chars}}

The letters are displayed correctly even if I don't use CharacterEncoding -> "UTF-8". But using

    ReadList["file.txt", Record]

returns the file as garbage characters, and setting

    $SystemCharacterEncoding = "UTF-8"
    $CharacterEncoding = $SystemCharacterEncoding

does not cure the problem, since ReadList, unlike Import, can't accept CharacterEncoding -> "UTF-8" in its syntax. Is there some cure for this phenomenon?

Thanks,
eros
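
The BOM check suggested in the reply can be sketched in Mathematica roughly as follows. This is only a minimal sketch under stated assumptions, not a confirmed fix for the problem in the thread: it assumes the file is the "file.txt" from the original question, that it really is UTF-8 encoded, and that its records are separated by LF or CRLF line endings. The variable names (bytes, hasBOM, text, records) are made up for illustration.

```mathematica
(* Assumption: same file name as in the original question. *)
file = "file.txt";

(* Read the file as raw bytes so that no character encoding is applied yet. *)
bytes = BinaryReadList[file, "Byte"];

(* A UTF-8 byte order mark is the three-byte sequence EF BB BF. *)
hasBOM = Length[bytes] >= 3 && Take[bytes, 3] === {239, 187, 191};

(* Drop the BOM if present, then decode the remaining bytes as UTF-8.
   Note the encoding name "UTF8", without a dash. *)
text = FromCharacterCode[If[hasBOM, Drop[bytes, 3], bytes], "UTF8"];

(* Assumption: records are lines ending in LF or CRLF. *)
records = StringSplit[text, {"\r\n", "\n"}];
```

If hasBOM turns out to be True, the leading bytes are a plausible source of stray characters at the start of the first record. Decoding the bytes explicitly also sidesteps the question of which encoding ReadList applies when it reads the file directly.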