Re: Import, ReadList, and Unicode
- To: mathgroup at smc.vnet.net
- Subject: [mg115223] Re: Import, ReadList, and Unicode
- From: eros olmi <erosolmiz at hotmail.com>
- Date: Tue, 4 Jan 2011 04:29:11 -0500 (EST)
You are right, Hans: I should use "UTF8" without the dash. Thank you.
I am using Windows XP.

eros

> From: hmichel at cox.net
> To: erosolmiz at hotmail.com; mathgroup at smc.vnet.net
> Subject: RE: [mg115151] Import, ReadList, and Unicode
> Date: Mon, 3 Jan 2011 01:59:25 -0600
>
> EO:
>
> Without more information it is difficult to examine your problem.
>
> To set the CharacterEncoding and SystemCharacterEncoding for UTF-8, the
> string value is "UTF8", with no dash.
> I will assume this is what you did and that writing it with a dash here is
> just an oversight.
>
> Nevertheless, what other items may be the source of the problem?
>
> Without knowing what OS you are using, I would say that you may have a Byte
> Order Mark (BOM) issue.
>
> http://msdn.microsoft.com/en-us/library/dd374101(VS.85).aspx
>
> Some applications add a BOM to the beginning of a file or stream. It is up
> to the consuming application to know how to handle the BOM, so including a
> BOM is not technically wrong.
>
> Can you provide more information on the file: for example, how it was saved,
> what structure you are expecting, and what the RecordSeparators are (default)?
>
> Hans
>
> -----Original Message-----
> From: eros olmi [mailto:erosolmiz at hotmail.com]
> Sent: Sunday, January 02, 2011 5:23 AM
> To: mathgroup at smc.vnet.net
> Subject: [mg115151] Import, ReadList, and Unicode
>
> In Mathematica v8 I am using this convoluted way to read the contents of a
> Unicode file saved in UTF-8 format:
> txt = Import["file.txt",CharacterEncoding -> "UTF-8"]
> w = ReadList[StringToStream[txt], Record, RecordLists -> True]
> The output looks like this:
> {{unicode chars},{unicode chars},{unicode chars}}
> The letters are displayed correctly even if I don't use CharacterEncoding ->
> "UTF-8".
> But using
> ReadList["file.txt", Record]
> returns the file as garbage characters, and setting
> $SystemCharacterEncoding = "UTF-8"
> $CharacterEncoding = $SystemCharacterEncoding
> does not cure the problem, since ReadList can't accept CharacterEncoding ->
> "UTF-8" in its syntax, unlike Import.
> Is there some cure for this?
> thanks
> eros
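
A minimal sketch of the two points raised in this thread (the "UTF8" spelling and a possible BOM), not a confirmed fix. It assumes the file is named file.txt as in the original post, that Import decodes a leading BOM as the character U+FEFF (this may not hold on every version or platform), and that records are separated by newlines. The variable name records is only illustrative.

```mathematica
(* Read the whole file as text; the encoding name is "UTF8", with no dash *)
txt = Import["file.txt", "Text", CharacterEncoding -> "UTF8"];

(* Assumption: if a BOM survived decoding, it shows up as a leading U+FEFF.
   Remove it only when it is the very first character. *)
txt = StringReplace[txt, StartOfString ~~ FromCharacterCode[65279] -> ""];

(* Split into one string per line (assumes newline-separated records) *)
records = StringSplit[txt, "\n"]
```

Doing the decoding in Import and the splitting on the resulting string avoids relying on ReadList to interpret the file's bytes.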