MathGroup Archive 2011

Re: Import, ReadList, and Unicode

  • To: mathgroup at
  • Subject: [mg115223] Re: Import, ReadList, and Unicode
  • From: eros olmi <erosolmiz at>
  • Date: Tue, 4 Jan 2011 04:29:11 -0500 (EST)

You are right, Hans: I should use "UTF8" without the dash. Thank you.
I am using Windows XP.
> From: hmichel at
> To: erosolmiz at; mathgroup at
> Subject: RE: [mg115151] Import, ReadList, and Unicode
> Date: Mon, 3 Jan 2011 01:59:25 -0600
> EO:
> Without more information it is difficult to examine your problem.
> To set the CharacterEncoding and SystemCharacterEncoding for UTF-8, the
> string value is "UTF8", with no dash.
> I will assume this is what you did and writing about it now is just an
> oversight.
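
A minimal sketch of the spelling described above, assuming a UTF-8 text file named file.txt (the name used in the original post) in the current directory. The check against $CharacterEncodings is only an illustration of how one might confirm which names the kernel recognizes:

```mathematica
(* The encoding name has no dash: "UTF8", not "UTF-8" *)
txt = Import["file.txt", "Text", CharacterEncoding -> "UTF8"];

(* $CharacterEncodings lists the encoding names known to the kernel *)
MemberQ[$CharacterEncodings, "UTF8"]
```
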
> Nevertheless, what else might be the source of the behavior you see?
> Without knowing what OS you are using I would say that you may have a Byte
> Order Mark (BOM) issue.
> Some applications add a BOM to the beginning of a file or stream. It is up
> to the consuming application to know how to handle the BOM. So including a
> BOM is not technically wrong.
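
One way to test for a BOM, sketched under the assumption that the file is called file.txt as in the original post. The names firstBytes and hasBOM are made up for this illustration:

```mathematica
(* Read the first three bytes of the file as raw integers *)
firstBytes = BinaryReadList["file.txt", "Byte", 3];

(* The UTF-8 byte order mark is the byte sequence EF BB BF *)
hasBOM = (firstBytes === {239, 187, 191})
```

If hasBOM is True, the consuming code has to skip or strip those three bytes itself.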
> Can you provide more information on the file: for example, how it was saved,
> what structure you are expecting, and what the RecordSeparators are (default)?
> Hans
> -----Original Message-----
> From: eros olmi [mailto:erosolmiz at] 
> Sent: Sunday, January 02, 2011 5:23 AM
> To: mathgroup at
> Subject: [mg115151] Import, ReadList, and Unicode
> In Mathematica v8 I am using this convoluted way to read the contents of a
> Unicode file saved in UTF-8 format:
> txt = Import["file.txt",CharacterEncoding -> "UTF-8"]
> w = ReadList[StringToStream[txt], Record, RecordLists -> True]
> The output looks like this:
> {{unicode chars},{unicode chars},{unicode chars}}
> The letters are displayed correctly even if I don't use CharacterEncoding ->
> "UTF-8",
> but using
> ReadList["file.txt", Record]
> returns the file as garbage characters, and setting
> $SystemCharacterEncoding = "UTF-8"
> $CharacterEncoding = $SystemCharacterEncoding
> does not cure the problem, since ReadList, unlike Import, does not accept
> CharacterEncoding -> "UTF-8" in its syntax.
> Is there a cure for this behavior?
> thanks
> eros
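
A possible alternative to the Import plus StringToStream detour, offered only as a sketch: read the raw bytes and decode them explicitly, so that ReadList's text handling is not involved at all. It assumes the same file.txt and that the file really is UTF-8; the variable names are illustrative.

```mathematica
(* Decode the raw bytes of the file as UTF-8 *)
txt = FromCharacterCode[BinaryReadList["file.txt"], "UTF8"];

(* Split into lines, accepting both Windows and Unix line endings *)
lines = StringSplit[txt, {"\r\n", "\n"}]
```

This gives a flat list of strings, one per line, rather than the nested lists that RecordLists -> True produces. If the file starts with a BOM, the first string may begin with that character and would need trimming.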
