
MathGroup Archive 2011


Re: Import, ReadList, and Unicode

  • To: mathgroup at smc.vnet.net
  • Subject: [mg115223] Re: Import, ReadList, and Unicode
  • From: eros olmi <erosolmiz at hotmail.com>
  • Date: Tue, 4 Jan 2011 04:29:11 -0500 (EST)

You are right, Hans: I should use "UTF8" without the dash. Thank you.
I am using Windows XP.
eros
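
For reference, the corrected encoding name from this thread can be sketched as below. This is a minimal sketch, not code posted by either author. It assumes "file.txt" is a UTF-8 text file in the current directory, and the List /@ StringSplit step is only a rough stand-in for the nested output of ReadList with RecordLists -> True (for example, it treats empty lines differently).

```mathematica
(* Assumption: "file.txt" is a UTF-8 encoded text file in the current directory. *)
(* Note the encoding name "UTF8", without a dash. *)
txt = Import["file.txt", "Text", CharacterEncoding -> "UTF8"];

(* Split the text into lines and wrap each line in its own list, *)
(* giving a nested shape like {{line1}, {line2}, ...}.           *)
w = List /@ StringSplit[txt, "\n"];
```

Whether setting $CharacterEncoding = "UTF8" is enough to make ReadList["file.txt", Record] decode the file correctly is not confirmed anywhere in this thread, so the sketch stays with Import.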
 
> From: hmichel at cox.net
> To: erosolmiz at hotmail.com; mathgroup at smc.vnet.net
> Subject: RE: [mg115151] Import, ReadList, and Unicode
> Date: Mon, 3 Jan 2011 01:59:25 -0600
> 
> EO:
> 
> Without more information it is difficult to examine your problem.
> 
> To set the CharacterEncoding and SystemCharacterEncoding for UTF-8, the
> string value is "UTF8", with no dash.
> I will assume this is what you did, and that writing "UTF-8" in your post
> was just an oversight.
> 
> Nevertheless, what other items might be the source of the problem?
> 
> Without knowing what OS you are using, I would say that you may have a Byte
> Order Mark (BOM) issue.
> 
> http://msdn.microsoft.com/en-us/library/dd374101(VS.85).aspx
> 
> Some applications add a BOM to the beginning of a file or stream. It is up
> to the consuming application to know how to handle the BOM. So including a
> BOM is not technically wrong.
> 
> Can you provide more information on the file: for example, how it was saved,
> what structure you are expecting, and what the RecordSeparators are (default)?
> 
> Hans
> 
> -----Original Message-----
> From: eros olmi [mailto:erosolmiz at hotmail.com] 
> Sent: Sunday, January 02, 2011 5:23 AM
> To: mathgroup at smc.vnet.net
> Subject: [mg115151] Import, ReadList, and Unicode
> 
> In Mathematica v8 I am using this convoluted way to read the contents of a
> Unicode file saved in UTF-8 format:
> txt = Import["file.txt",CharacterEncoding -> "UTF-8"]
> w = ReadList[StringToStream[txt], Record, RecordLists -> True]
> The output looks like this:
> {{unicode chars},{unicode chars},{unicode chars}}
> The letters are displayed correctly even if I don't use CharacterEncoding ->
> "UTF-8".
> But using
> ReadList["file.txt", Record]
> returns the file as garbage characters, and setting
> $SystemCharacterEncoding = "UTF-8"
> $CharacterEncoding = $SystemCharacterEncoding
> does not cure the problem, since ReadList, unlike Import, does not accept
> CharacterEncoding -> "UTF-8" in its syntax.
> Is there a cure for this problem?
> Thanks,
> eros
> 
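
The BOM possibility Hans raises above can be checked directly by looking at the first bytes of the file. The following is a minimal sketch under stated assumptions, not something either poster ran: it assumes "file.txt" is the file from the question, and the names bytes, bom, hasBOM and txt are illustrative. The UTF-8 byte order mark is the three-byte sequence EF BB BF.

```mathematica
(* Assumption: "file.txt" is the file from the original question. *)
bytes = BinaryReadList["file.txt"];  (* raw bytes as integers 0..255 *)

(* The UTF-8 byte order mark: EF BB BF in hex, {239, 187, 191} in decimal *)
bom = {239, 187, 191};
hasBOM = Length[bytes] >= 3 && Take[bytes, 3] === bom;

(* Drop the BOM if present, then decode the remaining bytes as UTF-8 *)
txt = FromCharacterCode[If[hasBOM, Drop[bytes, 3], bytes], "UTF8"];
```

If hasBOM is True, the file starts with a BOM and a reader that does not strip it will show extra characters at the start of the first record. Reading the raw bytes and decoding them explicitly avoids depending on how any particular import function treats the BOM.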
 		 	   		  

