Re: Import, ReadList, and Unicode
- To: mathgroup at smc.vnet.net
- Subject: [mg115223] Re: Import, ReadList, and Unicode
- From: eros olmi <erosolmiz at hotmail.com>
- Date: Tue, 4 Jan 2011 04:29:11 -0500 (EST)
You are right, Hans: I should use "UTF8" without the dash. Thank you.
I am using Windows XP.
eros
> From: hmichel at cox.net
> To: erosolmiz at hotmail.com; mathgroup at smc.vnet.net
> Subject: RE: [mg115151] Import, ReadList, and Unicode
> Date: Mon, 3 Jan 2011 01:59:25 -0600
>
> EO:
>
> Without more information it is difficult to examine your problem.
>
> To set CharacterEncoding and $SystemCharacterEncoding to UTF-8, the
> string value is "UTF8", with no dash.
> I will assume this is what you did and that writing it with a dash is just
> an oversight.
>
> Nevertheless, what other items may be the source of the problem?
>
> Without knowing what OS you are using I would say that you may have a Byte
> Order Mark (BOM) issue.
>
> http://msdn.microsoft.com/en-us/library/dd374101(VS.85).aspx
>
> Some applications add a BOM to the beginning of a file or stream. It is up
> to the consuming application to know how to handle the BOM. So including a
> BOM is not technically wrong.
>
> Can you provide more information on the file: for example, how was it saved,
> what structure are you expecting, and what are the RecordSeparators (the default)?
>
> Hans
>
> -----Original Message-----
> From: eros olmi [mailto:erosolmiz at hotmail.com]
> Sent: Sunday, January 02, 2011 5:23 AM
> To: mathgroup at smc.vnet.net
> Subject: [mg115151] Import, ReadList, and Unicode
>
> In Mathematica v8 I am using this convoluted way to read the contents of a
> Unicode file saved in UTF-8 format:
>
> txt = Import["file.txt", CharacterEncoding -> "UTF-8"]
> w = ReadList[StringToStream[txt], Record, RecordLists -> True]
>
> The output looks like this:
>
> {{unicode chars},{unicode chars},{unicode chars}}
>
> The letters are displayed correctly even if I don't use
> CharacterEncoding -> "UTF-8". But using
>
> ReadList["file.txt", Record]
>
> returns the file as garbage characters, and setting
>
> $SystemCharacterEncoding = "UTF-8"
> $CharacterEncoding = $SystemCharacterEncoding
>
> does not cure the problem, since ReadList, unlike Import, does not accept
> CharacterEncoding -> "UTF-8" in its syntax.
> Is there a cure for this?
> Thanks,
> eros
>
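The byte order mark and the "UTF8" encoding name discussed in this thread can be illustrated with a minimal sketch. It is not taken from either poster and is only one possible approach: it assumes the file is the "file.txt" from the original question, that it really is UTF-8 encoded, and that lines end in either CR LF or LF. The variable names bytes, text, and lines are made up for illustration.

```mathematica
(* Sketch only: read the raw bytes and decode them by hand,
   so that no stream-level encoding setting is involved. *)
bytes = BinaryReadList["file.txt"];   (* list of integers 0..255 *)

(* A UTF-8 byte order mark is the byte sequence EF BB BF,
   i.e. {239, 187, 191}. Drop it if the file starts with it. *)
If[Length[bytes] >= 3 && Take[bytes, 3] === {239, 187, 191},
  bytes = Drop[bytes, 3]];

(* Decode the bytes; the encoding name is "UTF8", no dash. *)
text = FromCharacterCode[bytes, "UTF8"];

(* Split on Windows (CR LF) or Unix (LF) line endings. *)
lines = StringSplit[text, {"\r\n", "\n"}]
```

If the first three bytes turn out to be {239, 187, 191}, the file was saved with a BOM, which is common for UTF-8 files written by Windows applications such as Notepad.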