MathGroup Archive 2007

Re: Using ReadList to read a string

  • To: mathgroup at smc.vnet.net
  • Subject: [mg83800] Re: [mg83781] Using ReadList to read a string
  • From: "Thomas Dowling" <thomasgdowling at gmail.com>
  • Date: Sat, 1 Dec 2007 05:37:01 -0500 (EST)
  • References: <200711301023.FAA06237@smc.vnet.net>

Hello,

I am interested in your problem, but unfortunately I do not have a solution,
only a similar experience. I would be very interested to know, however, what
I am doing wrong.  The following are my own observations:

1.  If the data is tab-delimited (that is, a tab between each element and a
paragraph mark indicating the end of each record), there is no problem.

For example,

list22 = ReadList[
  "/EWZ22.txt", {Number, Word, Word, Number, Number, Number, Number,
   Number}]

gives the following output:

{{20000714, "\"iSharesMSCIBrazilIndex\"", "EWZ", 250, 1627, 1637,
  1627, 1637}, {20000717, "\"iSharesMSCIBrazilIndex\"", "EWZ", 100,
  1730, 1735, 1730, 1735}, {20000718, "\"iSharesMSCIBrazilIndex\"",
  "EWZ", 100, 1730, 1730, 1730, 1730}, {20000719,
  "\"iSharesMSCIBrazilIndex\"", "EWZ", 100, 1686, 1686, 1686,
  1686}, {20000720, "\"iSharesMSCIBrazilIndex\"", "EWZ", 50, 1724,
  1724, 1724, 1724}}

where "EWZ22.txt" is tab-delimited (and saved as a text file),

and

Map[Head, list22, {2}]

gives the following:


{{Integer, String, String, Integer, Integer, Integer, Integer,
  Integer}, {Integer, String, String, Integer, Integer, Integer,
  Integer, Integer}, {Integer, String, String, Integer, Integer,
  Integer, Integer, Integer}, {Integer, String, String, Integer,
  Integer, Integer, Integer, Integer}, {Integer, String, String,
  Integer, Integer, Integer, Integer, Integer}}


All is well.


However, with comma-delimited text (EWZ2.txt) and the following command,

ReadList["/EWZ2.txt", {Number, Word, Word, Number, Number, Number,
  Number, Number}]

I get the following output:

Read::readn: Invalid real number found when reading from /EWZ2.txt. >>


{{20000714, ",\"iSharesMSCIBrazilIndex\",EWZ,250,1627,1637,1627,1637",
   "20000717,\"iSharesMSCIBrazilIndex\",EWZ,100,1730,1735,1730,1735",
  20000718, $Failed, EndOfFile, EndOfFile, EndOfFile}}


You can, of course, read each line of the file as a string,

list3 = ReadList["/EWZ2.txt", String ]

but this is not what is desired:

Map[Head, list3]

{String, String, String, String, String}
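The strings can at least be split up afterwards, although that is an extra
step on top of ReadList. A minimal sketch, assuming that no field contains an
embedded comma (parseLine is just a name made up for illustration, not a
built-in function):

(* Hypothetical helper: split one comma-delimited record and convert the
   numeric fields (positions 1 and 4 to 8) from strings to numbers.
   Assumes that no field contains an embedded comma. *)
parseLine[line_String] :=
  MapAt[ToExpression, StringSplit[line, ","],
   {{1}, {4}, {5}, {6}, {7}, {8}}]

list4 = parseLine /@ ReadList["/EWZ2.txt", String]

Under that assumption each record should come out as numbers and strings in
the same positions as in the tab-delimited case. Whether this keeps the speed
advantage over Import on a large file would still need to be timed.
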

The reason I am interested is that the same problem seems to occur with, say,
{x, time} data from a recording device, where x and time are Numbers.

Reading a tab-delimited text file (datatab.txt) with the following command

 ReadList["/datatab.txt", {Number, Number}]

gives the following output

{{1.24, 0.00161925}, {1.25, 0.00162431}, {1.26, 0.00161994}, {1.27,
  0.00161719}, {1.28, 0.00161219}, {1.29, 0.00160894}, {1.3,
  0.00161663}, {1.31, 0.00161956}, {1.32, 0.00162194}, {1.33,
  0.00161781}, {1.34, 0.001615}, {1.35, 0.00160962}, {1.36,
  0.00161806}, {1.37, 0.00162575}, {1.38, 0.00162256}, {1.39,
  0.00161581}, {1.4, 0.00161575}, {1.41, 0.00160694}, {1.42,
  0.00161869}, {1.43, 0.00161644}, {1.44, 0.00162231}, {1.45,
  0.00161681}, {1.46, 0.00161812}, {1.47, 0.00160969}, {1.48,
  0.00161875}, {1.49, 0.00162512}, {1.5, 0.00162319}, {1.51,
  0.0016135}, {1.52, 0.00161856}, {1.53, 0.00161231}, {1.54,
  0.00161887}}

BUT ...

reading the same data after it has been converted to comma-delimited format
(and saved as a text file, datacom.txt) gives the following:

ReadList["/datacom.txt", {Number, Number}]

Read::readn: Invalid real number found when reading from \
/datacom.txt. >>

{{1.24, $Failed}}
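
For anyone who wants to try this without my files, the comparison can
probably be reproduced with a small sketch like the one below. The two file
names are made up and are written to the current working directory, and the
sample values are just the first two records shown above; "TSV" and "CSV" are
the Export formats for tab-separated and comma-separated values.

(* Write the same sample data once tab-delimited and once comma-delimited *)
sample = {{1.24, 0.00161925}, {1.25, 0.00162431}};
Export["sampletab.txt", sample, "TSV"];
Export["samplecom.txt", sample, "CSV"];

(* Read both files back with the same type specification *)
ReadList["sampletab.txt", {Number, Number}]
ReadList["samplecom.txt", {Number, Number}]

I would expect the first call to read cleanly and the second to produce the
same Read::readn message as above.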


I would be very interested in any suggestions.  Experimenting with the
options for ReadList, such as RecordSeparators and WordSeparators, does not
seem to work, at least for me.  Although files can be saved as tab-delimited
in the first place, it is very easy to forget to do so, and converting large
files afterwards takes quite a bit of time.

Sorry to be so long-winded!

Thanks for your help

Thomas Dowling.


On Nov 30, 2007 10:23 AM, Donald DuBois <donabc at comcast.net> wrote:

> Hello,
>
> I am trying to get ReadList to read a string in a text file (filename.txt).
>
> I would like NOT to have to use Import because it is MUCH slower at reading
> a text file than ReadList is.  For example:
>
> (1) a file with 50,000 records can be created
> (2) Exported  to disk
> (3) read by ReadList[...] and
> (4) read by Import[...]
>
>
> dataFile1 = Table[{2001, "nameA", "symbolA", 15.5}, {50000}];
> Export["out1.txt", dataFile1, "Table"];
>
> AbsoluteTiming[
>  out1ReadList = ReadList["out1.txt", {Number, Word, Word, Number}];]
>
> AbsoluteTiming[out1Import = Import["out1.txt", "Table"];]
>
> {0.1718750, Null}
>
> {2.4375000, Null}
>
> Import takes about 14 times longer than ReadList to read the same file.
> So, naturally, I would like to use ReadList whenever I have a .txt file to
> be read in from disk.
>
> However, the file to be read  is slightly more complicated than the one
> above (out1.txt).
> There is a string that is added to the file as the second element of a
> record.
> The first few records of the file (EWZ2.TXT below)  look like the
> following with each record
> consisting of eight elements: a number, string, word followed by five
> integer numbers for each record.
> Each record is on a separate line.
>
> EWZ2.TXT:
>
> 20000714 "iShares MSCI Brazil Index" EWZ      250        1627        1637        1627        1637
> 20000717 "iShares MSCI Brazil Index" EWZ      100        1730        1735        1730        1735
> 20000718 "iShares MSCI Brazil Index" EWZ      100        1730        1730        1730        1730
> 20000719 "iShares MSCI Brazil Index" EWZ      100        1686        1686        1686        1686
> 20000720 "iShares MSCI Brazil Index" EWZ       50        1724        1724        1724        1724
>
> The format of the above file is: {Number, String, Word, Number, Number,
> Number, Number, Number}
>
> If this file on disk is named "EWZ2.TXT" I am not able to use ReadList to
> read it.
> I use two format specifications within ReadList
> and neither of them works:
>
> {Number, String, Word, Number, Number, Number, Number, Number}
> and {Number, Word, Word, Number, Number, Number, Number, Number}.
>
> ReadList["EWZ2.TXT", {Number, String, Word, Number, Number, Number,
>  Number, Number}]
> ReadList["EWZ2.TXT", {Number, Word, Word, Number, Number, Number,
>  Number, Number}]
>
>
>
> {{20000714,
>  " \"iShares MSCI Brazil Index\" EWZ      250        1627        \
> 1637        1627        1637", "20000717", $Failed, EndOfFile,
>  EndOfFile, EndOfFile, EndOfFile}}
>
> {{20000714, "\"iShares", "MSCI", $Failed, EndOfFile, EndOfFile,
>  EndOfFile, EndOfFile}}
>
>
> Using "String" for the format of the second element seems to have more
> success than "Word" but, when
> read, none of the elements is separated by a comma as happened when using
> ReadList to read
> out1.txt above.
>
> "iShares MSCI Brazil Index" should be the second element of a sublist
> within
> the entire list (Table) and EWZ  (with or without quotes) should be the
> third element
> within a sublist.
>
> The definition of a String in the function description for ReadList is
> "string terminated by a newline", which does not describe the above file
> (EWZ2.TXT).  If the string is moved in the file so that it is the last
> item in any record, such as
>
>  20000714  EWZ      250        1627        1637        1627        1637  "iShares MSCI Brazil Index"
>
> then a format of {Number, Word, Number, Number, Number, Number, Number,
> String}
> in ReadList DOES work to read the file correctly.
>
> But, is there any way to get ReadList to read the above file (EWZ2.TXT)
> with the string
> as the second item of a record so that the speed advantage of ReadList
> over Import
> can be retained?
>
> Or is there some other function I should be using other than Import and/or
> ReadList?
>
> Thank you in advance for any help you can give me.
> Don
>
>


