MathGroup Archive 2007

Re: Using ReadList to read a string

  • To: mathgroup at smc.vnet.net
  • Subject: [mg83905] Re: [mg83781] Using ReadList to read a string
  • From: "Igor C. Antonio" <igora at wolf-ram.com>
  • Date: Tue, 4 Dec 2007 04:22:14 -0500 (EST)
  • Organization: Wolfram Research, Inc.
  • References: <200711301023.FAA06237@smc.vnet.net>
  • Reply-to: igora at wolf-ram.com

Donald DuBois wrote:
> Hello,
> 
> I am trying to get ReadList to read a string in a text file  (filename.txt).
> 
> I would like NOT to have to use Import because it is MUCH slower in reading
> a text file than ReadList is.  For example:
> 
> (1) a file with 50,000 records can be created
> (2) Exported  to disk 
> (3) read by ReadList[...] and 
> (4) read by Import[...]
> 
> 
> dataFile1 = 
>  Table[{2001, "nameA", "symbolA", 
>    15.5}, {50000}]; Export["out1.txt", dataFile1, "Table"];
> 
> AbsoluteTiming[
>  out1ReadList = ReadList["out1.txt", {Number, Word, Word, Number}];]
> 
> AbsoluteTiming[out1Import = Import["out1.txt", "Table"];]
> 
> {0.1718750, Null}
> 
> {2.4375000, Null}
> 
> Import takes 14 times longer to read in the same file as compared to ReadList.
> So, naturally, I would like to use ReadList whenever I have a .txt file to be read in from disk.
> 
> However, the file to be read  is slightly more complicated than the one above (out1.txt).  
> There is a string that is added to the file as the second element of a record.
> The first few records of the file (EWZ2.TXT below)  look like the following with each record
> consisting of eight elements: a number, string, word followed by five integer numbers for each record.  
> Each record is on a separate line.  
> 
> EWZ2.TXT:
> 
> 20000714 "iShares MSCI Brazil Index" EWZ      250        1627        1637        1627        1637
> 20000717 "iShares MSCI Brazil Index" EWZ      100        1730        1735        1730        1735
> 20000718 "iShares MSCI Brazil Index" EWZ      100        1730        1730        1730        1730
> 20000719 "iShares MSCI Brazil Index" EWZ      100        1686        1686        1686        1686
> 20000720 "iShares MSCI Brazil Index" EWZ       50        1724        1724        1724        1724
> 

Import is naturally slower than ReadList because it processes the data for you: 
it parses dates, strings, numbers, currency, etc. into their Mathematica 
equivalents.  That said, we have made speed improvements to Import for the 
"Table" format, and they will be in the next minor update to Mathematica.
-------------------
In[10]:= dataFile1 =
  Table[{2001, "nameA", "symbolA",
    15.5}, {50000}]; Export["out1.txt", dataFile1, "Table"];

In[11]:= a1 =
  AbsoluteTiming[
   out1ReadList = ReadList["out1.txt", {Number, Word, Word, Number}];]

Out[11]= {0.2031263, Null}

In[12]:= a2 =
  AbsoluteTiming[out1Import = Import["out1.txt", "Table"];]

Out[12]= {0.4843781, Null}

In[13]:= a2[[1]]/a1[[1]]

Out[13]= 2.384615
-------------
*Disclaimer:  the speed of Import with the "Table" format largely depends on 
how much processing it has to do on the data.  The more numbers, dates, and 
currencies the file has, the slower it will be.



You may want to try Import[<file>, "Table", "Numeric"->False], which disables 
the parsing of the data while still splitting the fields correctly and handling 
quotes:

In[70]:= (data = Import["donald_short.txt", "Table", "Numeric" -> False]) // InputForm
Out[70]//InputForm=
{{"20000714", "iShares MSCI Brazil Index", "EWZ", "250", "1627", "1637", "1627", "1637"},
 {"20000717", "iShares MSCI Brazil Index", "EWZ", "100", "1730", "1735", "1730", "1735"},
 {"20000718", "iShares MSCI Brazil Index", "EWZ", "100", "1730", "1730", "1730", "1730"},
 {"20000719", "iShares MSCI Brazil Index", "EWZ", "100", "1686", "1686", "1686", "1686"},
 {"20000720", "iShares MSCI Brazil Index", "EWZ", "50", "1724", "1724", "1724", "1724"}}

You could then post-process the data on your own.
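
As a rough sketch of that post-processing step, something like the following 
might work.  It assumes the eight-field layout of EWZ2.TXT shown above (fields 
1 and 4 through 8 numeric, fields 2 and 3 text); toNumber, raw and records are 
names made up for this example.  ToExpression evaluates whatever it is given, 
so this is only reasonable for files you trust.

```mathematica
(* Import every field as a string; the quoted name stays in one piece. *)
raw = Import["EWZ2.TXT", "Table", "Numeric" -> False];

(* Hypothetical helper: turn a numeric string into a number.
   Assumption: the file is trusted, since ToExpression evaluates its input. *)
toNumber[s_String] := ToExpression[s];

(* Convert fields 1 and 4-8 of each record; leave name and ticker as strings. *)
records = MapAt[toNumber, #, {{1}, {4}, {5}, {6}, {7}, {8}}] & /@ raw;
```

Whether this ends up faster than a plain Import[<file>, "Table"] would need to 
be timed on the real file.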

> But, is there any way to get ReadList to read the above file (EWZ2.TXT) with the string
> as the second item of a record so that the speed advantage of ReadList over Import
> can be retained? 

Most likely you won't be able to read that data correctly in one pass with a 
single ReadList call.  ReadList is a lower-level function than Import: it has 
no notion of a quoted field in the file (as opposed to a Mathematica String), 
so it cannot keep "iShares MSCI Brazil Index" together as one item and strip 
the quotes.  Import as TSV, CSV, and Table processes the data as it imports in 
order to handle that case (and many others).
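
If you still want to stay with ReadList, one possible two-pass approach is to 
read each line as a raw string and split it yourself.  The sketch below is 
untested against the full file and assumes every line has exactly one quoted 
field, no quotes inside that field, and no blank lines; parseLine is a 
hypothetical helper, not a built-in.

```mathematica
(* Pass 1: read each line of the file as one raw string. *)
lines = ReadList["EWZ2.TXT", String];

(* Pass 2: split a line into its eight fields.
   Assumption: exactly one double-quoted field per line. *)
parseLine[line_String] := Module[{parts, rest},
  (* Splitting on the double quote gives {date, quoted name, remainder}. *)
  parts = StringSplit[line, "\""];
  (* The remainder is whitespace-separated: ticker, then five integers. *)
  rest = StringSplit[parts[[3]]];
  Join[
   {ToExpression[parts[[1]]], parts[[2]], First[rest]},
   ToExpression /@ Rest[rest]]
  ];

records = parseLine /@ lines;
```

The string handling in the second pass has its own cost, so it is worth 
wrapping both approaches in AbsoluteTiming before deciding which one to keep.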

> Don

--
Igor C. Antonio
Software Engineer
Wolfram Research, Inc.
http://www.wolfram.com

To email me personally, remove the dash.

