MathGroup Archive: December 2007 [00027]


Re: Using ReadList to read a string

To: mathgroup at smc.vnet.net
Subject: [mg83819] Re: Using ReadList to read a string
From: Bill Rowe <readnewsciv at sbcglobal.net>
Date: Sat, 1 Dec 2007 05:47:07 -0500 (EST)

On 11/30/07 at 5:23 AM, donabc at comcast.net (Donald DuBois) wrote:

>I am trying to get ReadList to read a string in a text file (filename.txt).

>I would like NOT to have to use Import because it is MUCH slower in
>reading a text file than ReadList is.  For example:

There is a good reason for Import being slower than ReadList.
Import is designed to work with complex data structures and to
distinguish strings from numbers automatically. The extra
computation needed to do this is why Import is slower.

<snip>

>EWZ2.TXT:

20000714 "iShares MSCI Brazil Index" EWZ      250       1627        1637        1627        1637
20000717 "iShares MSCI Brazil Index" EWZ      100       1730        1735        1730        1735
20000718 "iShares MSCI Brazil Index" EWZ      100       1730        1730        1730        1730
20000719 "iShares MSCI Brazil Index" EWZ      100       1686        1686        1686        1686
20000720 "iShares MSCI Brazil Index" EWZ       50       1724        1724        1724        1724

>The format of the above file is: {Number, String, Word, Number,
>Number, Number, Number, Number}

There are several ways to approach this problem. One set of
approaches is to read the data as strings or records, then use
Mathematica to convert those to the desired data types. For example,

In[19]:= data =
   StringSplit[#, "\""] & /@ ReadList["test.txt", String];
Flatten[{ToExpression[First@#], #[[2]],
     StringSplit[#[[3]], Whitespace][[1]],
     ToExpression /@ Rest[StringSplit[#[[3]], Whitespace]]}] &
/@ data

Out[20]=
20000714   iShares MSCI Brazil Index   EWZ   250   1627   1637   1627   1637
20000717   iShares MSCI Brazil Index   EWZ   100   1730   1735   1730   1735
20000718   iShares MSCI Brazil Index   EWZ   100   1730   1730   1730   1730
20000719   iShares MSCI Brazil Index   EWZ   100   1686   1686   1686   1686
20000720   iShares MSCI Brazil Index   EWZ   50    1724   1724   1724   1724

does the trick.
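
The same split-then-convert idea can be packaged as a small helper, which
makes it easy to try on a couple of records without touching a file. This
is only a sketch: parseLine is a hypothetical name, the two sample strings
stand in for what ReadList["test.txt", String] would return, and it assumes
every line contains exactly one quoted name.

```mathematica
(* Two sample records, one per line, as ReadList[file, String] would return them *)
lines = {
   "20000714 \"iShares MSCI Brazil Index\" EWZ      250       1627        1637        1627        1637",
   "20000717 \"iShares MSCI Brazil Index\" EWZ      100       1730        1735        1730        1735"};

(* parseLine is a hypothetical helper name, not part of the original code *)
parseLine[line_String] := Module[{parts, tail},
   parts = StringSplit[line, "\""];   (* {date, name, remainder} *)
   tail = StringSplit[parts[[3]]];    (* remainder split on whitespace *)
   (* date and trailing fields become numbers; name and symbol stay strings *)
   Join[{ToExpression[parts[[1]]], parts[[2]], First[tail]},
    ToExpression /@ Rest[tail]]];

parsed = parseLine /@ lines
```

Splitting the remainder once and reusing the result avoids calling
StringSplit twice per record, which matters a little when the file is large.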

Alternatively,

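data=ReadList["test.txt", {Number, Word, Word, Word, Word, Word, Number,
   Number, Number, Number, Number}];
(* words 2-5 are the name, word 6 is the symbol; Drop[#,5] keeps the symbol *)
Flatten[{First@#,StringJoin@@Riffle[Take[#,{2,5}]," "],Drop[#,5]}]&/@data

will also work, provided every name consists of exactly four words.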

You might also be able to get ReadList to do everything with
the appropriate TokenWords list and RecordSeparators.
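
As a rough illustration of the RecordSeparators idea (not a tested recipe
for the real file), the sketch below treats both the double quote and the
newline as record separators, so each line comes back as three records:
date, name, and the remainder. It uses only RecordSeparators, not
TokenWords. The sample text, the StringToStream stand-in for the file, and
the name convert are assumptions for the example, and it relies on every
line having exactly one quoted field.

```mathematica
(* Sample text standing in for the file contents *)
text = "20000714 \"iShares MSCI Brazil Index\" EWZ 250 1627 1637 1627 1637\n" <>
   "20000717 \"iShares MSCI Brazil Index\" EWZ 100 1730 1735 1730 1735\n";
stream = StringToStream[text];

(* Quote and newline both end a record, giving three records per line *)
records = ReadList[stream, Record, RecordSeparators -> {"\"", "\n"}];
Close[stream];

rows = Partition[records, 3];

(* convert is a hypothetical helper: numbers via ToExpression, strings kept *)
convert[{date_, name_, rest_}] := With[{w = StringSplit[rest]},
   Join[{ToExpression[date], name, First[w]}, ToExpression /@ Rest[w]]];

result = convert /@ rows
```

Some conversion after the read is still needed here, so this mainly moves
the quote handling into ReadList rather than removing the post-processing.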

But notice what is happening here. The time saved by being able
to read the file quickly is being consumed by post processing
the data to get it in the form you want. Additionally, there is
your time getting things to work and verifying they do work.

>dataFile1 = Table[{2001, "nameA", "symbolA", 15.5}, {50000}];
>Export["out1.txt", dataFile1, "Table"];
>
>AbsoluteTiming[
>out1ReadList = ReadList["out1.txt", {Number, Word, Word, Number}];]
>
>AbsoluteTiming[out1Import = Import["out1.txt", "Table"];]
>
>{0.1718750, Null}
>
>{2.4375000, Null}

Yes, your example shows a 14x improvement in speed for ReadList
over Import. But note the absolute difference is only a bit more
than 2 seconds. Unless you are going to read numerous files with
the same format, it clearly costs you far more time to get
ReadList to do what you want than is saved. And for file sizes
on the order of 50,000 records, the post-processing I am doing
to make things work, combined with the time ReadList takes to
read the file, is likely more than the time Import would have
taken in the first place.

BTW, if you really are working with many large files where the
data originates in Mathematica, consider using Put to write the
data out as a Mathematica expression and reading it back with
Get. These will usually be faster than ReadList and take much
less thought to use. The disadvantage of this approach is the
file created by Put will require a lot of work to use outside of Mathematica.
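
A minimal sketch of the Put/Get round trip, using data shaped like the
timing example quoted above but much smaller. The file name "out1.m" and
its location in $TemporaryDirectory are assumptions for the example.

```mathematica
(* Data of the same shape as the quoted timing example, kept small here *)
data = Table[{2001, "nameA", "symbolA", 15.5}, {1000}];

(* "out1.m" is a hypothetical file name in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "out1.m"}];

Put[data, file];        (* write the expression in Mathematica input syntax *)
restored = Get[file];   (* read the file back and evaluate its contents *)

restored === data       (* True when the round trip preserved the data *)
```

No format specification is needed on either side, which is where the
saving in thought comes from; numbers and strings come back with their
types intact.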
--
To reply via email subtract one hundred and four
