Re: Using ReadList to read a string
- To: mathgroup at smc.vnet.net
- Subject: [mg83819] Re: Using ReadList to read a string
- From: Bill Rowe <readnewsciv at sbcglobal.net>
- Date: Sat, 1 Dec 2007 05:47:07 -0500 (EST)
On 11/30/07 at 5:23 AM, donabc at comcast.net (Donald DuBois) wrote:
>I am trying to get ReadList to read a string in a text file (filename.txt).
>I would like NOT to have use Import because it is MUCH slower in
>reading a text file than ReadList is. For example:
There is a good reason for Import being slower than ReadList.
Import is designed to work with complex data structures and
distinguish strings from numbers automatically. The extra
computation needed to do this is why Import is slower.
<snip>
>EWZ2.TXT:
20000714 "iShares MSCI Brazil Index" EWZ 250
1627 1637 1627 1637
20000717 "iShares MSCI Brazil Index" EWZ 100
1730 1735 1730 1735
20000718 "iShares MSCI Brazil Index" EWZ 100
1730 1730 1730 1730
20000719 "iShares MSCI Brazil Index" EWZ 100
1686 1686 1686 1686
20000720 "iShares MSCI Brazil Index" EWZ 50
1724 1724 1724 1724
>The format of the above file is: {Number, String, Word, Number,
>Number, Number, Number, Number}
There are several ways to approach this problem. One set of
approaches is to read the data as strings or records, then use
Mathematica to convert those to the desired data types. For example,
In[19]:= data =
StringSplit[#, "\""] & /@ ReadList["test.txt", String];
Flatten[{ToExpression[First@#], #[[2]],
StringSplit[#[[3]], Whitespace][[1]],
ToExpression /@ Rest[StringSplit[#[[3]], Whitespace]]}] &
/@ data
Out[20]=
20000714  iShares MSCI Brazil Index  EWZ  250  1627  1637  1627  1637
20000717  iShares MSCI Brazil Index  EWZ  100  1730  1735  1730  1735
20000718  iShares MSCI Brazil Index  EWZ  100  1730  1730  1730  1730
20000719  iShares MSCI Brazil Index  EWZ  100  1686  1686  1686  1686
20000720  iShares MSCI Brazil Index  EWZ  50   1724  1724  1724  1724
does the trick.
Alternatively,
data=ReadList["test.txt", {Number, Word, Word, Word, Word, Word, Number,
Number, Number, Number, Number}];
(* elements 2-5 are the words of the quoted name; Drop[#,5] keeps the ticker *)
Flatten[{First@#,StringJoin@@Riffle[Take[#,{2,5}]," "],Drop[#,5]}]&/@data
will also work.
You might also be able to get ReadList to do everything with
the appropriate TokenWords list and RecordSeparators.
But notice what is happening here. The time saved by being able
to read the file quickly is being consumed by post processing
the data to get it in the form you want. Additionally, there is
your time getting things to work and verifying they do work.
>dataFile1 = Table[{2001, "nameA", "symbolA", 15.5}, {50000}];
>Export["out1.txt", dataFile1, "Table"];
>
>AbsoluteTiming[
>out1ReadList = ReadList["out1.txt", {Number, Word, Word, Number}];]
>
>AbsoluteTiming[out1Import = Import["out1.txt", "Table"];]
>
>{0.1718750, Null}
>
>{2.4375000, Null}
Yes, your example shows a 14x improvement in speed for ReadList
over Import. But note the absolute difference is only a bit more
than 2 seconds. Unless you are going to read numerous files with
the same format, it clearly costs you far more time to get
ReadList to do what you want than is saved. And for file sizes
on the order of 50,000 records, the post processing I am doing
to make things work combined with the time ReadList takes to
read the file, likely is more than the time Import would have
taken in the first place.
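One way to check this claim on your own data is to time the whole ReadList pipeline, read plus post-processing, against a plain Import. The following is only a sketch: it assumes the sample file is saved as "test.txt" with each record on a single line, and the names raw, viaReadList and viaImport are made up for illustration. Timings will vary by machine and version, so none are shown.

```mathematica
(* Time the read AND the string post-processing together *)
AbsoluteTiming[
 raw = StringSplit[#, "\""] & /@ ReadList["test.txt", String];
 viaReadList =
  Flatten[{ToExpression[First@#], #[[2]],
      StringSplit[#[[3]], Whitespace][[1]],
      ToExpression /@ Rest[StringSplit[#[[3]], Whitespace]]}] & /@ raw;]

(* Time Import on the same file for comparison *)
AbsoluteTiming[viaImport = Import["test.txt", "Table"];]
```

Comparing the first element of each result gives the total elapsed time of each approach, which is the number that matters rather than the raw read time alone.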
BTW, if you really are working with many large files where the
data originates in Mathematica, consider using Put to write the
data out as a Mathematica expression and reading it back with
Get. These will usually be faster than ReadList and take much
less thought to use. The disadvantage of this approach is the
file created by Put will require a lot of work to use outside of Mathematica.
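As a minimal sketch of the Put/Get round trip, using the sample data from your timing test (the file name "out1.m" and the variable names are assumptions for illustration):

```mathematica
(* Sample data in the same shape as the timing test above *)
data = Table[{2001, "nameA", "symbolA", 15.5}, {50000}];

(* Write the expression to disk as Mathematica input text *)
Put[data, "out1.m"];

(* Read it back; strings and numbers keep their types,
   so no post-processing is needed *)
restored = Get["out1.m"];

(* Check that the round trip preserved the data *)
restored === data
```

Wrapping the Put and Get calls in AbsoluteTiming would let you compare them directly with the ReadList and Import timings.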
--
To reply via email subtract one hundred and four