Re: Using ReadList to read a string
- To: mathgroup at smc.vnet.net
- Subject: [mg83819] Re: Using ReadList to read a string
- From: Bill Rowe <readnewsciv at sbcglobal.net>
- Date: Sat, 1 Dec 2007 05:47:07 -0500 (EST)
On 11/30/07 at 5:23 AM, donabc at comcast.net (Donald DuBois) wrote:

>I am trying to get ReadList to read a string in a text file
>(filename.txt). I would like NOT to have use Import because it is
>MUCH slower in reading a text file than ReadList is. For example:

There is a good reason for Import being slower than ReadList. Import is designed to work with complex data structures and to distinguish strings from numbers automatically. The extra computation needed to do this is why Import is slower.

<snip>

>EWZ2.TXT:
>
>20000714 "iShares MSCI Brazil Index" EWZ 250 1627 1637 1627 1637
>20000717 "iShares MSCI Brazil Index" EWZ 100 1730 1735 1730 1735
>20000718 "iShares MSCI Brazil Index" EWZ 100 1730 1730 1730 1730
>20000719 "iShares MSCI Brazil Index" EWZ 100 1686 1686 1686 1686
>20000720 "iShares MSCI Brazil Index" EWZ 50 1724 1724 1724 1724
>
>The format of the above file is: {Number, String, Word, Number,
>Number, Number, Number, Number}

There are several ways to approach this problem. One set of approaches is to read the data as strings or records, then use Mathematica to convert those to the desired data types. For example,

In[19]:= data = StringSplit[#, "\""] & /@ ReadList["test.txt", String];
         Flatten[{ToExpression[First@#], #[[2]],
            StringSplit[#[[3]], Whitespace][[1]],
            ToExpression /@ Rest[StringSplit[#[[3]], Whitespace]]}] & /@ data

Out[20]= 20000714   iShares MSCI Brazil Index   EWZ   250   1627   1637   1627   1637
         20000717   iShares MSCI Brazil Index   EWZ   100   1730   1735   1730   1735
         20000718   iShares MSCI Brazil Index   EWZ   100   1730   1730   1730   1730
         20000719   iShares MSCI Brazil Index   EWZ   100   1686   1686   1686   1686
         20000720   iShares MSCI Brazil Index   EWZ   50    1724   1724   1724   1724

does the trick. Alternatively,

    data = ReadList["test.txt", {Number, Word, Word, Word, Word, Word,
        Number, Number, Number, Number, Number}];
    (* join the four name words with spaces; Drop[#, 5] keeps the
       symbol and the five numbers that follow it *)
    Flatten[{First@#, StringJoin @@ Riffle[Take[#, {2, 5}], " "],
        Drop[#, 5]}] & /@ data

will also work.
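The string-splitting approach above can be checked without a data file by applying it to a single line held in a string. The following is only a minimal sketch: the helper name parseLine is hypothetical (it does not appear in the original post), and it assumes every line contains exactly one quoted name, as in the EWZ2.TXT excerpt.

```mathematica
(* Hypothetical helper, not from the original post: parse one line of the
   form  Number "quoted name" Word Number ...  into a flat list.
   Assumes exactly one double-quoted field per line. *)
parseLine[line_String] := Module[{parts, rest},
  (* Splitting on the quote character gives {date, name, remainder}. *)
  parts = StringSplit[line, "\""];
  (* The remainder holds the ticker symbol followed by the numeric fields. *)
  rest = StringSplit[parts[[3]], Whitespace];
  Flatten[{ToExpression[parts[[1]]], parts[[2]], First[rest],
    ToExpression /@ Rest[rest]}]
]

(* Sample line copied from the EWZ2.TXT excerpt quoted in this post. *)
parseLine["20000714 \"iShares MSCI Brazil Index\" EWZ 250 1627 1637 1627 1637"]
```

Under the same assumption, the whole file could then be converted with parseLine /@ ReadList["test.txt", String]. Keeping the conversion in one named function makes it easier to verify on a few lines before running it on a large file.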
You might also be able to get ReadList to do everything with the appropriate TokenWords list and RecordSeparators.

But notice what is happening here. The time saved by being able to read the file quickly is being consumed by post-processing the data to get it into the form you want. Additionally, there is your time spent getting things to work and verifying they do work.

>dataFile1 = Table[{2001, "nameA", "symbolA", 15.5}, {50000}];
>Export["out1.txt", dataFile1, "Table"];
>
>AbsoluteTiming[
>out1ReadList = ReadList["out1.txt", {Number, Word, Word, Number}];]
>
>AbsoluteTiming[out1Import = Import["out1.txt", "Table"];]
>
>{0.1718750, Null}
>
>{2.4375000, Null}

Yes, your example shows a 14x improvement in speed for ReadList over Import. But note the absolute difference is only a bit more than 2 seconds. Unless you are going to read numerous files with the same format, it clearly costs you far more time to get ReadList to do what you want than is saved. And for file sizes on the order of 50,000 records, the post-processing I am doing to make things work, combined with the time ReadList takes to read the file, is likely more than the time Import would have taken in the first place.

BTW, if you really are working with many large files where the data originates in Mathematica, consider using Put to write the data out as a Mathematica expression and reading it back with Get. These will usually be faster than ReadList and take much less thought to use. The disadvantage of this approach is that the file created by Put will require a lot of work to use outside of Mathematica.

--
To reply via email subtract one hundred and four