Re: Using ReadList to read a string
- To: mathgroup at smc.vnet.net
- Subject: [mg83905] Re: [mg83781] Using ReadList to read a string
- From: "Igor C. Antonio" <igora at wolf-ram.com>
- Date: Tue, 4 Dec 2007 04:22:14 -0500 (EST)
- Organization: Wolfram Research, Inc.
- References: <200711301023.FAA06237@smc.vnet.net>
- Reply-to: igora at wolf-ram.com
Donald DuBois wrote:
> Hello,
>
> I am trying to get ReadList to read a string in a text file (filename.txt).
>
> I would like NOT to have to use Import because it is MUCH slower in reading
> a text file than ReadList is.  For example:
>
> (1) a file with 50,000 records can be created
> (2) Exported to disk
> (3) read by ReadList[...] and
> (4) read by Import[...]
>
>
> dataFile1 =
>   Table[{2001, "nameA", "symbolA",
>     15.5}, {50000}]; Export["out1.txt", dataFile1, "Table"];
>
> AbsoluteTiming[
>  out1ReadList = ReadList["out1.txt", {Number, Word, Word, Number}];]
>
> AbsoluteTiming[out1Import = Import["out1.txt", "Table"];]
>
> {0.1718750, Null}
>
> {2.4375000, Null}
>
> Import takes 14 times longer to read in the same file as compared to ReadList.
> So, naturally, I would like to use ReadList whenever I have a .txt file to be read in from disk.
>
> However, the file to be read is slightly more complicated than the one above (out1.txt).
> There is a string that is added to the file as the second element of a record.
> The first few records of the file (EWZ2.TXT below) look like the following with each record
> consisting of eight elements: a number, string, word followed by five integer numbers for each record.
> Each record is on a separate line.
>
> EWZ2.TXT:
>
> 20000714 "iShares MSCI Brazil Index" EWZ 250 1627 1637 1627 1637
> 20000717 "iShares MSCI Brazil Index" EWZ 100 1730 1735 1730 1735
> 20000718 "iShares MSCI Brazil Index" EWZ 100 1730 1730 1730 1730
> 20000719 "iShares MSCI Brazil Index" EWZ 100 1686 1686 1686 1686
> 20000720 "iShares MSCI Brazil Index" EWZ 50 1724 1724 1724 1724
>

Import is naturally slower than ReadList because it processes the data for
you. It parses dates, strings, numbers, currency, etc., into their Mathematica
equivalents. With that said, we have made speed improvements in Import as
Table, and they will be in the next minor update to Mathematica.
-------------------
In[10]:= dataFile1 = Table[{2001, "nameA", "symbolA", 15.5}, {50000}];
Export["out1.txt", dataFile1, "Table"];

In[11]:= a1 = AbsoluteTiming[
  out1ReadList = ReadList["out1.txt", {Number, Word, Word, Number}];]

Out[11]= {0.2031263, Null}

In[12]:= a2 = AbsoluteTiming[out1Import = Import["out1.txt", "Table"];]

Out[12]= {0.4843781, Null}

In[13]:= a2[[1]]/a1[[1]]

Out[13]= 2.384615
-------------

*Disclaimer: the speed of Import as Table largely depends on the amount of
processing Table has to do on the data. The more numbers, dates, and
currencies the file has, the slower it will be.

You may want to try Import[<file>, "Table", "Numeric"->False], which disables
the parsing of the data while still splitting the data correctly and handling
quotes:

In[70]:= data = Import["donald_short.txt", "Table", "Numeric"->False]//InputForm

Out[70]//InputForm=
{{"20000714", "iShares MSCI Brazil Index", "EWZ", "250", "1627", "1637", "1627", "1637"},
 {"20000717", "iShares MSCI Brazil Index", "EWZ", "100", "1730", "1735", "1730", "1735"},
 {"20000718", "iShares MSCI Brazil Index", "EWZ", "100", "1730", "1730", "1730", "1730"},
 {"20000719", "iShares MSCI Brazil Index", "EWZ", "100", "1686", "1686", "1686", "1686"},
 {"20000720", "iShares MSCI Brazil Index", "EWZ", "50", "1724", "1724", "1724", "1724"}}

You could then post-process the data on your own.

> But, is there any way to get ReadList to read the above file (EWZ2.TXT) with the string
> as the second item of a record so that the speed advantage of ReadList over Import
> can be retained?

Most likely you won't be able to import that data correctly in one pass with
a single ReadList call. ReadList is a lower-level function than Import and
doesn't have the ability to parse the quotes out of strings in the file (as
opposed to Mathematica Strings). Import as TSV, CSV, and Table processes the
data as it imports in order to handle that case (and many others).

> Don

--
Igor C. Antonio
Software Engineer
Wolfram Research, Inc.
http://www.wolfram.com

To email me personally, remove the dash.
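
The post-processing step mentioned in the reply could look roughly like the sketch below. It is only an illustration, not something from the original message: it assumes the rows have the shape shown in Out[70] (all fields as strings, numbers in columns 1 and 4 through 8), and the names raw, numericColumns, and data are made up for the example. Two sample rows are written out literally so the snippet does not depend on a file.

```mathematica
(* Two sample rows in the shape returned by
   Import[<file>, "Table", "Numeric" -> False] in the post *)
raw = {
   {"20000714", "iShares MSCI Brazil Index", "EWZ", "250", "1627", "1637", "1627", "1637"},
   {"20000717", "iShares MSCI Brazil Index", "EWZ", "100", "1730", "1735", "1730", "1735"}
   };

(* Assumption: columns 1 and 4-8 hold numbers; columns 2 and 3 stay strings *)
numericColumns = {{1}, {4}, {5}, {6}, {7}, {8}};

(* Convert only the numeric columns of each row from strings to numbers *)
data = MapAt[ToExpression, #, numericColumns] & /@ raw;
```

Note that ToExpression evaluates whatever it is given, so a conversion like this is only appropriate for files whose contents you trust.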