Re: Using ReadList to read a string
- To: mathgroup at smc.vnet.net
- Subject: [mg83823] Re: Using ReadList to read a string
- From: "Steve Luttrell" <steve at _removemefirst_luttrell.org.uk>
- Date: Sat, 1 Dec 2007 05:49:13 -0500 (EST)
- References: <fioqg6$blb$1@smc.vnet.net>
When I have difficulty getting ReadList to do what I want, I fall back to reading in the file as

    data = ReadList[<file>, String];

so that each record is read as a single string. Then I use string processing to extract what I need. Here you will find that StringCases does everything you want: use the details documented in the "More Information" section of StringExpression to construct a string pattern that does the required job. So you would use

    StringCases[data, <string pattern built using StringExpression>]

or one of its variants.

Steve Luttrell
West Malvern, UK

"Donald DuBois" <donabc at comcast.net> wrote in message
news:fioqg6$blb$1 at smc.vnet.net...
> Hello,
>
> I am trying to get ReadList to read a string in a text file
> (filename.txt).
>
> I would like NOT to have to use Import because it is MUCH slower in reading
> a text file than ReadList is. For example:
>
> (1) a file with 50,000 records can be created,
> (2) exported to disk,
> (3) read by ReadList[...] and
> (4) read by Import[...]
>
>
> dataFile1 = Table[{2001, "nameA", "symbolA", 15.5}, {50000}];
> Export["out1.txt", dataFile1, "Table"];
>
> AbsoluteTiming[
>  out1ReadList = ReadList["out1.txt", {Number, Word, Word, Number}];]
>
> AbsoluteTiming[out1Import = Import["out1.txt", "Table"];]
>
> {0.1718750, Null}
>
> {2.4375000, Null}
>
> Import takes 14 times longer to read in the same file as compared to
> ReadList.
> So, naturally, I would like to use ReadList whenever I have a .txt file to
> be read in from disk.
>
> However, the file to be read is slightly more complicated than the one
> above (out1.txt). There is a string that is added to the file as the
> second element of a record.
> The first few records of the file (EWZ2.TXT below) look like the
> following, with each record consisting of eight elements: a number,
> a string, a word, followed by five integers.
> Each record is on a separate line.
>
> EWZ2.TXT:
>
> 20000714 "iShares MSCI Brazil Index" EWZ 250 1627 1637 1627 1637
> 20000717 "iShares MSCI Brazil Index" EWZ 100 1730 1735 1730 1735
> 20000718 "iShares MSCI Brazil Index" EWZ 100 1730 1730 1730 1730
> 20000719 "iShares MSCI Brazil Index" EWZ 100 1686 1686 1686 1686
> 20000720 "iShares MSCI Brazil Index" EWZ 50 1724 1724 1724 1724
>
> The format of the above file is: {Number, String, Word, Number, Number,
> Number, Number, Number}
>
> If this file on disk is named "EWZ2.TXT", I am not able to use ReadList to
> read it. I have tried two format specifications within ReadList
> and neither of them works:
>
> {Number, String, Word, Number, Number, Number, Number, Number}
> and {Number, Word, Word, Number, Number, Number, Number, Number}.
>
> ReadList["EWZ2.TXT", {Number, String, Word, Number, Number, Number,
>   Number, Number}]
> ReadList["EWZ2.TXT", {Number, Word, Word, Number, Number, Number,
>   Number, Number}]
>
>
> {{20000714,
>   " \"iShares MSCI Brazil Index\" EWZ 250 1627 1637 1627 1637", "20000717",
>   $Failed, EndOfFile, EndOfFile, EndOfFile, EndOfFile}}
>
> {{20000714, "\"iShares", "MSCI", $Failed, EndOfFile, EndOfFile,
>   EndOfFile, EndOfFile}}
>
>
> Using String for the format of the second element seems to have more
> success than Word, but when the file is read, none of the elements are
> separated by commas, as they were when using ReadList to read out1.txt
> above.
>
> "iShares MSCI Brazil Index" should be the second element of a sublist
> within the entire list (Table), and EWZ (with or without quotes) should be
> the third element within a sublist.
>
> The definition of String in the function description for ReadList is
> "string terminated by a newline", which does not describe the above file
> (EWZ2.TXT).
> If the string is moved in the file so that it is the last item
> in any record, such as
>
> 20000714 EWZ 250 1627 1637 1627 1637 "iShares MSCI Brazil Index"
>
> then a format of {Number, Word, Number, Number, Number, Number, Number,
> String} in ReadList DOES work to read the file correctly.
>
> But is there any way to get ReadList to read the above file (EWZ2.TXT),
> with the string as the second item of a record, so that the speed
> advantage of ReadList over Import can be retained?
>
> Or is there some other function I should be using other than Import
> and/or ReadList?
>
> Thank you in advance for any help you can give me.
> Don
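A minimal sketch of the StringCases approach suggested above, for the EWZ2.TXT layout. It assumes every record has exactly the form: digits, a double-quoted name, a symbol word, then whitespace-separated integers. The names sample, stream, lines, recordPattern and parsed are only illustrative. To keep it self-contained it reads two of the sample records from a string via StringToStream; for the real file you would use ReadList["EWZ2.TXT", String] instead. I have not timed it, so whether it keeps ReadList's speed advantage over Import on 50,000 records is something to check.

```mathematica
(* Two of the EWZ2.TXT records, held in a string so the sketch is self-contained.
   For the real file, use instead:  lines = ReadList["EWZ2.TXT", String];  *)
sample =
  "20000714 \"iShares MSCI Brazil Index\" EWZ 250 1627 1637 1627 1637\n" <>
   "20000717 \"iShares MSCI Brazil Index\" EWZ 100 1730 1735 1730 1735\n";

(* Read each record (line) as a single string *)
stream = StringToStream[sample];
lines = ReadList[stream, String];
Close[stream];

(* Assumed record layout: digits, quoted name, symbol word, remaining numbers *)
recordPattern =
  StartOfString ~~ date : DigitCharacter .. ~~ Whitespace ~~
   "\"" ~~ name : Except["\""] .. ~~ "\"" ~~ Whitespace ~~
   sym : WordCharacter .. ~~ Whitespace ~~ nums___ ~~ EndOfString;

(* StringCases gives one list of matches per line; Flatten[..., 1] removes
   that extra level and drops any line that did not match the pattern *)
parsed = Flatten[
   StringCases[lines,
    recordPattern :> Join[{ToExpression[date], name, sym},
      ToExpression /@ StringSplit[nums]]],
   1];

parsed
```

Because non-matching lines simply disappear from parsed, comparing Length[parsed] with Length[lines] is a quick way to spot records that do not follow the assumed layout.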