Re: Counting Matching Patterns in a Large File
- To: mathgroup at smc.vnet.net
- Subject: [mg116157] Re: Counting Matching Patterns in a Large File
- From: "Sjoerd C. de Vries" <sjoerd.c.devries at gmail.com>
- Date: Thu, 3 Feb 2011 05:33:41 -0500 (EST)
- References: <iibdv9$npo$1@smc.vnet.net>
Craig,

Something like

str = OpenRead["BigFile.tsv"];
count = 0;
While[count < 5000 && (rl = ReadList[str, Record, 1]) =!= {},
 If[MatchQ[ToExpression /@ StringSplit[rl[[1]], "\t"],
   {a_?NumberQ, b_?NumberQ, c_?NumberQ}],
  count++
 ]
];
Close[str];
count

should do it. The loop stops reading as soon as count reaches the limit (5000 here), so setting that limit to n + 1 and testing count > n answers your question without reading the rest of the file. It's not particularly good looking, and I'm not too sure about its efficiency. ReadList doesn't seem to be able to read a variable number of fields per record, so I read each line as a single string and split it with StringSplit.

Cheers -- Sjoerd

On Feb 2, 12:08 pm, "W. Craig Carter" <ccar... at mit.edu> wrote:
> MathGroup,
>
> (*
> I'm trying to find a more efficient way to check if a file has more than n lines that match a pattern.
>
> As a test, one might use a test example file obtained from:
> *)
>
> Export["BigFile.tsv", Map[RandomReal[{0, 1}, {#}] &, RandomInteger[{1, 20}, {10000}]]]
>
> (*
> Right now, I am using:
> *)
>
> n=5 (*for example*)
>
> Count[Import["BigFile.tsv", "Table"], {a_?NumberQ, b_?NumberQ, c_?NumberQ}] > n
>
> (*
> But, in many cases, a count of 5 *would* be obtained well before the end-of-file is reached.
>
> My target files are *much* larger than 10000 lines...
>
> I haven't dealt with Streams very much---I am guessing that is where the answer lies.
>
> Many Thanks, Craig
> *)
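The early-exit loop in the reply can also be packaged as a reusable function that takes the threshold n directly. The following is only a sketch: the name moreThanNMatchesQ is hypothetical, and it swaps in Read[str, String] (which reads one line as a string and returns EndOfFile at the end of the stream) for ReadList[str, Record, 1]. The matching test (exactly three tab-separated fields, each of which ToExpression turns into a number) is the same one used above.

```mathematica
(* Hypothetical helper: True if more than n lines of file consist of exactly
   three tab-separated numeric fields. Stops reading once count exceeds n. *)
moreThanNMatchesQ[file_String, n_Integer] :=
 Module[{str = OpenRead[file], count = 0, line},
  (* Read[str, String] returns one line per call, EndOfFile at the end *)
  While[count <= n && (line = Read[str, String]) =!= EndOfFile,
   If[MatchQ[ToExpression /@ StringSplit[line, "\t"],
      {_?NumberQ, _?NumberQ, _?NumberQ}],
    count++]];
  Close[str];
  count > n]
```

For the test file, moreThanNMatchesQ["BigFile.tsv", 5] should return as soon as the sixth matching line is read, instead of importing all 10000 lines first.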