Counting Matching Patterns in a Large File
- To: mathgroup at smc.vnet.net
- Subject: [mg116103] Counting Matching Patterns in a Large File
- From: "W. Craig Carter" <ccarter at mit.edu>
- Date: Wed, 2 Feb 2011 06:08:20 -0500 (EST)
MathGroup,

(* I'm trying to find a more efficient way to check if a file has
   more than n lines that match a pattern.

   As a test, one might use a test example file obtained from: *)

Export["BigFile.tsv",
  Map[RandomReal[{0, 1}, {#}] &, RandomInteger[{1, 20}, {10000}]]]

(* Right now, I am using: *)

n = 5 (* for example *)

Count[Import["BigFile.tsv", "Table"],
  {a_?NumberQ, b_?NumberQ, c_?NumberQ}] > n

(* But, in many cases, a count of 5 *would* be obtained well before
   the end-of-file is reached. My target files are *much* larger than
   10000 lines...

   I haven't dealt with Streams very much---I am guessing that is
   where the answer lies.

   Many Thanks, Craig *)
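
(* For illustration only: a minimal sketch of the stream-based idea guessed at above, not a tested answer from the list. The name moreThanNMatchesQ is hypothetical. The sketch assumes one record per line, and parses each line with ImportString[..., "Table"] so that the same pattern used with Count applies. It stops reading as soon as the count exceeds n. *)

```mathematica
(* Hypothetical helper, not from the original post: gives True as soon as
   more than n lines of the file parse to exactly three numbers. *)
moreThanNMatchesQ[file_String, n_Integer] :=
 Module[{stream = OpenRead[file], line, count = 0},
  (* Read one line at a time; stop early once the threshold is passed.
     The count test comes first, so no extra line is read after that. *)
  While[count <= n && (line = Read[stream, String]) =!= EndOfFile,
   (* Parse the single line the same way Import[..., "Table"] parses rows. *)
   If[MatchQ[ImportString[line, "Table"],
      {{_?NumberQ, _?NumberQ, _?NumberQ}}],
    count++]];
  Close[stream];
  count > n]

(* Usage, with the test file and n from the post: *)
moreThanNMatchesQ["BigFile.tsv", 5]
```

(* Calling ImportString on every line is likely slow per line, so this should only pay off when the threshold is reached well before the end of the file. The stream is also left open if the evaluation is aborted midway. *)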