Counting Matching Patterns in a Large File
- To: mathgroup at smc.vnet.net
- Subject: [mg116103] Counting Matching Patterns in a Large File
- From: "W. Craig Carter" <ccarter at mit.edu>
- Date: Wed, 2 Feb 2011 06:08:20 -0500 (EST)
MathGroup,
(*
I'm trying to find a more efficient way to check whether a file has more than n lines that match a pattern.
As a test, one might use an example file generated with:
*)
Export["BigFile.tsv", Map[RandomReal[{0, 1}, {#}] &, RandomInteger[{1, 20}, {10000}]]]
(*
Right now, I am using:
*)
n = 5; (* for example *)
Count[Import["BigFile.tsv", "Table"], {_?NumberQ, _?NumberQ, _?NumberQ}] > n
(*
But Import reads and parses the entire file before Count sees a single row, and in many cases a count exceeding n *would* be reached well before the end of the file.
My target files are *much* larger than 10000 lines...
I haven't dealt with Streams very much; I am guessing that is where the answer lies.
Many Thanks, Craig
*)
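(*
A minimal, untested sketch of the stream idea mentioned above, offered only as one possible direction.
It reads the file one line at a time and stops as soon as more than n lines have matched.
The names countExceedsQ and threeNumbersQ are hypothetical helpers invented for this illustration.
Parsing each line with ImportString[..., "Table"] is an assumption made to mimic what Import does to a row; it may be slow.
*)

```mathematica
(* Hypothetical helper: True if more than n lines of file satisfy test.
   Reading stops early once the count exceeds n. *)
countExceedsQ[file_String, n_Integer, test_] :=
 Module[{stream = OpenRead[file], line, count = 0},
  While[count <= n && (line = Read[stream, String]) =!= EndOfFile,
   If[TrueQ[test[line]], count++]];
  Close[stream];
  count > n]

(* Hypothetical test: the line parses to exactly three numeric fields,
   matching the pattern used with Count above. *)
threeNumbersQ[line_String] :=
 MatchQ[ImportString[line, "Table"], {{_?NumberQ, _?NumberQ, _?NumberQ}}]

countExceedsQ["BigFile.tsv", 5, threeNumbersQ]
```

(*
The While condition checks the count before reading the next line, so no line is read after the threshold is passed.
*)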