Re: random line in a very large file
- To: mathgroup at smc.vnet.net
- Subject: [mg117013] Re: random line in a very large file
- From: Peter Pein <petsie at dordos.net>
- Date: Mon, 7 Mar 2011 05:50:08 -0500 (EST)
- References: <ikvoit$f62$1@smc.vnet.net>
On 06.03.2011 11:44, Ramiro wrote:
> Hi everyone,
>
> I have very large files (close to 1 GB). I want to read a random line
> from such a file, and I wanted to compare the native Mathematica commands
> with calling a Unix command such as sed. For example:
>
> file = "example";
> n = 1000000;
> Export[file, Range[n], "List"];
> i = RandomInteger[{1, n}];
>
> str = OpenRead[file];
> Skip[str, "String", i];
> sample1 = Read[str, Expression];
> Print[sample1];
> Close[str];
>
> QUESTION:
> 1) Is this the most efficient way to do it in Mathematica? It's taking
> very long for my purposes on my files (note: the sample file above is
> small in comparison with the real data).
> 2) How can I call a command such as:
>
> sed '52q;d' example
>
> I am trying to do:
>
> cmd= "!sed '52q;d' example"
>
> and then
>
> << cmd
>
> but it's not working. I am not sure how to use << in such a way, nor
> how to mix characters such as ' and ".
>
> Any suggestions?
>
> Thanks in advance, any help appreciated,
> Ramiro
> P.S. By the way, the sed command above _seems_ to be faster on my
> files from the command line, which is why I am considering using
> it.
>
What about ReadList?
In: SetDirectory["C:\\Program Files (x86)\\Arena\\Engines\\GNU-Chess\\"];
ReadList["!sed 64q;d gnuchess-ChangeLog.htm",String]
Out: { was not worth the loss of portability.}
In: %//FullForm
Out//FullForm: List["\twas not worth the loss of portability."]
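A note on quoting, since that was part of the question: the call above passes the sed script unquoted, which evidently works from the Windows command interpreter. Under a Unix shell the semicolon would normally end the command, so the script should be wrapped in single quotes; these need no escaping inside a Mathematica string. The following is only a minimal sketch under some assumptions: the helper names readLineWithSed and readLineNative are hypothetical, sed is assumed to be on the path, and the file name is assumed to contain no spaces or shell metacharacters.

```mathematica
(* Hypothetical helper: fetch line k of a file by piping through sed.
   The single quotes protect the ";" in the sed script from a Unix shell. *)
readLineWithSed[file_String, k_Integer?Positive] :=
  ReadList["!sed '" <> ToString[k] <> "q;d' " <> file, String]

(* Hypothetical pure-Mathematica counterpart: skip the first k - 1 lines,
   then read line k as a string. Skip is not called at all when k == 1. *)
readLineNative[file_String, k_Integer?Positive] :=
  Module[{str = OpenRead[file], line},
    If[k > 1, Skip[str, String, k - 1]];
    line = Read[str, String];
    Close[str];
    line]
```

Wrapping either call in AbsoluteTiming, e.g. AbsoluteTiming[readLineWithSed["example", 52]], should give a rough comparison of the two approaches on the real files. Note that readLineWithSed returns a list holding one string, while readLineNative returns the string itself.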
hth,
Peter