MathGroup Archive 2011

Re: random line in a very large file

  • To: mathgroup at smc.vnet.net
  • Subject: [mg117013] Re: random line in a very large file
  • From: Peter Pein <petsie at dordos.net>
  • Date: Mon, 7 Mar 2011 05:50:08 -0500 (EST)
  • References: <ikvoit$f62$1@smc.vnet.net>

On 06.03.2011 11:44, Ramiro wrote:
> Hi everyone,
>
> I have very large files (close to 1 GB). I want to read a random line
> from one, and I wanted to compare the native Mathematica commands with
> calling a Unix command such as sed. For example:
>
> file = "example";
> n = 1000000;
> Export[file, Range[n], "List"];
> i = RandomInteger[{1, n}];
>
> str = OpenRead[file];
> Skip[str, "String", i];
> sample1 = Read[str, Expression];
> Print[sample1];
> Close[str];
>
> QUESTION:
> 1) Is this the most efficient way to do it in Mathematica? It is taking
> very long for my purposes on my files (note that the sample file above is
> small in comparison with the real data).
> 2) How can I call a command such as:
>
> sed '52q;d' example
>
> I am trying to do:
>
> cmd= "!sed '52q;d' example"
>
> and then
>
> <<  cmd
>
> but it's not working. I am not sure how to use << in such a way, nor
> how to mix characters such as ' and ".
>
> Any suggestions?
>
> Thanks in advance, any help appreciated,
> Ramiro
> P.S. By the way, the sed command above _seems_ to be faster on my
> files from the command line, which is why I am considering using
> it.
>
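
On question 1, a minimal sketch of the native route, assuming one record per line. The quoted code skips i lines and then reads, so it returns line i + 1 and hits the end of the file when i equals n. Skipping i - 1 lines returns line i. The name readLine is a hypothetical helper, not a built-in, and this has not been timed against sed.

```mathematica
(* Sketch: read line i of a text file with native stream functions. *)
(* readLine is a hypothetical helper name, not a built-in. *)
readLine[file_String, i_Integer?Positive] :=
 Module[{str = OpenRead[file], line},
  If[i > 1, Skip[str, String, i - 1]]; (* skip the i - 1 preceding lines *)
  line = Read[str, String];            (* EndOfFile if the file is shorter *)
  Close[str];
  line]
```

Read with String returns the raw text of the line. Wrap the result in ToExpression if a number is wanted, as in the quoted Read[str, Expression].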

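What about ReadList? A file name that starts with "!" is run as an
external command, and ReadList reads that command's output.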

In: SetDirectory["C:\\Program Files (x86)\\Arena\\Engines\\GNU-Chess\\"];
   ReadList["!sed 64q;d gnuchess-ChangeLog.htm",String]
Out: {	was not worth the loss of portability.}

In: %//FullForm
Out//FullForm: List["\twas not worth the loss of portability."]
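
The call above can be adapted to a variable line number. This is a rough sketch, assuming a Unix-like shell with sed on the PATH and a file name without spaces or shell metacharacters. The name sedLine is a hypothetical helper, not a built-in. Single quotes need no escaping inside a Mathematica string, so ' and " mix without trouble. On a Unix shell the quotes around the sed script are needed, because an unquoted ; would end the command.

```mathematica
(* sedLine is a hypothetical helper name, not a built-in. *)
(* Assumes a Unix-like shell, sed on the PATH, and a plain file name. *)
sedLine[file_String, k_Integer?Positive] :=
 ReadList["!sed '" <> ToString[k] <> "q;d' " <> file, String]

(* Usage: a list holding line 52 as a string, or {} if the file is shorter *)
sedLine["example", 52]
```

As for the quoted attempt, << cmd treats cmd as a literal file name rather than as the variable; Get[cmd] would pass the string, but Get evaluates what it reads as Mathematica input, so ReadList with String is likely the safer choice for arbitrary text.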

hth,
Peter

