Re: random line in a very large file
- To: mathgroup at smc.vnet.net
- Subject: [mg117013] Re: random line in a very large file
- From: Peter Pein <petsie at dordos.net>
- Date: Mon, 7 Mar 2011 05:50:08 -0500 (EST)
- References: <ikvoit$f62$1@smc.vnet.net>
On 06.03.2011 11:44, Ramiro wrote:
> Hi everyone,
>
> I have very large files (close to 1gb). I want to find a random line
> on it, I wanted to compare the Mathematica native commands, versus
> calling a unix command such as sed. For example:
>
> file = "example";
> n = 1000000;
> Export[file, Range[n], "List"];
> i = RandomInteger[{1, n}];
>
> str = OpenRead[file];
> Skip[str, "String", i];
> sample1 = Read[str, Expression];
> Print[sample1];
> Close[str];
>
> QUESTION:
> 1) is this the most efficient way to do it in Mathematica, it's taking
> very long for my purposes on my files (note, the sample file above is
> small in comparison with the real data)
> 2) how can I call a command such as:
>
> sed '52q;d' example
>
> I am trying to do:
>
> cmd= "!sed '52q;d' example"
>
> and then
>
> << cmd
>
> but it's not working. I am not sure how to use << in such a way, nor
> how to mix characters such as ' and ".
>
> Any suggestions?
>
> Thanks in advance, any help appreciated,
> Ramiro
> p.s. by the way, the sed command above _seems_ to be faster on my
> files from the command line, that is why I have the option of using
> them.
>

What about ReadList?

In:
SetDirectory["C:\\Program Files (x86)\\Arena\\Engines\\GNU-Chess\\"];
ReadList["!sed 64q;d gnuchess-ChangeLog.htm",String]

Out:
{ was not worth the loss of portability.}

In:
%//FullForm

Out//FullForm:
List["\twas not worth the loss of portability."]

hth,
Peter
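
P.S. Applied to the file from the original question, the ReadList call above could look roughly like the sketch below. It assumes a Unix-like system with sed on the PATH, and that the working directory is the one containing "example"; the names cmd and sample2 are only illustrative. As far as I know, << cmd fails because << takes what follows as a literal file name, so it looks for a file called cmd instead of using the value of the variable. Single quotes need no escaping inside a Mathematica string, so the shell quoting can stay as it is.

```mathematica
(* Sketch only: assumes a Unix-like shell and sed on the PATH. *)
file = "example";
n = 1000000;
i = RandomInteger[{1, n}];

(* Build the shell command with the random line number spliced in, *)
(* e.g. "!sed '52q;d' example".                                    *)
cmd = "!sed '" <> ToString[i] <> "q;d' " <> file;

(* The leading "!" makes ReadList run the command and read its     *)
(* output; String returns one string per output line.              *)
sample2 = First[ReadList[cmd, String]];

(* Convert to an expression if the line holds a number, as it does *)
(* in the test file written with Export above.                     *)
Print[ToExpression[sample2]];
```

With the unquoted form 64q;d used in my example above, a Unix shell would treat the ; as a command separator, so there the single quotes around the sed script are needed.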