Re: Obtaining Random LIne from A file

*To*: mathgroup at smc.vnet.net
*Subject*: [mg129836] Re: Obtaining Random LIne from A file
*From*: awnl <awnl at gmx-topmail.de>
*Date*: Mon, 18 Feb 2013 05:59:22 -0500 (EST)
*References*: <kfn7nt$qaj$1@smc.vnet.net> <kfq6mb$4us$1@smc.vnet.net>

On 17.02.2013 10:08, David Bailey wrote:
> On 16/02/2013 06:07, Ramiro Barrantes wrote:
>> Hello,
>>
>> I would like to get a random line from a file. I know this can be done
>> with Mathematica, but I am playing with using sed to see if it goes
>> faster. Say I want to get line 1000.
>>
>> In Mathematica it would be:
>>
>> <<"! sed -n p1000 filename.txt"
>>
>> However, I am trying to put the filename in a variable, say
>>
>> filename="hugefile.txt"
>>
>> cmd="! sed -n p1000 "<>filename
>> <<cmd
>>
>> does not work.
>>
>> How can I do this?
>>
>> Lastly, I am getting a random line using Mathematica doing:
>>
>> getRandomLine[file_, n_] :=
>>  Block[{i = RandomInteger[{1, n}], str = OpenRead[file], res},
>>   Skip[str, "String", i];
>>   res = Read[str, Expression];
>>   Close[str];
>>   res[[2]]
>>  ]
>>
>> However, it is very slow, so I was going to try with sed. Any suggestions?
>>
>> Thanks in advance,
>> Ramiro
>>
>
> I would stick with Mathematica to do this job! How big is the file
> (number of lines and number of bytes)? If it will fit inside Mathematica
> comfortably, I'd see how it works to read it all in as a list of strings
> and pick the one you want:
>
> xx=ReadList["C:\\some file",String];//Timing
>
> Then you have an array of strings, and you can select what you want
> directly.
>
> Remember, the basic problem with reading at an arbitrary position in a
> text file is that if the line lengths are not the same, any algorithm
> has to read every line before the one you want!

If he just wants to get an arbitrary line, that's not true: choosing a position in the file at random and searching for, e.g., the previous and next line breaks would also result in picking a random line. Of course the probability of choosing longer lines would be larger than that for shorter lines, but it isn't clear from the question whether that would be a problem for what the OP is trying to do...

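A rough sketch of the random-position idea, under some assumptions: the file is non-empty, and the function name randomLineByOffset is made up for illustration. Rather than searching backwards for the previous line break, this variant discards the rest of the line it lands in and returns the next complete line (wrapping around to the first line at the end of the file), which is simpler to write. The bias is still there: a line is chosen with probability proportional to the length of the line before it.

```mathematica
(* Sketch only: jump to a random byte offset and return a nearby line.
   randomLineByOffset is a hypothetical name; the file is assumed non-empty. *)
randomLineByOffset[file_String] :=
 Module[{size = FileByteCount[file], str, line},
  str = OpenRead[file];
  (* land on a random byte somewhere in the file *)
  SetStreamPosition[str, RandomInteger[{0, size - 1}]];
  (* throw away the (probably partial) line we landed in *)
  Read[str, String];
  (* take the next complete line *)
  line = Read[str, String];
  (* if we landed in the last line, wrap around to the first one *)
  If[line === EndOfFile,
   SetStreamPosition[str, 0];
   line = Read[str, String]];
  Close[str];
  line]
```

Only two lines are read per call regardless of the file size, so the cost should not grow with the position of the line in the file.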

> If you create this file,
> you should consider packing the lines to make them all the same length -
> then you could access what you want very efficiently (but with a little
> more coding!)

... and slightly (?) higher memory requirements...

hth,

albert
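
For reference, the fixed-length idea quoted above might look roughly like the following. This is a sketch under assumptions, not a tested recipe: writePadded and readRecord are hypothetical names, every character and the newline are assumed to occupy exactly one byte on disk (plain ASCII, no newline translation), no line is longer than recLen, and StringPadRight needs a reasonably recent Mathematica version.

```mathematica
(* Sketch only: each record is recLen characters plus one newline,
   so record k starts at byte (k - 1) (recLen + 1).
   Assumes single-byte characters and a one-byte newline on disk. *)
writePadded[file_String, lines_List, recLen_Integer] :=
 Module[{str = OpenWrite[file]},
  (* pad every line with spaces up to recLen; longer lines would be truncated *)
  Scan[WriteString[str, StringPadRight[#, recLen], "\n"] &, lines];
  Close[str]]

readRecord[file_String, k_Integer, recLen_Integer] :=
 Module[{str = OpenRead[file], line},
  (* jump straight to the start of record k without reading earlier lines *)
  SetStreamPosition[str, (k - 1) (recLen + 1)];
  line = Read[str, String];
  Close[str];
  line]
```

A random line would then be something like readRecord[file, RandomInteger[{1, n}], recLen] for a file with n records; the result still carries the padding, which StringTrim can remove. The padded file takes n (recLen + 1) bytes, which is the extra space mentioned above.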