Re: Obtaining Random Line from a File
- To: mathgroup at smc.vnet.net
- Subject: [mg129860] Re: Obtaining Random Line from a File
- From: David Bailey <dave at removedbailey.co.uk>
- Date: Tue, 19 Feb 2013 00:59:46 -0500 (EST)
- References: <kfn7nt$qaj$1@smc.vnet.net> <kfq6mb$4us$1@smc.vnet.net> <kft1ij$cov$1@smc.vnet.net>
On 18/02/2013 10:59, awnl wrote:
> On 17.02.2013 10:08, David Bailey wrote:
>> On 16/02/2013 06:07, Ramiro Barrantes wrote:
>>> Hello,
>>>
>>> I would like to get a random line from a file, I know this can be done
>>> with Mathematica but I am playing with using sed to see if it goes
>>> faster, say I want to get line 1000
>>>
>>> In Mathematica it would be:
>>>
>>> <<"! sed -n p1000 filename.txt"
>>>
>>> However, I am trying to put the filename as a variable, say
>>>
>>> filename="hugefile.txt"
>>>
>>> cmd="! sed -n p1000 "<>filename
>>> <<cmd
>>>
>>> does not work.
>>>
>>> How can I do this?
>>>
>>> Lastly, I am getting a random line using Mathematica doing:
>>>
>>> getRandomLine[file_, n_] :=
>>>   Block[{i = RandomInteger[{1, n}], str = OpenRead[file], res},
>>>     Skip[str, "String", i];
>>>     res = Read[str, Expression];
>>>     Close[str];
>>>     res[[2]]
>>>   ]
>>>
>>> However, it is very slow so I was going to try with sed. Any suggestions?
>>>
>>> Thanks in advance,
>>> Ramiro
>>>
>>>
>> I would stick with Mathematica to do this job! How big is the file
>> (number of lines and number of bytes)? If it will fit inside Mathematica
>> comfortably, I'd see how it works to read it all in as a list of strings
>> and pick the one you want:
>>
>> xx=ReadList["C:\\some file",String];//Timing
>>
>> Then you have an array of strings, and you can select what you want
>> directly.
>>
>> Remember, the basic problem with reading at an arbitrary position in a
>> text file, is that if the line lengths are not the same, any algorithm
>> has to read every line before the one you want!
>
> if he just wants to get an arbitrary line that's not true: just choosing
> a position in the file at random and searching e.g. the previous and
> next linebreak would also result in picking a random line. Of course the
> probability of choosing longer lines would be larger than that for
> shorter lines, but it isn't clear from the question whether that would
> be a problem for what the OP tries to do...
>
>> If you create this file,
>> you should consider packing the lines to make them all the same length -
>> then you could access what you want very efficiently (but with a little
>> more coding!)
>
> ... and slightly (?) higher memory requirements...
>
> hth,
>
> albert
>

Well yes - but I assume that by 'arbitrary' he means a specific record
somewhere in the file!

David Bailey
http://www.dbaileyconsultancy.co.uk
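
For anyone wanting to try the equal-length-lines idea from the quoted text above, here is a minimal sketch of how it might look. The names writePadded, getRecord and getRandomRecord are made up for illustration, and the code assumes single-byte (ASCII) text with no embedded newlines, so that every record occupies exactly recLen + 1 bytes. StringPadRight needs a reasonably recent Mathematica version. Treat it as a starting point rather than a tested solution.

```mathematica
(* Sketch only: writePadded, getRecord and getRandomRecord are hypothetical
   helper names. Assumes single-byte (ASCII) text with no embedded newlines. *)

(* Write each line padded with spaces to recLen characters, followed by "\n".
   Note: StringPadRight truncates lines longer than recLen. *)
writePadded[file_String, lines : {___String}, recLen_Integer?Positive] :=
 Module[{str = OpenWrite[file, BinaryFormat -> True]},
  Scan[BinaryWrite[str, StringPadRight[#, recLen] <> "\n"] &, lines];
  Close[str]]

(* Read record k (1-based) by jumping straight to its byte offset,
   so no earlier line has to be read. *)
getRecord[file_String, k_Integer?Positive, recLen_Integer?Positive] :=
 Module[{str = OpenRead[file, BinaryFormat -> True], chars},
  SetStreamPosition[str, (k - 1) (recLen + 1)];
  chars = BinaryReadList[str, "Character8", recLen];
  Close[str];
  (* StringTrim removes the padding, and any other leading or
     trailing whitespace the line may have had *)
  StringTrim[StringJoin[chars]]]

(* Pick a record uniformly at random; the record count is derived
   from the file size, so it need not be passed in. *)
getRandomRecord[file_String, recLen_Integer?Positive] :=
 getRecord[file,
  RandomInteger[{1, Quotient[FileByteCount[file], recLen + 1]}], recLen]

(* Example use with a small temporary file *)
file = FileNameJoin[{$TemporaryDirectory, "padded.txt"}];
writePadded[file, {"alpha", "beta", "gamma"}, 8];
getRecord[file, 2, 8]
getRandomRecord[file, 8]
```

The cost is the one albert points out: every record takes as much space as the longest one, so the file grows. In exchange, each lookup is a single seek and a short read, independent of where the record sits in the file.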