Re: Obtaining Random Line from a File
- To: mathgroup at smc.vnet.net
- Subject: [mg129864] Re: Obtaining Random Line from a File
- From: "Kevin J. McCann" <kjm at KevinMcCann.com>
- Date: Tue, 19 Feb 2013 18:51:34 -0500 (EST)
- References: <kfn7nt$qaj$1@smc.vnet.net> <kfq6mb$4us$1@smc.vnet.net> <kfv4vg$3g3$1@smc.vnet.net>
If you plan to do this millions of times, then your only hope is to load the file(s) into memory, e.g. with ReadList. If you do a disk access for each line, you will be waiting for quite a while. Memory is cheap.

Kevin

On 2/19/2013 1:09 AM, Ramiro wrote:
> Thank you so much for the reply. My files are 50 MB each, so I don't think ReadList would work for my purposes; it would be too slow. I am actually doing an MCMC simulation, running (hopefully, if I have time) millions of iterations, and in each one I need to read a random line from one of many files, so this read has to happen as quickly as possible. Any suggestions? Each line is pretty much the same length.
>
> Thanks,
> Ramiro
>
> On Sunday, February 17, 2013 4:08:27 AM UTC-5, David Bailey wrote:
>> On 16/02/2013 06:07, Ramiro Barrantes wrote:
>>> Hello,
>>>
>>> I would like to get a random line from a file. I know this can be done
>>> with Mathematica, but I am playing with using sed to see if it is
>>> faster. Say I want to get line 1000.
>>>
>>> In Mathematica it would be:
>>>
>>> <<"! sed -n 1000p filename.txt"
>>>
>>> However, I am trying to put the filename in a variable, say
>>>
>>> filename="hugefile.txt"
>>>
>>> cmd="! sed -n 1000p "<>filename
>>> <<cmd
>>>
>>> does not work.
>>>
>>> How can I do this?
>>>
>>> Lastly, I am getting a random line using Mathematica by doing:
>>>
>>> getRandomLine[file_, n_] :=
>>>  Block[{i = RandomInteger[{1, n}], str = OpenRead[file], res},
>>>   Skip[str, "String", i];
>>>   res = Read[str, Expression];
>>>   Close[str];
>>>   res[[2]]
>>>  ]
>>>
>>> However, it is very slow, so I was going to try with sed. Any suggestions?
>>>
>>> Thanks in advance,
>>> Ramiro
>>
>> I would stick with Mathematica to do this job! How big is the file
>> (number of lines and number of bytes)? If it will fit inside Mathematica
>> comfortably, I'd see how it works to read it all in as a list of strings
>> and pick the one you want:
>>
>> xx=ReadList["C:\\some file",String];//Timing
>>
>> Then you have an array of strings, and you can select what you want
>> directly.
>>
>> Remember, the basic problem with reading at an arbitrary position in a
>> text file is that if the line lengths are not the same, any algorithm
>> has to read every line before the one you want! If you create this file,
>> you should consider padding the lines to make them all the same length -
>> then you could access what you want very efficiently (but with a little
>> more coding!)
>>
>> David Bailey
>> http://www.dbaileyconsultancy.co.uk
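
The two suggestions in this thread (load everything into memory once, or seek directly into a file whose lines all have the same length) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from any of the posters: the names loadLines, randomLine, openFixed and randomFixedLine are made up here, and the fixed-width variant assumes every line has exactly the same byte length, a one-byte "\n" terminator, and single-byte characters. Lines that are only "pretty much" the same length would have to be padded first.

```mathematica
(* Sketch only; all function names below are hypothetical. *)

(* 1. In-memory approach: read every line once, then sample with no disk access. *)
loadLines[file_String] := ReadList[file, String]
randomLine[lines_List] := RandomChoice[lines]

(* 2. Fixed-width approach: ASSUMES every line occupies exactly recLen bytes,
   terminator included, so line k starts at byte offset (k - 1)*recLen. *)
openFixed[file_String] :=
 Module[{str = OpenRead[file], recLen},
  (* length of the first line, + 1 for an assumed one-byte "\n" terminator *)
  recLen = StringLength[Read[str, String]] + 1;
  (* return the open stream, the record length and the number of lines *)
  {str, recLen, Quotient[FileByteCount[file], recLen]}]

randomFixedLine[{str_, recLen_Integer, nLines_Integer}] := (
  (* jump straight to the start of a uniformly chosen line and read it *)
  SetStreamPosition[str, recLen*RandomInteger[{0, nLines - 1}]];
  Read[str, String])
```

With these definitions, lines = loadLines["hugefile.txt"] followed by repeated calls to randomLine[lines] covers the in-memory case. For the fixed-width case, h = openFixed["hugefile.txt"] is called once per file, randomFixedLine[h] once per iteration, and Close[First[h]] at the end. Keeping the stream open avoids paying for OpenRead and Close on every iteration.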