Re: Obtaining Random LIne from A file
*To*: mathgroup at smc.vnet.net
*Subject*: [mg129872] Re: Obtaining Random LIne from A file
*From*: David Bailey <dave at removedbailey.co.uk>
*Date*: Tue, 19 Feb 2013 18:54:14 -0500 (EST)
*References*: <kfn7nt$qaj$1@smc.vnet.net> <kfq6mb$4us$1@smc.vnet.net> <kfv4vg$3g3$1@smc.vnet.net>
On 19/02/2013 06:09, Ramiro wrote:
> Thank you so much for the reply. My files are 50MB each, I don't think ReadList would work for my purposes, it would be too slow. I am actually doing an MCMC simulation, doing (hopefully if I have time) millions of iterations and in each one I need to read a random line from one of many files, thus requiring this reading to happen as quickly as possible. Any suggestions? Each line is pretty much the same length.
>
> Thanks,
> Ramiro
>
OK - let's establish two points:
1) Are the records in the files of a fixed length?
2) When you say you want an 'arbitrary line', I am assuming that you
calculate a number N and then want the N'th line of the file. If you
really don't care which line you choose, use Ramiro's method (above).
If your files are not guaranteed to have equal-length records, there is
obviously a problem, as I explained before, because you have to read the
first N-1 lines to find where the N'th begins. One option, therefore,
might be to pre-process your files into fixed-length records by padding
with blanks.
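
The padding step could look something like the sketch below. It is written in Python purely for illustration (the thread itself is about Mathematica), and the function name pad_to_fixed_length, the choice of "\n" as the output line ending, and the use of spaces as padding are all assumptions rather than anything specified above.

```python
def pad_to_fixed_length(src_path, dst_path, newline=b"\n"):
    """Copy src_path to dst_path, padding every line with blanks so that
    all records have the same byte length.

    Returns the record length in bytes, including the end-of-line bytes.
    (Hypothetical helper; name and defaults are illustrative only.)
    """
    # First pass: find the longest line, ignoring its end-of-line bytes.
    with open(src_path, "rb") as src:
        width = max((len(line.rstrip(b"\r\n")) for line in src), default=0)

    # Second pass: write each line padded to that width, with one
    # consistent line ending so every record is exactly the same size.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(line.rstrip(b"\r\n").ljust(width, b" ") + newline)

    return width + len(newline)
```

The returned record length is the number to multiply by when computing where a given record starts.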
Once you have fixed-record-length files, you can open them with
BinaryFormat->True and use SetStreamPosition to set the stream to the
byte position where your record starts, which for record N is (N-1)
times the record length (including the end-of-line characters), and then
read the relevant number of bytes. Unless you are using extended
characters, you can convert these bytes to characters with
FromCharacterCode.
This should be VERY fast, because the cost of each access does not
depend on the size of the file (once all the files have been
preprocessed).
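
As a rough illustration of the seek-and-read idea, here is a minimal Python sketch. It is not the Mathematica code itself; f.seek plays the role of SetStreamPosition and decoding the bytes plays the role of FromCharacterCode. The names read_record and random_record are hypothetical, records are assumed to be numbered from 1, and the file is assumed to contain only single-byte characters.

```python
import os
import random

def read_record(f, n, record_length):
    """Return record n (1-based) from a binary file of fixed-length records."""
    if n < 1:
        raise IndexError("record numbers start at 1")
    f.seek((n - 1) * record_length)      # jump straight to the record
    data = f.read(record_length)
    if len(data) != record_length:
        raise IndexError("record %d is past the end of the file" % n)
    # Drop the blank padding and the end-of-line bytes. Note that this
    # also removes any trailing whitespace that was in the original line.
    return data.decode("latin-1").rstrip()

def random_record(f, record_length, rng=random):
    """Return a uniformly chosen record from an already open binary file."""
    count = os.fstat(f.fileno()).st_size // record_length
    return read_record(f, rng.randint(1, count), record_length)
```

For a simulation with millions of iterations it would probably make sense to open each file once (in binary mode) and reuse the handle, so that each draw costs one seek and one short read.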
If the records are variable length but contain some identification such
as a line number, another option would be to pull out a line as Ramiro
suggested, and then use a binary chop (a binary search on the file
position) to zero in on the line of interest.
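
One way the binary chop might be sketched, again in Python for illustration only. It assumes (this is not stated in the thread) that every line begins with its own line number followed by whitespace, that the numbers increase through the file, and that the file is opened in binary mode. All function names are hypothetical.

```python
def first_line_at_or_after(f, pos):
    """Return the first complete line starting at byte offset >= pos,
    or b"" if there is no such line."""
    if pos == 0:
        f.seek(0)
    else:
        f.seek(pos - 1)
        f.readline()   # skip to just past the next end of line
    return f.readline()

def line_number(line):
    """Parse the leading line number (assumed format: b"123 rest of line")."""
    return int(line.split(None, 1)[0])

def find_line(f, target, size):
    """Binary search over byte offsets for the line numbered `target`.

    `size` is the file size in bytes. Raises KeyError if no line carries
    that number.
    """
    lo, hi = 0, size
    # Find the smallest offset whose following line is numbered >= target.
    while lo < hi:
        mid = (lo + hi) // 2
        line = first_line_at_or_after(f, mid)
        if not line or line_number(line) >= target:
            hi = mid
        else:
            lo = mid + 1
    line = first_line_at_or_after(f, lo)
    if not line or line_number(line) != target:
        raise KeyError(target)
    return line.rstrip(b"\r\n")
```

Each probe reads at most two lines, and the number of probes grows with the logarithm of the file size, so this is slower than the fixed-length approach but avoids rewriting the files.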
Hint: You may want to look at the processed file with a hex editor to
make sure the record length is as you expect - remember that Windows
uses 2 characters (CR and LF) per end of line!
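
As a complement to inspecting the file by eye, a small check along the following lines could confirm that a processed file really does consist of records of the expected length. This is a Python sketch under the assumption that each record ends in a single "\n" byte as its last byte; the name check_fixed_records is hypothetical.

```python
import os

def check_fixed_records(path, record_length):
    """Return the number of records if every record is exactly
    record_length bytes and ends with a newline; otherwise raise ValueError."""
    size = os.path.getsize(path)
    if size % record_length != 0:
        raise ValueError("file size is not a multiple of the record length")
    count = size // record_length
    with open(path, "rb") as f:
        for i in range(1, count + 1):
            record = f.read(record_length)
            # The only newline in a record should be its final byte.
            if not record.endswith(b"\n") or b"\n" in record[:-1]:
                raise ValueError("record %d is not laid out as expected" % i)
    return count
```

A file written with Windows line endings will have records one byte longer than the same file written with "\n" alone, so the record length passed in needs to count the CR as well in that case.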
David Bailey
http://www.dbaileyconsultancy.co.uk