Re: Obtaining Random LIne from A file
- To: mathgroup at smc.vnet.net
- Subject: [mg129869] Re: Obtaining Random LIne from A file
- From: awnl <awnl at gmx-topmail.de>
- Date: Tue, 19 Feb 2013 18:53:14 -0500 (EST)
- References: <kfn7nt$qaj$1@smc.vnet.net> <kfq6mb$4us$1@smc.vnet.net> <kfv4vg$3g3$1@smc.vnet.net>
Hi,

> Thank you so much for the reply. My files are 50MB each, I don't
> think ReadList would work for my purposes, it would be too slow. I
> am actually doing an MCMC simulation, doing (hopefully if I have
> time) millions of iterations and in each one I need to read a random
> line from one of many files, thus requiring this reading to happen as
> quickly as possible. Any suggestions? Each line is pretty much the
> same length.

For that specific use case I see two possible ways to proceed:

1) Pick a random position in the file, search for the line break before that position and the line break after it, and read the line in between. The drawbacks of that approach are that the probability of picking a line is proportional to its length, and that you have to search for the line start and end for each line you read. If all lines have the same length there is no problem; if they differ, it depends on whether your MC simulation will suffer from a non-uniform distribution or not...

2) Build an index of line starts. Instead of reading the full content you could just scan through the file once, searching for the positions where new lines start, and build a list of file positions with length equal to the number of lines. Then you'd have to choose one of these positions (e.g. with RandomChoice), seek to that position and read one line. I guess that this is probably the best alternative. To speed up the building of the index you might want to read the file in chunks of several lines instead of line by line. You could have a look at e.g. this mathematica.stackexchange answer

http://mathematica.stackexchange.com/a/15216/169

for an example of how to do that.

Whichever way you choose, you will need the low-level functions for file access: search for OpenRead, StreamPosition, SetStreamPosition, Read, ReadList and Close in the documentation for details about these...

hth,

albert
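
The index-based approach (option 2) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tuned implementation: the helper names buildLineIndex and readRandomLine and the 1 MB default chunk size are made up for this example, and lines are assumed to be terminated by a "\n" byte.

```mathematica
(* Hypothetical helper: scan the file once, in chunks, and return the
   0-based byte offset at which each line starts. *)
buildLineIndex[file_String, chunkSize_Integer: 2^20] :=
 Module[{stream, bytes, offset = 0, starts = {0}},
  stream = OpenRead[file, BinaryFormat -> True];
  While[(bytes = BinaryReadList[stream, "Byte", chunkSize]) =!= {},
   (* A newline (byte 10) at 1-based position p within this chunk means
      that the next line starts at 0-based file offset (offset + p). *)
   starts = Join[starts, offset + Flatten[Position[bytes, 10]]];
   offset += Length[bytes]];
  Close[stream];
  (* offset now equals the file size; drop a "line start" that sits at
     end of file (file ends with a newline, or file is empty). *)
  Select[starts, # < offset &]]

(* Hypothetical helper: jump to a uniformly chosen line start and read
   one line from an already opened stream. *)
readRandomLine[stream_InputStream, index_List] := (
  SetStreamPosition[stream, RandomChoice[index]];
  Read[stream, String])
```

In a simulation loop one would presumably build the index once per file, keep the stream open across iterations, and close it only at the end, e.g. (with "data.txt" as a placeholder file name): index = buildLineIndex["data.txt"]; stream = OpenRead["data.txt"]; then call readRandomLine[stream, index] as often as needed, and finally Close[stream]. Because the choice is made among line starts rather than byte positions, every line is picked with equal probability regardless of its length.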