Re: Obtaining Random Line from a File

  • To: mathgroup at smc.vnet.net
  • Subject: [mg129869] Re: Obtaining Random LIne from A file
  • From: awnl <awnl at gmx-topmail.de>
  • Date: Tue, 19 Feb 2013 18:53:14 -0500 (EST)
  • References: <kfn7nt$qaj$1@smc.vnet.net> <kfq6mb$4us$1@smc.vnet.net> <kfv4vg$3g3$1@smc.vnet.net>

Hi,

> Thank you so much for the reply.  My files are 50MB each, I don't
> think ReadList would work for my purposes, it would be too slow.  I
> am actually doing an MCMC simulation, doing (hopefully if I have
> time) millions of iterations and in each one I need to read a random
> line from one of many files, thus requiring this reading to happen as
> quickly as possible. Any suggestions? Each line is pretty much the
> same length.

For that specific use case I see two possible ways to proceed:

1) Pick a random position in the file, search backwards for the line
break before that position and forwards for the line break after it,
and read the line in between. The drawbacks of that approach are that
the probability of picking a line is proportional to its length, and
that you have to search for the start and end of every line you read.
If all lines have the same length, the first point is no problem; if
they differ, it depends on whether your MC simulation suffers from a
non-uniform distribution or not...

2) Build an index of line starts. Instead of reading the full content,
you could scan through the file just once, record the positions where
new lines start, and so build a list of file positions whose length
equals the number of lines. Then you only have to choose one of these
positions (e.g. with RandomChoice), seek to it and read one line. I
guess that this is probably the best alternative. To speed up building
the index you might want to read the file in chunks of several lines
instead of line by line. For an example of how to do that, you could
have a look at this mathematica.stackexchange answer:
http://mathematica.stackexchange.com/a/15216/169
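
A minimal sketch of the index idea in 2), assuming a plain text file
whose lines end in newlines. The names buildLineIndex and randomLine
and the file name "data.txt" are made up for illustration, and the
index is built line by line rather than in chunks, so building it will
be slower than a chunked version:

```mathematica
(* Hypothetical helper: list of positions at which the lines of file start *)
buildLineIndex[file_String] :=
 Module[{stream = OpenRead[file], offsets},
  offsets = Reap[
     Sow[0];                               (* the first line starts at 0 *)
     While[Read[stream, String] =!= EndOfFile,
      Sow[StreamPosition[stream]]]         (* position after each line read *)
     ][[2, 1]];
  Close[stream];
  Most[offsets]   (* the last entry is the end of the file, not a line start *)
  ]

(* Hypothetical helper: seek to a randomly chosen line start, read that line *)
randomLine[stream_, offsets_List] :=
 (SetStreamPosition[stream, RandomChoice[offsets]];
  Read[stream, String])

(* Usage: build the index once, then keep the stream open while sampling *)
offsets = buildLineIndex["data.txt"];   (* "data.txt" is a placeholder *)
stream = OpenRead["data.txt"];
samples = Table[randomLine[stream, offsets], {5}];
Close[stream];
```

Since every line start appears exactly once in the index, each line is
picked with the same probability, whatever its length. Keeping the
stream open between draws avoids reopening the file in every iteration.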

Whichever way you choose, you will need the low-level functions for file
access: search the documentation for OpenRead, StreamPosition,
SetStreamPosition, Read, ReadList and Close for details about these...
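
As an illustration of these functions, here is a rough sketch of a
variant of approach 1). Note that it is a simplification: instead of
searching backwards for the preceding line break, it jumps to a random
position, throws away the rest of the line it landed in and returns
the following line. The name randomLineByPosition and the file name
"data.txt" are made up, and a non-empty text file is assumed:

```mathematica
(* Hypothetical helper; stream is an open input stream on the file,
   size is the file size in bytes (assumed to be greater than 0) *)
randomLineByPosition[stream_, size_Integer] :=
 Module[{line},
  (* jump to a random byte, most likely somewhere inside a line *)
  SetStreamPosition[stream, RandomInteger[{0, size - 1}]];
  Read[stream, String];            (* discard the rest of that line *)
  line = Read[stream, String];     (* the next complete line *)
  If[line === EndOfFile,
   (* we landed in the last line: wrap around to the first one *)
   SetStreamPosition[stream, 0];
   line = Read[stream, String]];
  line]

(* Usage: open the file once and reuse the stream for every draw *)
file = "data.txt";                 (* placeholder file name *)
stream = OpenRead[file];
size = FileByteCount[file];
line = randomLineByPosition[stream, size]
Close[stream];
```

With this variant a line is picked with probability roughly
proportional to the length of the line before it, so the same caveat
about non-uniform sampling applies unless all lines have (nearly) the
same length.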

hth,

albert