Re: Read/Write streams in parallel

*To*: mathgroup at smc.vnet.net
*Subject*: [mg109908] Re: Read/Write streams in parallel
*From*: David Bailey <dave at removedbailey.co.uk>
*Date*: Sat, 22 May 2010 00:42:07 -0400 (EDT)
*References*: <ht1unu$36q$1@smc.vnet.net> <ht5oba$17p$1@smc.vnet.net>

ChrisL wrote:
> Thank you for your help, it's been useful. I now keep track of the
> StreamPosition explicitly and things work. Here is my code for reading
> a file of Mathematica expressions in parallel:
>
> (* First record all the positions *)
> in = OpenRead@"/tmp/testfile"
> positionList = NestWhileList[#; StreamPosition[in] &, 0, Read[in, Expression] =!= EndOfFile &];
> Close@in;
> positionList = Drop[positionList, -1];
> (* Assign the entries to all processors *)
> ParallelEvaluate[in = OpenRead@"/tmp/testfile"];
> DistributeDefinitions[positionList];
> result = ParallelTable[SetStreamPosition[in, sp]; {Read[in, Expression], $KernelID}, {sp, positionList}];
> ParallelEvaluate[Close@in]
>
> result contains the values in testfile, as well as the kernel that
> read each one.
> ListPlot@Map[#[[2]] &, result]
> shows how well the load has been distributed across processors.
> This could probably be improved (by not using the list of positions,
> for example) but that should be enough for now.
> (Note that I needed the positions because I'm not expecting them to be
> equally spaced.)
>
> David:
>> I am trying to imagine why you need to do this. (...)
> You're correct: I'm going to generate a very large list of Mathematica
> expressions that simply won't fit in memory. Each of them will take
> some time to process, so I'd better parallelize the process and make
> use of the local cluster. The final result will actually be the sum of
> the output of these processes, so I could simply use ParallelSum to
> compute it. The bit about writing to a file was just for my example.
>
> Ideally I'd use a generator, like in Python, which would create the
> next object to process only when requested, instead of having to
> create them all at once and store them in a table, which I can't do
> here due to memory limitations. But I didn't see a way to write such a
> generator, so I'm using a stream to simulate it. If there's a better,
> more Mathematica-y way to do it, I'd love to hear it.
>
> Thanks again for your help,
> Chris
>

Are you using 64-bit Mathematica? (On Windows you get it automatically if you install under 64-bit Windows.) It lets a single process use as much memory as you have available, which might solve both your speed and your memory problems (the two are usually linked, because paging can consume a lot of time).

Why can't you write a function GenerateNextObject[] that does just that? You don't need to generate them all in a Table!

I'd definitely explore all the serial ways to speed this code up before going parallel.

David Bailey
http://www.dbaileyconsultancy.co.uk
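
For illustration, here is a minimal serial sketch of the GenerateNextObject[] idea, under stated assumptions rather than as a definitive implementation. The names counter, maxCount and process are hypothetical, and the squares merely stand in for whatever expressions actually need processing. Only one object exists at a time, so nothing has to be stored in a Table or written to a file.

```mathematica
(* Hypothetical state for the generator: these names are illustrative only *)
counter = 0;
maxCount = 10;

(* Return the next object on each call, or EndOfFile once exhausted.
   counter^2 is a placeholder for the real (expensive) expression. *)
GenerateNextObject[] :=
  If[counter < maxCount, counter++; counter^2, EndOfFile];

(* Placeholder for the real per-object computation *)
process[obj_] := obj + 1;

(* Pull objects one at a time and accumulate the sum of the results *)
total = 0;
While[(obj = GenerateNextObject[]) =!= EndOfFile,
  total += process[obj]
];
total
```

The loop mirrors the Read[in, Expression] =!= EndOfFile test from the stream version, so the same processing code could be driven by either source.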