MathGroup Archive 2010

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Read/Write streams in parallel

  • To: mathgroup at smc.vnet.net
  • Subject: [mg109872] Re: Read/Write streams in parallel
  • From: David Bailey <dave at removedbailey.co.uk>
  • Date: Thu, 20 May 2010 07:24:36 -0400 (EDT)
  • References: <ht1unu$36q$1@smc.vnet.net>

ChrisL wrote:
> For a future project, I'll need to read and write some streams in
> parallel. I've started playing with a simple example and can't make it
> work.
> What I want to do is to read an entry from a stream ('testfile') and
> write its value to another stream ('testfile2'). This is
> straightforward with one processor but in parallel, I get either a
> wrong result or errors messages.
> For example (testfile contains the first 100 prime numbers):
> ParallelEvaluate[sr = OpenRead@FileNameJoin[{$TemporaryDirectory,
> "testfile"}]]
> ParallelEvaluate[wr = OpenAppend@FileNameJoin[{$TemporaryDirectory,
> "testfile2"}]]
> ParallelDo[
> 	x = Read[sr]; Print[{x, $KernelID}];
> 	Write[wr, {x, $KernelID}], {10}];
> ParallelEvaluate[Close[sr]]
> ParallelEvaluate[Close[wr]]
> 
> testfile2 then contains:
> {2, 2}
> {2, 1}
> {3, 2}
> {3, 1}
> {5, 2}
> {5, 1}
> {7, 2}
> {7, 1}
> {11, 2}
> {11, 1}
> This is not satisfactory: the entries are read twice, once per
> processor. Obviously, it's because the two streams are opened by both
> processors in parallel. So we want the streams to be opened only once,
> and their 'pointers' (location of last entry read) to be updated
> concurrently.
> I try this instead:
> sr = OpenRead@FileNameJoin[{$TemporaryDirectory, "testfile"}]
> wr = OpenAppend@FileNameJoin[{$TemporaryDirectory, "testfile2"}]
> SetSharedVariable[sr, wr]
> ParallelDo[x = Read[sr]; Print[{x, $KernelID}];
>   Write[wr, {x, $KernelID}], {4}];
> Close[sr]
> Close[wr]
> 
> This time the two streams should be open only once and sync-ed thanks
> to SetSharedVariable. But testfile2 ends up being empty*, and
> Mathematica sends some error messages:
> InputStream["/tmp/testfile", 33]  	(* Value of sr *)
> OutputStream["/tmp/testfile2", 34]  (* Value of wr *)
> (kernel 2) Read::openx: InputStream[/tmp/testfile,33] is not open. (*
> yes it is! *)
> (kernel 1) Read::openx: InputStream[/tmp/testfile,33] is not open. (*
> yes it is! *)
> etc.
> 
> Is there any way I can make it work? The correct content for testfile2
> should be the list of the 10 first prime numbers, possibly in a
> different order, and the kernelID that wrote the entry.
> 
> * if it starts from an empty file. Notice the OpenAppend (!=OpenWrite)
> Thank you very much in advance,
> Cheers,
> Chris
> 
Sharing the stream is obviously a necessary step here, but it is hardly 
surprising that it is not sufficient. The structure of an InputStream 
object contains an integer, that is presumably a pointer to some opaque 
information inside the kernel, which will not have been set up on the 
other kernel.

I am trying to imagine why you need to do this. One possibility is that 
the input stream in your real application will be a list of 'things to 
do' and each kernel will read another one when it has finished 
processing the one before. In that case, perhaps the quantity of 
information in the file is not that large, and it would make sense to 
read the entire file into a shared list, and use a shared integer to 
sequence down the list in parallel.

As regards writing, since you obviously don't care what order the 
results get written to the output file (I assume!) you could just write 
them to separate files and join them up later. Certainly, the less 
interaction between the parallel kernels, the faster things will go.

To be honest, I'd also ask the question as to whether it is worth using 
Mathematica parallelism at all. Certainly, I would put most effort into 
optimising the task on one kernel first.

David Bailey
http://www.dbaileyconsultancy.co.uk


  • Prev by Date: Re: How to Enable Automatic Recalculation
  • Next by Date: Re: Overloading StringJoin
  • Previous by thread: Re: Read/Write streams in parallel
  • Next by thread: Re: Read/Write streams in parallel