Re: Parallelize & Functions That Remember Values They Have Found
- To: mathgroup at smc.vnet.net
- Subject: [mg115551] Re: Parallelize & Functions That Remember Values They Have Found
- From: Guido Walter Pettinari <coccoinomane at gmail.com>
- Date: Fri, 14 Jan 2011 06:18:56 -0500 (EST)
- References: <igmd11$d1$1@smc.vnet.net>
Thank you very much for both answers! I took inspiration from your example, Thomas, and found a solution to my original issue. I just need to append two lines to the code I originally posted, which was:

f[n_] := f[n] = Prime[n]
DistributeDefinitions[f];
result = ParallelTable[f[n], {n, 500000}] // AbsoluteTiming;
elapsed = result[[1]]

The two extra lines are:

DownValues[f] = Flatten[ParallelEvaluate[DownValues[f]]];
SetSharedFunction[f]

The first line collects the values of f memoized on the parallel kernels into the main one, so that whenever I call f[n] from the main kernel it does not need to be recomputed. The second line makes f a shared function, which lets the parallel kernels access all the data I just stored in the main kernel. This way I do not use up memory on the parallel kernels, since everything is stored in the main kernel.

The drawback is that when the parallel kernels are asked to recompute f[n] (for example, by re-executing the first four lines of code), the data has to be transferred from the main kernel. That is a major issue in our example, since transferring half a million numbers from the main kernel to the parallel kernels takes longer than recomputing everything. It is not a problem for my actual application, though, since what I need to store are interpolation tables (i.e. solutions from NDSolve), which are usually very short.

As a side note, if one is interested only in accessing the data in the main kernel, the SetSharedFunction[f] command can be omitted.

Please let me know if you find a better solution, and thank you again for the replies.

Ciao,

Guido
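P.S. For completeness, here is the whole recipe as one evaluatable block. It is just a sketch built from the toy example of this thread; Prime and the 500000 upper limit stand in for whatever expensive computation you actually want to memoize:

f[n_] := f[n] = Prime[n]
DistributeDefinitions[f];

(* evaluate in parallel; each subkernel memoizes only its own share of the values *)
result = ParallelTable[f[n], {n, 500000}] // AbsoluteTiming;
elapsed = result[[1]]

(* gather the subkernels' memoized DownValues into the main kernel *)
DownValues[f] = Flatten[ParallelEvaluate[DownValues[f]]];

(* optional: share f so that the subkernels can read the merged table too;
   omit this line if you only ever read the values from the main kernel *)
SetSharedFunction[f]

Note that the flattened list also contains one copy of the general rule f[n_] := f[n] = Prime[n] from each subkernel; the duplicates did no harm in my tests, but one could wrap the list in DeleteDuplicates to trim them.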
On Jan 13, 8:27 am, thomas <thomas.mue... at gmail.com> wrote:
> Dear Guido,
>
> I have faced a similar problem recently. As a way around this, I collected the definitions known to the remote kernels in the following way:
>
> f[n_] := f[n] = Prime[n]
> DistributeDefinitions[f];
> ParallelTable[f[n], {n, 500000}]; (* now all f's are known remotely *)
> DownValues[f] = Flatten[ParallelEvaluate[DownValues[f]]]; (* now all f's are known centrally *)
> result = Table[f[n], {n, 500000}];
>
> Collecting the data can take quite some time and eat up the advantages you gain by parallelization, so it is only worth doing if your real code gains enough speed from parallel evaluation. It is best to experiment with that!
>
> Even though it works, it seems quite cumbersome to me. I feel that there should be a better way.
>
> thomas
>
> On Wednesday, January 12, 2011 10:08:44 AM UTC+1, Guido Walter Pettinari wrote:
> > Dear group,
> >
> > I am starting to discover the magic behind Parallelize and
> > ParallelTable, but I still have got many problems. The latest one
> > occurred when I tried to parallelize a function that is supposed to
> > store its values, i.e. one defined as f[x_] := f[x] = ...
> >
> > You can reproduce my problem by running the following snippet twice:
> >
> > f[n_] := f[n] = Prime[n]
> > DistributeDefinitions[f];
> > result = ParallelTable[f[n], {n, 500000}] // AbsoluteTiming;
> > elapsed = result[[1]]
> >
> > On my machine, the first execution takes 2 seconds. Since I defined f
> > as f[x_] := f[x] = ..., I expect the second execution to take much less
> > than that, but it actually takes around 1.8 s. The third one takes
> > somewhat less (say 1.4 s), and so on. After many executions, the
> > execution time stabilizes at 0.6 seconds.
> >
> > Incidentally, 0.6 seconds is the time that a normal Table takes (on
> > the second execution) to run the same code:
> >
> > Exit[]
> > f[n_] := f[n] = Prime[n]
> > result = Table[f[n], {n, 500000}] // AbsoluteTiming;
> > elapsed = result[[1]]
> >
> > It looks like my 4 kernels are storing the downvalues of f[x]
> > separately, so that each of them stores only a (random) quarter of the
> > f-values every time the code is run. When all of them have all of the
> > 500,000 f-values, which happens after many executions, the execution
> > time finally reaches 0.6 s.
> >
> > Is there a way to make all the f-values stored by the 4 kernels
> > available? Maybe a function that "collapses" all the information
> > gathered by the kernels into the main kernel, i.e. a
> > DeDistributeDefinitions function? Or maybe a way to access the memory
> > of all 4 kernels? I tried SetSharedFunction on f[x], but it just
> > made the calculation extremely slow.
> >
> > I will be grateful for any suggestion.
> >
> > Thank you for your attention,
> >
> > Guido W. Pettinari