Re: Parallelize & Functions That Remember Values They Have Found
- To: mathgroup at smc.vnet.net
- Subject: [mg115510] Re: Parallelize & Functions That Remember Values They Have Found
- From: Albert Retey <awnl at gmx-topmail.de>
- Date: Thu, 13 Jan 2011 03:26:12 -0500 (EST)
- References: <igjr2s$94j$1@smc.vnet.net>
Hi,

> I am starting to discover the magic behind Parallelize and
> ParallelTable, but I still have many problems. The latest one
> occurred when I tried to parallelize a function that is supposed to
> store its values, i.e. one defined as f[x_] := f[x] = ...
>
> You can reproduce my problem by running the following snippet twice:
>
> f[n_] := f[n] = Prime[n]
> DistributeDefinitions[f];
> result = ParallelTable[f[n], {n, 500000}] // AbsoluteTiming;
> elapsed = result[[1]]
>
> On my machine, the first execution takes 2 seconds. Since I defined f
> as f[x_] := f[x] = ..., I expect the second execution to take much
> less than that, but it actually takes around 1.8 s. The third one
> takes somewhat less (say 1.4 s), and so on. After many executions,
> the execution time stabilizes at 0.6 seconds.
>
> Incidentally, 0.6 seconds is the time that a normal Table takes (on
> the second execution) to run the same code:
>
> Exit[]
> f[n_] := f[n] = Prime[n]
> result = Table[f[n], {n, 500000}] // AbsoluteTiming;
> elapsed = result[[1]]
>
> It looks like my 4 kernels are storing the downvalues of f
> separately, so that each of them stores only a (random) quarter of
> the f-values every time the code is run. When all of them have all
> of the 500,000 f-values, which happens after many executions, the
> execution time finally reaches 0.6 s.
>
> Is there a way to make the f-values stored by the 4 kernels
> available to all of them? Maybe a function that "collapses" all the
> information gathered by the kernels into the main kernel, i.e. a
> DeDistributeDefinitions function? Or maybe a way to access the
> memory of all 4 kernels? I tried SetSharedFunction on f, but it
> just made the calculation extremely slow.
>
> I will be grateful for any suggestion.

I think your analysis is correct. Whatever you try in order to avoid
every kernel recalculating what another kernel has already calculated
will require synchronization between the kernels. That is what
SetSharedFunction does, and it is the reason it is so slow:
synchronization is always "expensive", and an efficient parallel
algorithm avoids it as much as possible. What SetSharedFunction does
may be more general than what you need, but I doubt you can achieve
anything much faster within Mathematica.

Memoization is one of those things that is simply not well suited to
parallelization. You will need a strategy that finds an optimum: do as
few calculations as possible more than once, yet do not become slow
from synchronization overhead. I don't think there is an easy trick
that makes this fast, and the best strategy may even depend on the
hardware you are running on. Only if the evaluation of a single
right-hand side takes relatively long does parallelizing with such a
strategy have a chance of being faster than the serial code. Two rough
sketches follow in the PS.

hth,

albert
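
PS: The "collapse" into the main kernel that you describe can be
approximated by hand: after a parallel run, fetch the downvalues each
subkernel has accumulated with ParallelEvaluate, merge them in the
main kernel, and redistribute. A minimal, untested sketch (the Union
assumes the per-kernel definitions never conflict, and shipping half a
million rules between kernels is itself not free):

f[n_] := f[n] = Prime[n]
DistributeDefinitions[f];
ParallelTable[f[n], {n, 500000}];

(* gather the memoized rules from all subkernels into the main
   kernel; Union drops the duplicates, and in canonical order the
   specific values sort before the general rule for f[n_] *)
DownValues[f] = Union @@ ParallelEvaluate[DownValues[f]];

(* send the merged table back so every subkernel starts the next
   parallel run with the full set of stored values *)
DistributeDefinitions[f];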
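
If the right-hand side can be written without reference to f itself, a
cruder option sidesteps synchronization entirely: compute the values in
bulk in parallel, then memoize them serially in the main kernel. Again
only a sketch, using the Prime example from your post:

(* compute in parallel without touching f's memo table *)
vals = ParallelTable[Prime[n], {n, 500000}];

(* store the results serially in the main kernel *)
f[n_] := f[n] = Prime[n]
Do[f[n] = vals[[n]], {n, Length[vals]}]

Whether either version pays off depends on how expensive a single
right-hand side is compared to the bookkeeping it causes.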