MathGroup Archive: September 2004 [00114]

[Date Index] [Thread Index] [Author Index]

Re: Parallel Toolkit Example

To: mathgroup at smc.vnet.net
Subject: [mg50492] Re: Parallel Toolkit Example
From: "Mark Westwood [EPCC]" <markw at epcc.ed.ac.uk>
Date: Wed, 8 Sep 2004 05:05:14 -0400 (EDT)
Organization: Edinburgh University
References: <200409030735.DAA15550@smc.vnet.net> <41387904.6000904@wolfram.com> <chbleb$rk3$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

Geoff,

If I've understood you correctly you are trying to distribute the 
factorisation of a number of large integers over a number of processors. 
  If that is correct, read on, if I've misunderstood do something more 
productive with your time.

Suppose that, on a single-processor computer your program would spend 
80% of its time factoring one of the integers, and 20% factoring the 
others.  Now send the integers to be factored out to the worker 
processors.  The one factorising the problem integer will finish in 
approximately 80% of the time that your single processor did, the other 
processors will finish more quickly, but that doesn't affect the total 
execution time of your program which is only 20% lower than on the 
single-processor machine.

I think Daniel Lichtblau is right.  What perhaps isn't clear is that 
ParallelMap does not parallelise the FactorInteger function, in the 
sense of having multiple processors working simultaneously on the 
factorisation of one integer, it distributes the integers to the 
procesors and each works independently on the set it is assigned.  You 
may actually have been lucky to get a 20% speedup and not to have had it 
consumed by the overhead of distributing the computation and gathering 
the results.

You could run some tests to see how long each of the integers in your 
set takes to factorise.  The distribution of times is most unlikely to 
be even approximately uniform.  The total execution time is a little bit 
more than the execution time for factoring the worst case.

As to the question of examples of where the Parallel Toolkit is really 
useful, I don't have any, since I don't have the toolkit.  But I would 
look for problems where:
a) the individual function evaluations should all take approximately the 
same time, which avoids the overall execution time being dominated by 
'pathological' cases;
b) the individual execution times are likely to be long enough to 
amortise the distribution overheads; you'll have a better sense of what 
those overheads are on your cluster than I could guess at.

If (a) is not true for your problem but you have some well-founded 
expectation of how the execution time varies with the problem parameters 
you could write your own scheduling algorithm - look under bin-packing 
or the knapsack in the literature - to distribute the computational load 
as you direct.

Now, if the Parallel Toolkit could really parallelise functions I might 
be interested in a copy, 'til then I'll stick to Fortran for high 
performance number crunching.

Regards
Mark Westwood
Edinburgh Parallel Computing Centre

Geoff Hulette wrote:
> On Sep 3, 2004, at 10:00 AM, Daniel Lichtblau wrote:
> 
> 
>>Geoff Hulette wrote:
>>
>>>Hi,
>>>I am looking for an efficient example of using the Parallel Toolkit.  
>>>Many other people on this forum seem to have noted what I have also 
>>>found to be the case, that the ParallelMap and ParallelTable commands 
>>>aren't terribly efficient.  The example code in ParEval.nb, for 
>>>instance
>>>ParallelMap[FactorInteger, (10^Range[20, 40] )/9]
>>>only seems to run about 20% faster on a six-node cluster vs my 
>>>laptop.  Does anyone have an example of how to solve this same 
>>>problem using an efficient parallel technique?  I am fairly new to 
>>>Mathematica, so please be explicit.
>>>Thanks,
>>>-geoff
>>>Geoff Hulette
>>>MIT Academic Computing
>>>ghulette at mit.edu
>>
>>Not that I know anything about it, but you are probably looking at a 
>>latency issue with this example. Generally speaking for ParallelMap et 
>>al to be of use the time to process the individual elements needs to 
>>be large compared to the overhead time of shipping back and forth over 
>>processors. From what I can tell this is not the case for your example 
>>(they all factor quite quickly).
>>
>>If they did not you could run into a different problem. For some sets 
>>of integers the overall factorization speed will be dominated by the 
>>time needed to factor one particular element. In such cases again a 
>>parallelized version would not be of much help. But for sets where two 
>>or more of the inputs are slow to factor it would of course give a 
>>considerable boost.
>>
>>
>>Daniel Lichtblau
>>Wolfram Research
>>
>>
> 
> Well, that is certainly true.  But, if I am not mistaken (and I could 
> be, since I am new to Mathmatica), the problem I am trying out factors 
> a large range of integers, and the computation for any one element in 
> the set to be factored can be computed independently of the others.  
> So, I suspect that the ParallelMap command is inefficient in that it 
> first computes the range (20 integers, in this case), and sends each 
> one (20 problems total) out over the network.  A far more efficient 
> solution would be break the problem up over the number of processors 
> available, and let each node handle its part of the range.  In this 
> case each processor would only have to handle one (somewhat more 
> difficult) problem, reducing the latency.  This would also allow the 
> process to scale better, as the range increases, since each processor 
> would always only be tackling one problem (with only one associated 
> overhead cost), while the time spent in computation would increase 
> according to the complexity of the problem.  Does anybody have a way to 
> do this?  Or, as a matter of fact, does anyone have any examples at all 
> of efficient Parallel Toolkit routines?
> 
> Thanks,
> geoff
>

References:
- Parallel Toolkit Example
  - From: Geoff Hulette <ghulette@collidescope.net>

Prev by Date: Re: Re: mathlink newbie q

Next by Date: Shapley value and power indices

Previous by thread: Re: Parallel Toolkit Example

Next by thread: Re: Parallel Toolkit Example