yehuda

11/11/12 03:48am
Parallelization is not always faster.
There are communication costs in moving results back and forth between the master kernel and the slave kernels.
When the communication cost is high compared with the computation (as in your case), serial computation is preferable.
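A quick sketch of that overhead (timings are illustrative and machine-dependent): compare the vectorized serial sum against a naive row-by-row ParallelTable, where the cost of shipping rows to and from the subkernels dominates the trivial additions:

```mathematica
n = 2000;
a = RandomReal[{0, 1}, {n, n}];
b = RandomReal[{0, 1}, {n, n}];
(* serial, vectorized: one Listable addition over the whole matrices *)
AbsoluteTiming[a + b;]
(* naive parallel version: each row travels to a subkernel and back *)
AbsoluteTiming[ParallelTable[a[[i]] + b[[i]], {i, n}];]
```

On a typical machine the parallel version is slower here, precisely because each element's work is trivial compared with the transfer cost.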
In addition, why are you using Do loops in the first place?
If you use the Listable property of Plus, you just need to write
c = a + b; there is no need to loop over the n x n matrices element by element.
AbsoluteTiming[n = 5000;
 a = RandomReal[{0, 1}, {n, n}];
 b = RandomReal[{0, 1}, {n, n}];
 c = a + b;]
This simple serial code is faster than any parallelized version.
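For comparison, here is a sketch of the explicit-loop version of the same addition (again, timings are machine-dependent); the Listable form avoids touching the matrices element by element:

```mathematica
n = 2000;
a = RandomReal[{0, 1}, {n, n}];
b = RandomReal[{0, 1}, {n, n}];
c = ConstantArray[0., {n, n}];
(* explicit Do loop: element-by-element assignment *)
AbsoluteTiming[Do[c[[i, j]] = a[[i, j]] + b[[i, j]], {i, n}, {j, n}];]
(* Listable Plus: a single vectorized operation *)
AbsoluteTiming[c = a + b;]
```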
Just to demonstrate when parallelization is better,
take a recursive function, say one computing elements of the Fibonacci series. The definition uses two recursive calls.
I set the entries of both matrices to 25 (a relatively long computation).
I have 8 hyper-threads on my system (quad core).
The serial implementation is now
fib[i_Integer] := fib[i - 1] + fib[i - 2]
fib[0] = fib[1] = 1;
n = 8;
a = b = c = ConstantArray[25, {n, n}];
AbsoluteTiming[
Do[c[[i, j]] = fib[a[[i, j]]] + fib[b[[i, j]]], {i, n}, {j, n}];]
which takes almost 24 seconds.
Now take the simplest parallelization, INCLUDING the time to launch the kernels, and it takes about a third of the serial time (all 8 hyper-threads are working):
fib[i_Integer] := fib[i - 1] + fib[i - 2]
fib[0] = fib[1] = 1;
n = 8;
a = b = c = ConstantArray[25, {n, n}];
AbsoluteTiming[
ParallelDo[
c[[i, j]] = fib[a[[i, j]]] + fib[b[[i, j]]], {i, n}, {j, n}];]
The longer the computational part, the better parallelization performs.
Of course, if IO is involved, things are different.
HTH
yehuda
