Re: Help with Speeding up a For loop

*To*: mathgroup at smc.vnet.net*Subject*: [mg98920] Re: [mg98881] Help with Speeding up a For loop*From*: Leonid Shifrin <lshifr at gmail.com>*Date*: Wed, 22 Apr 2009 05:08:38 -0400 (EDT)*References*: <200904202313.TAA09729@smc.vnet.net>

Hi Adam, I have a number of comments on your code, which will hopefully clarify the main sources of confusion and inefficiency. Here is the version of your code which I modified somewhat. The <initialize> function is introduced for convenience, to be able to perform these operations many times, with different number of points. Same with <runSimpleLoop>. I also changed the names Ideal -> IdealLoop, Resolution -> ResolutionLoop. In[1] = initialize[num_Integer?Positive] := Module[{}, Eb1 = 0; k = 0; n = num; E0 = 2470; m = 0.2; DeltaE = 50; Sigma = 5; maxE = E0 - m; minE = E0 - DeltaE; (*Setup all constants,lists and ranges*) Eb1 = RandomReal[{minE, maxE}, n];(*create a list of'n' random Eb1 values*) k = -RandomReal[TriangularDistribution[{-2470, 0}, -0.1], n]; ]; In[2] = runSimpleLoop[] := Block[{Eb1 = Eb1, k = k, y3}, ResolutionLoop = {}; IdealLoop = {}; For[i = 1, i <= n, i++, If[k[[i]] < Re[(E0 - Eb1[[i]])^2* Sqrt[1 - m^2/(E0 - Eb1[[i]])^2]],(*check if the {k, Eb1} value is under the curve*) AppendTo[IdealLoop, Eb1[[i]]];(*Keep events under curve in'Ideal'*) y3 = Eb1[[i]];(*cast element to a number*) Eb1[[i]] = First@RandomReal[NormalDistribution[y3, Sigma], 1];(*choose a random number from a normal distribution about \ that point*) AppendTo[ResolutionLoop, Eb1[[i]]];]] ](*Keep that event in'Resolution'*) Comments so far: These reflect changes in the code above compared to your original version: 1. You place in your lists Resolution and Ideal 0 to start with. This is not correct, since this point does not belong there. Start with {} instead. 2. This is not C, array index starts from 1, not 0 3. Try to avoid side effects and stay away from global variables. They make life harder. Block[{Eb1 = Eb1}, loop ] makes Eb1 dynamically local while produces the same end result. (Generally, use Module, not Block - I just wanted to keep the code as much close to yours as possible. 4. RandomReal[..., 1] produces a number wrapped in a list. You don' t need that - take First of that list. These two are not dealt with so far: 5. Redundancy of y3. 6. No need to change Eb1 in place. Now, the reason your code is slow: the main culprit of your implementation is AppendTo. Avoid it in loops at all costs - leads to quadratic in the size of the list complexity. Use Reap - Sow if you insist on using loops at all. Below is the code that uses Reap - Sow: In[3] = runLoopWithReapSow[] := Block[{Eb1 = Eb1, k = k, y3}, Reap[For[i = 1, i <= n, i++, y3 = Eb1[[i]]; If[k[[i]] < Re[(E0 - y3)^2*Sqrt[1 - m^2/(E0 - y3)^2]], Sow[y3, "ideal"]; Sow[First@RandomReal[NormalDistribution[y3, Sigma], 1], "resolution"]; ]], _, List][[2]]]; This loop produces the result - the lists of interest combined together, with the string "tags": "ideal" and "resolution". The names of the labels (tags) are completely arbitrary, I could even use just numbers 0 and 1 for them. Let us look at a simple example: In[4] = initialize[20]; runSimpleLoop[]; In[5] = IdealLoop Out[5] = {2431.1, 2442.78, 2445.85, 2447.74, 2429.49, 2448.45, 2434.79, \ 2425.95}; In[6] = ResolutionLoop Out[6] = {2436.17, 2443.8, 2432.54, 2433.07, 2420.16, 2439.93, 2418.47, \ 2423.48, 2436.35} In[7] = runLoopWithReapSow[] Out[7] = {{"ideal", {2431.1, 2442.78, 2445.85, 2447.74, 2429.49, 2448.45, 2434.79, 2425.95}}, {"resolution", {2433.86, 2443.81, 2446.62, 2450.4, 2426.98, 2442.91, 2436.05, 2436.48}}} We only can compare the <ideal> part - it is of course the same. Now, the timing test: In[8] = initialize[10000]; Print[runSimpleLoop[]; // Timing]; Print[newResult = runLoopWithReapSow[]; // Timing]; {IdealLoopNew, ResolutionLoopNew} = Part[newResult, All, 2]; {8.893,Null} {0.892,Null} Already here we get a 10-times speed-up. The results for Ideal are, of course, the same: In[9] = IdealLoopNew == IdealLoop Out[9]= True Finally, let me show you a more idiomatic way to approach your problem, where we will vectorize it and get rid of loop at all: In[10] = (Ideal = Pick[Eb1, (Re[(E0 - Eb1)^2*Sqrt[1 - m^2/(E0 - Eb1)^2]] - k) /. _?Positive :> True]; Resolution = Map[RandomReal[NormalDistribution[#, Sigma], 1] &, Ideal];) // Timing {0.21, Null} In[11] = IdealLoopNew == Ideal Out[11] = True We got another 4-fold speedup, AND we don't need explicit loops, with all the garbage associated with them. Basically, this code first selects all points in Eb1 that are below the curve, and then computes the Resolution vector on these points only. Perhaps, wise use of Compile could further speed it up somewhat (2-4 times, my guess). Anyway, this code gets you 10000 points in 0.2 seconds (on my 5 years old PIV 1.8 Ghz single core), should be probably good enough for you. If you need around 10 000 000 or more, watch out for the memory usage - you may be better off doing some binning as the program runs, by splitting this large number into sycles of smaller lengths. As to the programming, try to avoid loops in Mathematica, and also global variables - even those you use in initialization can be made local to only the function(s) that need them. Generally, for anything longer than a couple of lines, try to write functions whose bodies only contain system symbols and formal parameters. Regards, Leonid On Mon, Apr 20, 2009 at 4:13 PM, Adam Dally <adally at wisc.edu> wrote: > I am using an Intel MacBook with OS X 10.5.6. > > I am trying to create 2 lists: "Ideal" and "Resolution". This is basically > a "Monte Carlo Integration" technique. Ideal should simulate the curve. > Resolution should simulate the curve convoluted with a normal distribution. > I want to do this for n=10 000 000 or more, but it takes far too long right > now. I can do n=100 000 in about 1 minute, but 1 000 000 takes more than an > hour. I haven't waited long enough for 10 000 000 to finish (it has been 5 > days). > > Thank you, > Adam Dally > > Here is the code: > > ClearAll[E0, Eb1, m, DeltaE, Sigma, k, n, y3, Ideal, Resolution, i, > normalizer, maxE, minE] > Eb1 = 0; k = 0; n = 10000; E0 = 2470; m = 0.2; DeltaE = 50; Sigma = 5; maxE > = E0 - m; minE = E0 - DeltaE; Resolution = {Eb1}; Ideal = {Eb1}; (*Setup > all constants, lists and ranges*) > > Eb1 = RandomReal[{minE, maxE}, n]; (*create a list of 'n' random Eb1 > values*) > k = -RandomReal[TriangularDistribution[{-2470, 0}, -0.1], n]; (*create a > list of 'n' random k values; triangle distribution gives more successful > results*) > > For[i = 0, i < n, i++, > > If[k[[i]] < Re[(E0 - Eb1[[i]])^2*Sqrt[1 - m^2/(E0 - Eb1[[i]])^2]], (*check > if the {k,Eb1} value is under the curve*) > AppendTo[Ideal, Eb1[[i]]]; (*Keep events under curve in 'Ideal'*) > y3 = Eb1[[i]]; (*cast element to a number*) > Eb1[[i]] = RandomReal[NormalDistribution[y3, Sigma], 1]; (*choose a > random number from a normal distribution about that point*) > AppendTo[Resolution, Eb1[[i]]]; ]] (*Keep that event in 'Resolution'*) > > >

**References**:**Help with Speeding up a For loop***From:*Adam Dally <adally@wisc.edu>