Re: Speed Up of Calculations on Large Lists

*To*: mathgroup at smc.vnet.net
*Subject*: [mg108828] Re: Speed Up of Calculations on Large Lists
*From*: Bill Rowe <readnews at sbcglobal.net>
*Date*: Fri, 2 Apr 2010 05:20:31 -0500 (EST)

On 4/1/10 at 5:59 AM, sheaven at gmx.de (sheaven) wrote:

>I am new to Mathematica and am trying to get an understanding of its
>power. I plan to use Mathematica mainly for financial data analysis
>(large lists...).

>Currently, I am trying to optimize calculation time for calculations
>based on some sample data. I started with a moving average of
>share prices, because Mathematica already has a built in moving
>average function for benchmarking.

>I know that the built-in functions are always more efficient than
>any user built function. Unfortunately, I have to create functions
>not built in (e.g. something like "moving variance") in the future.

>I have tried numerous ways to calc the moving average as efficiently
>as possible. So far, I found that a function based on Span (or
>List[[x;;y]]) is most efficient. Below are my test results.
>Unfortunately, my UDF is still more than 5x slower than the built in
>function.

>Do you have any ideas to further speed up the function? I am already
>using Compile and Parallelize.

>This is what I got so far:

>1. Functions for moving average:

<function code snipped>

>2. Create sample data:
>data = 100 + # & /@ Accumulate[RandomReal[{-1, 1}, {10000}]];

A side point here. Plus is Listable, so it works directly on lists.
That is:

    data = 100 + Accumulate[RandomReal[{-1,1}, 10000]];

will produce the same result as your code but be a bit faster. Note,
the difference in speed here will be quite small and is clearly not
the thrust of your message. But I point this out since such small
differences can add up to something significant in more complex code.

>3. Test if functions yield same results:
>Test1 = movAverageC[data, 30, 250, 10];
>(*Moving average for 30 days to 250 days in steps of 10*)

OK.
Here are the timing results I get for your compiled code based on Span:

    In[1]:= movAverageOwn2FC =
              Compile[{{dataInput, _Real, 1}, {days, _Integer},
                {length, _Integer}},
               N[Mean[dataInput[[1 + # ;; days + #]]]] & /@
                Range[0, length - days, 1]];

    In[2]:= data = 100 + Accumulate[RandomReal[{-1, 1}, {10000}]];

    In[3]:= Timing[Table[movAverageOwn2FC[data, 20, Length@data], {100}];]

    Out[3]= {1.45855,Null}

Now here is a definition using ListConvolve:

    In[4]:= newMoveAverage[data_, windowLen_] :=
              Module[{ker = Table[1, {windowLen}]/windowLen},
               ListConvolve[ker, data]]

    In[5]:= Timing[Table[newMoveAverage[data, 20], {100}];]

    Out[5]= {0.103379,Null}

So, on my machine using a single core without Compile, using
ListConvolve improves the speed by more than 10X (about 1.46 s versus
0.10 s for 100 runs). Using parallel processing with both cores should
improve this result for very large data arrays. Note, ListConvolve is
so fast that the overhead of setting up parallel processes will
probably degrade times for small data arrays. I have not tested this
to verify my guess here.

Compile also might improve things somewhat, but this probably won't be
significant. Compile can offer significant improvement in some code,
particularly when procedural programming is used. But Compile seldom
offers improvement in code with one or two function calls and no
procedural structures such as For. In fact, there are times when using
Compile will actually degrade the execution speed.

Finally, to demonstrate that the code with ListConvolve does the same
as your code:

    In[6]:= movAverageOwn2FC[data, 20, Length@data] ==
              newMoveAverage[data, 20]

    Out[6]= True
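
Since the question mentions needing something like a "moving variance"
later, here is a rough sketch of how the same ListConvolve idea might
be extended. This is not from the original thread: the name
movingVariance is hypothetical, and the approach assumes the identity
Var = (E[x^2] - E[x]^2) n/(n - 1) is accurate enough for the data at
hand. The check against Variance on overlapping windows is only there
to confirm that the two agree.

```mathematica
(* Sketch only: moving sample variance built from two ListConvolve calls.
   movingVariance is a hypothetical name, not a built-in function. *)
movingVariance[data_, windowLen_] :=
 Module[{ker = ConstantArray[1./windowLen, windowLen], m, m2},
  m = ListConvolve[ker, data];     (* moving mean of x *)
  m2 = ListConvolve[ker, data^2];  (* moving mean of x^2 *)
  (* E[x^2] - E[x]^2, rescaled by n/(n - 1) to match Variance *)
  (m2 - m^2) windowLen/(windowLen - 1)]

data = 100 + Accumulate[RandomReal[{-1, 1}, 10000]];

(* Reference result: Variance applied to each overlapping window *)
ref = Variance /@ Partition[data, 20, 1];

(* Largest absolute deviation between the two approaches *)
Max[Abs[movingVariance[data, 20] - ref]]
```

The deviation should be small, on the order of rounding error.
Subtracting two nearly equal quantities can lose precision when the
values are large compared with their spread, so subtracting Mean[data]
from the data first may be worthwhile in that case. I have not timed
this version.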