Re: Speed Up of Calculations on Large Lists

To: mathgroup at smc.vnet.net
Subject: [mg108881] Re: Speed Up of Calculations on Large Lists
From: Ray Koopman <koopman at sfu.ca>
Date: Mon, 5 Apr 2010 08:02:36 -0400 (EDT)

Zach,

The point I was trying to make was that inefficiencies in the code wrapped around MovingAverage were costing substantially more than compiling was saving.

Ray

----- Zach Bjornson <bjornson at mit.edu> wrote:
> Ray,
>
> The critical statement there is "under your test conditions." I played with
> Stefan's problem for quite a while and came up with a few moving average
> functions, and tried them all with and without compiling. His function
> in particular was only 15% slower compiled than uncompiled on my computer
> with his data set. The functions I came up with were usually faster when
> compiled, depending on the data set. Also depending on the data set,
> some were faster than the built-in MovingAverage function. They were
> never faster than the built-in function with his data set, however, so I
> never sent my functions along. Since this came up, though, my futzing is
> below.
>
> My initial response to Stefan's inquiry was that Compile
> would have no effect on MovingAverage, or would just add kernel time
> while Mathematica decides to execute it as normal Mathematica code, but I'm
> not sure that's true.
>
> -Zach
>
> (*data-set dependencies are illustrated between the top and bottom halves
> of this*)
>
> $HistoryLength=0 (*to prevent artificially high speeds*)
>
> 1.1 Your function
> movAverageOwn2FCorig =
>   Compile[{{dataInput, _Real, 1}, {days, _Integer}, {length, _Integer}},
>    N[Mean[dataInput[[1 + # ;; days + #]]]] & /@ Range[0, length - days, 1]]
>
> In[165]:=
> First@Timing[
>   Do[movAverageOwn2FCorig[Range[1000000], 2, 1000000];, {10}]]/10
> Out[165]= 1.7347
>
> 1.2 Inbuilt Mathematica function
> In[164]:= First@Timing[Do[MovingAverage[Range[1000000], 2];, {10}]]/10
> Out[164]= 1.6942
>
> 1.3 My variation #1
> movAverageOwn2FCa =
>   Compile[{{dataInput, _Real, 1}, {days, _Integer}},
>    Table[Mean[dataInput[[i ;; i + days - 1]]],
>     {i, Length@dataInput - days + 1}]]
>
> In[166]:=
> First@Timing[Do[movAverageOwn2FCa[Range[1000000], 2];, {10}]]/10
> Out[166]= 1.6146
>
> The non-compiled version of this function gives a time of 4.0311
> for the same data set.
>
> 1.4 My variation #2
> movAverageOwn2Fb =
>   Compile[{{dataInput, _Real, 1}, {days, _Integer}},
>    With[{innerdata = Partition[dataInput, days, 1]},
>     Table[Mean[innerdata[[i]]], {i, Length@innerdata}]
>    ]]
>
> In[167]:=
> First@Timing[Do[movAverageOwn2Fb[Range[1000000], 2];, {10}]]/10
> Out[167]= 1.6287
>
> Note that this *is* data-set dependent...
> for example, the same functions tested on your data symbol give:
> In[169]:= First@Timing[Do[MovingAverage[data, 2];, {10}]]/10
>
> Out[169]= 0.0015
>
> In[170]:= First@Timing[Do[movAverageOwn2FCa[data, 2];, {10}]]/10
>
> Out[170]= 0.0171
>
> In[171]:= First@Timing[Do[movAverageOwn2Fb[data, 2];, {10}]]/10
>
> Out[171]= 0.0156
>
> In[173]:=
> First@Timing[Do[movAverageOwn2FCorig[data, 2, Length@data];, {10}]]/10
>
> Out[173]= 0.0171
>
> On 4/4/2010 7:45 AM, Ray Koopman wrote:
>> Your compiled movAverageC takes 25% more time than the uncompiled
>>
>> movAv[data_, start_, end_, incr_] := Transpose@PadRight@Join[{data},
>>   Table[MovingAverage[data, r], {r, start, end, incr}]]
>>
>> under your test conditions.
>>
>> On Apr 1, 3:59 am, sheaven <shea... at gmx.de> wrote:
>>
>>> Hello everyone!
>>>
>>> I am new to Mathematica and am trying to get an understanding of its
>>> power. I plan to use Mathematica mainly for financial data analysis
>>> (large lists...).
>>>
>>> Currently, I am trying to optimize calculation time for calculations
>>> based on some sample data. I started with a moving average of
>>> share prices, because Mathematica already has a built-in moving
>>> average function for benchmarking.
>>>
>>> I know that the built-in functions are always more efficient than any
>>> user-built function. Unfortunately, I will have to create functions that
>>> are not built in (e.g. something like "moving variance") in the future.
>>>
>>> I have tried numerous ways to calculate the moving average as
>>> efficiently as possible. So far, I have found that a function based on
>>> Span (or List[[x;;y]]) is most efficient. Below are my test results.
>>> Unfortunately, my UDF is still more than 5x slower than the built-in
>>> function.
>>>
>>> Do you have any ideas to further speed up the function? I am already
>>> using Compile and Parallelize.
>>>
>>> This is what I have so far:
>>>
>>> 1. Functions for moving average:
>>>
>>> 1.1. Moving average based on built-in function:
>>>
>>> (*Function calcs moving average based on built in function for
>>> specified number of days, e.g. 30 days to 250 days in steps of 10*)
>>> movAverageC = Compile[{{inputData, _Real, 1}, {start, _Integer},
>>>     {end, _Integer}, {incr, _Integer}},
>>>   Module[{data, size, i},
>>>    size = Length[inputData];
>>>    Transpose[Join[{inputData},
>>>      PadRight[MovingAverage[inputData, #], size] & /@
>>>       Table[x, {x, start, end, incr}]]]
>>>    ]
>>>   ]
>>>
>>> 1.2. User-defined function based on Span:
>>> (*UDF for moving average based on Span*)
>>> movAverageOwn2FC = Compile[{{dataInput, _Real, 1}, {days, _Integer},
>>>     {length, _Integer}},
>>>   N[Mean[dataInput[[1 + # ;; days + #]]]] & /@ Range[0, length - days, 1]
>>>   ]
>>>
>>> (*Function calcs moving average based on UDF "movAverageOwn2FC" for
>>> specified number of days, e.g. 30 days to 250 days in steps of 10*)
>>> movAverageOwn2C = Compile[{{dataInput, _Real, 1}, {start, _Integer},
>>>     {end, _Integer}, {incr, _Integer}},
>>>   Module[{length},
>>>    length = Length[dataInput];
>>>    Transpose[Join[{dataInput},
>>>      PadRight[movAverageOwn2FC[dataInput, #, length], length] & /@
>>>       Range[start, end, incr]]]
>>>    ]
>>>   ]
>>>
>>> 2. Create sample data:
>>> data = 100 + # & /@ Accumulate[RandomReal[{-1, 1}, {10000}]];
>>>
>>> 3. Test if functions yield the same results:
>>> Test1 = movAverageC[data, 30, 250, 10]; (*Moving average for 30 days
>>> to 250 days in steps of 10*)
>>>
>>> Test2 = movAverageOwn2C[data, 30, 250, 10]; (*Moving average for 30
>>> days to 250 days in steps of 10*)
>>>
>>> Test1 == Test2
>>> Out = True
>>>
>>> 4. Performance testing (single core):
>>> AbsoluteTiming[Table[movAverageC[data, 30, 250, 10], {n, 1, 20, 1}];]
>>> (*Repeat function 20x for testing purposes*)
>>> Out = {1.3030000, Null}
>>>
>>> AbsoluteTiming[Table[movAverageOwn2C[data, 30, 250, 10], {n, 1, 20,
>>> 1}];] (*Repeat function 20x for testing purposes*)
>>> Out = {11.4260000, Null}
>>>
>>> => Result: UDF 9x slower
>>>
>>> 5. Performance testing (multi-core):
>>> LaunchKernels[]
>>>
>>> Out = {KernelObject[1, "local"], KernelObject[2, "local"]}
>>>
>>> DistributeDefinitions[data, movAverageOwn2C, movAverageOwn2FC,
>>> movAverageC]
>>>
>>> AbsoluteTiming[Parallelize[Table[movAverageC[data, 30, 250, 10], {n,
>>> 1, 20, 1}]];]
>>> Out = {1.3200000, Null}
>>>
>>> AbsoluteTiming[Parallelize[Table[movAverageOwn2C[data, 30, 250, 10],
>>> {n, 1, 20, 1}]];]
>>> Out = {6.7170000, Null}
>>>
>>> => Result: UDF 5x slower
>>> It is very strange that the built-in function does not get faster
>>> with Parallelize.
>>>
>>> I would very much appreciate any input on how to decrease calculation
>>> time for the user-defined function.
>>>
>>> Many thanks
>>> Stefan
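Stefan expects to need user-defined statistics such as a moving variance. One possible approach avoids a Mean call per window and works with prefix sums from Accumulate. Each window sum is then the difference of two running totals, so the cost does not grow with the window length. This sketch assumes machine-precision real data is adequate. windowSums, movingMean, movingVariance and movVar are hypothetical names, not built-ins or functions from this thread. movingVariance uses the same n - 1 denominator as Variance.

```mathematica
(* Hypothetical helper: sums of all length-n windows of v, computed as
   differences of prefix sums. Assumes machine-precision reals suffice. *)
windowSums[v_?VectorQ, n_Integer] :=
 With[{c = Accumulate[Prepend[N[v], 0.]]},
  Drop[c, n] - Drop[c, -n]]

(* Hypothetical moving mean; same length as MovingAverage[x, n] *)
movingMean[x_?VectorQ, n_Integer?Positive] /; n <= Length[x] :=
 windowSums[x, n]/n

(* Hypothetical moving sample variance (denominator n - 1, like Variance).
   Variance is shift-invariant, so the data are centered first to reduce
   cancellation in Sum[x^2] - Sum[x]^2/n. *)
movingVariance[x_?VectorQ, n_Integer] /; 2 <= n <= Length[x] :=
 With[{xc = N[x] - Mean[N[x]]},
  (windowSums[xc^2, n] - windowSums[xc, n]^2/n)/(n - 1)]

(* Sample data as in the thread *)
data = 100 + # & /@ Accumulate[RandomReal[{-1, 1}, {10000}]];

(* Largest deviations from the built-in and from a direct per-window Variance *)
Max[Abs[movingMean[data, 30] - MovingAverage[data, 30]]]
Max[Abs[movingVariance[data, 30] - (Variance /@ Partition[data, 30, 1])]]

(* Same padded layout as movAv, but for the moving variance;
   assumes start >= 2 so every window size is valid *)
movVar[x_, start_, end_, incr_] := Transpose@PadRight@Join[{x},
   Table[movingVariance[x, r], {r, start, end, incr}]]
```

Prefix-sum differences can lose some precision when the running totals become large compared with the window contents. Centering the data, as done for the variance, reduces that effect. Whether this beats a compiled per-window loop on a given machine and data set is best checked with AbsoluteTiming, as in the tests above.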
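Zach wondered whether Compile does anything for a body that calls MovingAverage. One way to check is to look at the output of CompilePrint for MainEvaluate instructions, which mark calls back into the ordinary evaluator. This sketch assumes Mathematica 8 or later, where the CompiledFunctionTools` package is available. cfMA, cfTotal and noCallbacksQ are hypothetical names used only for illustration.

```mathematica
(* Assumes Mathematica 8 or later, where CompilePrint is available *)
Needs["CompiledFunctionTools`"]

(* Compiled wrapper around MovingAverage; the third argument tells Compile
   that the MovingAverage call returns a rank-1 list of reals *)
cfMA = Compile[{{x, _Real, 1}, {n, _Integer}},
   MovingAverage[x, n], {{MovingAverage[_, _], _Real, 1}}];

(* Compiled function using only Total, which Compile handles natively *)
cfTotal = Compile[{{x, _Real, 1}}, Total[x]];

(* True when the compiled code contains no callback to the main evaluator *)
noCallbacksQ[cf_CompiledFunction] :=
 StringFreeQ[CompilePrint[cf], "MainEvaluate"]

noCallbacksQ /@ {cfMA, cfTotal}
```

Suppose the MovingAverage call does show up as a MainEvaluate instruction. In that case the compiled function does the same work as the uncompiled one, plus the overhead of the callback. That would be consistent with Ray's observation that the compiled movAverageC took more time than the uncompiled movAv.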