Re: Speed Up of Calculations on Large Lists
- To: mathgroup at smc.vnet.net
- Subject: [mg108857] Re: Speed Up of Calculations on Large Lists
- From: Ray Koopman <koopman at sfu.ca>
- Date: Sun, 4 Apr 2010 07:45:33 -0400 (EDT)
- References: <hp1ua5$65k$1@smc.vnet.net>
Your compiled movAverageC takes 25% more time than the uncompiled

movAv[data_, start_, end_, incr_] := Transpose@PadRight@Join[{data},
   Table[MovingAverage[data, r], {r, start, end, incr}]]

under your test conditions.

On Apr 1, 3:59 am, sheaven <shea... at gmx.de> wrote:
> Hello everyone!
>
> I am new to Mathematica and am trying to get an understanding of its
> power. I plan to use Mathematica mainly for financial data analysis
> (large lists...).
>
> Currently, I am trying to optimize calculation time for calculations
> based on some sample data. I started with a moving average of
> share prices, because Mathematica already has a built-in moving
> average function for benchmarking.
>
> I know that the built-in functions are always more efficient than any
> user-built function. Unfortunately, I will have to create functions
> that are not built in (e.g. something like "moving variance") in the
> future.
>
> I have tried numerous ways to calculate the moving average as
> efficiently as possible. So far, I have found that a function based on
> Span (or List[[x;;y]]) is most efficient. Below are my test results.
> Unfortunately, my UDF is still more than 5x slower than the built-in
> function.
>
> Do you have any ideas to further speed up the function? I am already
> using Compile and Parallelize.
>
> This is what I have so far:
>
> 1. Functions for moving average:
>
> 1.1. Moving average based on built-in function:
>
> (*Function calcs moving average based on built in function for
> specified number of days, e.g. 30 days to 250 days in steps of 10*)
> movAverageC = Compile[{{inputData, _Real, 1}, {start, _Integer}, {end,
> _Integer}, {incr, _Integer}}, Module[{data, size, i},
>   size = Length[inputData];
>   Transpose[Join[{inputData}, PadRight[MovingAverage[inputData, #],
> size] & /@ Table[x, {x, start, end, incr}]]]
>   ]
>  ]
>
> 1.2. User-defined function based on Span:
>
> (*UDF for moving average based on Span*)
> movAverageOwn2FC = Compile[{{dataInput, _Real, 1}, {days, _Integer},
> {length, _Integer}},
>   N[Mean[dataInput[[1 + # ;; days + #]]]] & /@ Range[0, length - days,
> 1]
>  ]
>
> (*Function calcs moving average based on UDF "movAverageOwn2FC" for
> specified number of days, e.g. 30 days to 250 days in steps of 10*)
> movAverageOwn2C = Compile[{{dataInput, _Real, 1}, {start, _Integer},
> {end, _Integer}, {incr, _Integer}}, Module[{length},
>   length = Length[dataInput];
>   Transpose[Join[{dataInput}, PadRight[movAverageOwn2FC[dataInput, #,
> length], length] & /@ Range[start, end, incr]]]
>   ]
>  ]
>
> 2. Create sample data:
>
> data = 100 + # & /@ Accumulate[RandomReal[{-1, 1}, {10000}]];
>
> 3. Test if functions yield same results:
>
> Test1 = movAverageC[data, 30, 250, 10]; (*Moving average for 30 days
> to 250 days in steps of 10*)
>
> Test2 = movAverageOwn2C[data, 30, 250, 10]; (*Moving average for 30
> days to 250 days in steps of 10*)
>
> Test1 == Test2
> Out = True
>
> 4. Performance testing (single core):
>
> AbsoluteTiming[Table[movAverageC[data, 30, 250, 10], {n, 1, 20, 1}];]
> (*Repeat function 20x for testing purposes*)
> Out = {1.3030000, Null}
>
> AbsoluteTiming[Table[movAverageOwn2C[data, 30, 250, 10], {n, 1, 20,
> 1}];] (*Repeat function 20x for testing purposes*)
> Out = {11.4260000, Null}
>
> => Result: UDF 9x slower
>
> 5. Performance testing (multi core):
>
> LaunchKernels[]
>
> Out = {KernelObject[1, "local"], KernelObject[2, "local"]}
>
> DistributeDefinitions[data, movAverageOwn2C, movAverageOwn2FC,
> movAverageC]
>
> AbsoluteTiming[Parallelize[Table[movAverageC[data, 30, 250, 10], {n,
> 1, 20, 1}]];]
> Out = {1.3200000, Null}
>
> AbsoluteTiming[Parallelize[Table[movAverageOwn2C[data, 30, 250, 10],
> {n, 1, 20, 1}]];]
> Out = {6.7170000, Null}
>
> => Result: UDF 5x slower
> Very strange that the built-in function does not get faster with
> Parallelize.
>
> I would very much appreciate any input on how to decrease calculation
> time based on the user-defined function.
>
> Many thanks
> Stefan
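
For the "moving variance" mentioned in the quoted message, one possible approach is to take differences of running sums instead of calling Mean on every window. This is only a sketch and does not come from either message above: the names movAvgAcc and movVarAcc are hypothetical, and no timings were measured for it. It assumes real-valued data and a window length n of at least 2.

```mathematica
(* Hypothetical helpers for illustration; not part of the original thread. *)

(* Moving average over windows of n points, from differences of running sums. *)
(* acc[[k + 1]] holds the sum of the first k data points.                     *)
movAvgAcc[data_, n_] := With[{acc = Prepend[Accumulate[data], 0.]},
   (Drop[acc, n] - Drop[acc, -n])/n]

(* Moving sample variance (n - 1 denominator, as in the built-in Variance), *)
(* from the moving averages of x and x^2.                                   *)
movVarAcc[data_, n_] := With[{m1 = movAvgAcc[data, n], m2 = movAvgAcc[data^2, n]},
   (m2 - m1^2) n/(n - 1)]

(* Sample data in the same style as the quoted message. *)
data = 100 + Accumulate[RandomReal[{-1, 1}, 10000]];

(* Largest absolute deviation from the built-in results; both should be tiny. *)
Max[Abs[movAvgAcc[data, 30] - MovingAverage[data, 30]]]
Max[Abs[movVarAcc[data, 30] - (Variance /@ Partition[data, 30, 1])]]
```

Subtracting running sums loses some precision when the values are large relative to their spread, so the results are worth checking against Variance on a sample of windows, as in the last line, before relying on them.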